Reducing assembly complexity of microbial genomes with single-molecule sequencing.
Koren, Sergey; Harhay, Gregory P; Smith, Timothy P L; Bono, James L; Harhay, Dayna M; Mcvey, Scott D; Radune, Diana; Bergman, Nicholas H; Phillippy, Adam M
2013-01-01
The short reads output by first- and second-generation DNA sequencing instruments cannot completely reconstruct microbial chromosomes. Therefore, most genomes have been left unfinished due to the significant resources required to manually close gaps in draft assemblies. Third-generation, single-molecule sequencing addresses this problem by greatly increasing sequencing read length, which simplifies the assembly problem. To measure the benefit of single-molecule sequencing on microbial genome assembly, we sequenced and assembled the genomes of six bacteria and analyzed the repeat complexity of 2,267 complete bacteria and archaea. Our results indicate that the majority of known bacterial and archaeal genomes can be assembled without gaps, at finished-grade quality, using a single PacBio RS sequencing library. These single-library assemblies are also more accurate than typical short-read assemblies and hybrid assemblies of short and long reads. Automated assembly of long, single-molecule sequencing data reduces the cost of microbial finishing to $1,000 for most genomes, and future advances in this technology are expected to drive the cost lower. This is expected to increase the number of completed genomes, improve the quality of microbial genome databases, and enable high-fidelity, population-scale studies of pan-genomes and chromosomal organization.
Sequencing intractable DNA to close microbial genomes.
Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A
2012-01-01
Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.
RefSeq microbial genomes database: new representation and annotation strategy.
Tatusova, Tatiana; Ciufo, Stacy; Fedorov, Boris; O'Neill, Kathleen; Tolstoy, Igor
2014-01-01
The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the annotation, analysis and visualization bioinformatics tools. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.
Microbial ecology in the age of genomics and metagenomics: concepts, tools, and recent advances.
Xu, Jianping
2006-06-01
Microbial ecology examines the diversity and activity of micro-organisms in Earth's biosphere. In the last 20 years, the application of genomics tools have revolutionized microbial ecological studies and drastically expanded our view on the previously underappreciated microbial world. This review first introduces the basic concepts in microbial ecology and the main genomics methods that have been used to examine natural microbial populations and communities. In the ensuing three specific sections, the applications of the genomics in microbial ecological research are highlighted. The first describes the widespread application of multilocus sequence typing and representational difference analysis in studying genetic variation within microbial species. Such investigations have identified that migration, horizontal gene transfer and recombination are common in natural microbial populations and that microbial strains can be highly variable in genome size and gene content. The second section highlights and summarizes the use of four specific genomics methods (phylogenetic analysis of ribosomal RNA, DNA-DNA re-association kinetics, metagenomics, and micro-arrays) in analysing the diversity and potential activity of microbial populations and communities from a variety of terrestrial and aquatic environments. Such analyses have identified many unexpected phylogenetic lineages in viruses, bacteria, archaea, and microbial eukaryotes. Functional analyses of environmental DNA also revealed highly prevalent, but previously unknown, metabolic processes in natural microbial communities. In the third section, the ecological implications of sequenced microbial genomes are briefly discussed. Comparative analyses of prokaryotic genomic sequences suggest the importance of ecology in determining microbial genome size and gene content. The significant variability in genome size and gene content among strains and species of prokaryotes indicate the highly fluid nature of prokaryotic genomes, a result consistent with those from multilocus sequence typing and representational difference analyses. The integration of various levels of ecological analyses coupled to the application and further development of high throughput technologies are accelerating the pace of discovery in microbial ecology.
Comparative genome analysis in the integrated microbial genomes (IMG) system.
Markowitz, Victor M; Kyrpides, Nikos C
2007-01-01
Comparative genome analysis is critical for the effective exploration of a rapidly growing number of complete and draft sequences for microbial genomes. The Integrated Microbial Genomes (IMG) system (img.jgi.doe.gov) has been developed as a community resource that provides support for comparative analysis of microbial genomes in an integrated context. IMG allows users to navigate the multidimensional microbial genome data space and focus their analysis on a subset of genes, genomes, and functions of interest. IMG provides graphical viewers, summaries, and occurrence profile tools for comparing genes, pathways, and functions (terms) across specific genomes. Genes can be further examined using gene neighborhoods and compared with sequence alignment tools.
INDIGO – INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles
Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B.
2013-01-01
Background The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. PMID:24324765
Alam, Intikhab; Antunes, André; Kamau, Allan Anthony; Ba Alawi, Wail; Kalkatawi, Manal; Stingl, Ulrich; Bajic, Vladimir B
2013-01-01
The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo.
Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C
2012-01-01
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases assembly results although on average are of high quality, need to be viewed critically and consider sources of errors in them prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS
Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...
Specialized microbial databases for inductive exploration of microbial genome sequences
Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine
2005-01-01
Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore , a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis) associated to related organisms for comparison. PMID:15698474
2013-01-01
A need for a genomic species definition is emerging from several independent studies worldwide. In this commentary paper, we discuss recent studies on the genomic taxonomy of diverse microbial groups and a unified species definition based on genomics. Accordingly, strains from the same microbial species share >95% Average Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI), >95% identity based on multiple alignment genes, <10 in Karlin genomic signature, and > 70% in silico Genome-to-Genome Hybridization similarity (GGDH). Species of the same genus will form monophyletic groups on the basis of 16S rRNA gene sequences, Multilocus Sequence Analysis (MLSA) and supertree analysis. In addition to the established requirements for species descriptions, we propose that new taxa descriptions should also include at least a draft genome sequence of the type strain in order to obtain a clear outlook on the genomic landscape of the novel microbe. The application of the new genomic species definition put forward here will allow researchers to use genome sequences to define simultaneously coherent phenotypic and genomic groups. PMID:24365132
Quake, Steve
2018-02-02
Stanford University's Steve Quake on "Sequencing Single Cell Microbial Genomes with Microfluidic Amplification Tools" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Quake, Steve
2011-10-12
Stanford University's Steve Quake on "Sequencing Single Cell Microbial Genomes with Microfluidic Amplification Tools" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
NASA Technical Reports Server (NTRS)
Doolittle, Russell F.
2002-01-01
The publication of the first complete sequence of a bacterial genome in 1995 was a signal event, underscored by the fact that the article has been cited more than 2,100 times during the intervening seven years. It was a marvelous technical achievement, made possible by automatic DNA-sequencing machines. The feat is the more impressive in that complete genome sequencing has now been adopted in many different laboratories around the world. Four years ago in these columns I examined the situation after a dozen microbial genomes had been completed. Now, with upwards of 60 microbial genome sequences determined and twice that many in progress, it seems reasonable to assess just what is being learned. Are new concepts emerging about how cells work? Have there been practical benefits in the fields of medicine and agriculture? Is it feasible to determine the genomic sequence of every bacterial species on Earth? The answers to these questions maybe Yes, Perhaps, and No, respectively.
Reducing assembly complexity of microbial genomes with single-molecule sequencing
USDA-ARS?s Scientific Manuscript database
Genome assembly algorithms cannot fully reconstruct microbial chromosomes from the DNA reads output by first or second-generation sequencing instruments. Therefore, most genomes are left unfinished due to the significant resources required to manually close gaps left in the draft assemblies. Single-...
The need for high-quality whole-genome sequence databases in microbial forensics.
Sjödin, Andreas; Broman, Tina; Melefors, Öjar; Andersson, Gunnar; Rasmusson, Birgitta; Knutsson, Rickard; Forsman, Mats
2013-09-01
Microbial forensics is an important part of a strengthened capability to respond to biocrime and bioterrorism incidents to aid in the complex task of distinguishing between natural outbreaks and deliberate acts. The goal of a microbial forensic investigation is to identify and criminally prosecute those responsible for a biological attack, and it involves a detailed analysis of the weapon--that is, the pathogen. The recent development of next-generation sequencing (NGS) technologies has greatly increased the resolution that can be achieved in microbial forensic analyses. It is now possible to identify, quickly and in an unbiased manner, previously undetectable genome differences between closely related isolates. This development is particularly relevant for the most deadly bacterial diseases that are caused by bacterial lineages with extremely low levels of genetic diversity. Whole-genome analysis of pathogens is envisaged to be increasingly essential for this purpose. In a microbial forensic context, whole-genome sequence analysis is the ultimate method for strain comparisons as it is informative during identification, characterization, and attribution--all 3 major stages of the investigation--and at all levels of microbial strain identity resolution (ie, it resolves the full spectrum from family to isolate). Given these capabilities, one bottleneck in microbial forensics investigations is the availability of high-quality reference databases of bacterial whole-genome sequences. To be of high quality, databases need to be curated and accurate in terms of sequences, metadata, and genetic diversity coverage. The development of whole-genome sequence databases will be instrumental in successfully tracing pathogens in the future.
Piao, Hailan; Froula, Jeff; Du, Changbin; Kim, Tae-Wan; Hawley, Erik R; Bauer, Stefan; Wang, Zhong; Ivanova, Nathalia; Clark, Douglas S; Klenk, Hans-Peter; Hess, Matthias
2014-08-01
Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes and consequently of microbial processes it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins that are located in the proximity of already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes with insufficient sequence similarity to currently known cellulases to be identified as such using traditional sequence annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications. © 2014 Wiley Periodicals, Inc.
Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi
2013-04-10
Sequencing of microbial genomes is important because of microbial-carrying antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for ongoing sequencing genomes. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module, in a support vector machine (SVM)-based method called genomic island genomic profile scanning (GI-GPS). The draft genomes of the ongoing genome projects in contigs or scaffolds can be submitted to our Web server, and it provides the functional annotation and highly probable GI-predicting results. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information include possible GIs, coding/non-coding sequences and functional analysis from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.
Distilled single-cell genome sequencing and de novo assembly for sparse microbial communities.
Taghavi, Zeinab; Movahedi, Narjes S; Draghici, Sorin; Chitsaz, Hamidreza
2013-10-01
Identification of every single genome present in a microbial sample is an important and challenging task with crucial applications. It is challenging because there are typically millions of cells in a microbial sample, the vast majority of which elude cultivation. The most accurate method to date is exhaustive single-cell sequencing using multiple displacement amplification, which is simply intractable for a large number of cells. However, there is hope for breaking this barrier, as the number of different cell types with distinct genome sequences is usually much smaller than the number of cells. Here, we present a novel divide and conquer method to sequence and de novo assemble all distinct genomes present in a microbial sample with a sequencing cost and computational complexity proportional to the number of genome types, rather than the number of cells. The method is implemented in a tool called Squeezambler. We evaluated Squeezambler on simulated data. The proposed divide and conquer method successfully reduces the cost of sequencing in comparison with the naïve exhaustive approach. Squeezambler and datasets are available at http://compbio.cs.wayne.edu/software/squeezambler/.
ReprDB and panDB: minimalist databases with maximal microbial representation.
Zhou, Wei; Gay, Nicole; Oh, Julia
2018-01-18
Profiling of shotgun metagenomic samples is hindered by a lack of unified microbial reference genome databases that (i) assemble genomic information from all open access microbial genomes, (ii) have relatively small sizes, and (iii) are compatible to various metagenomic read mapping tools. Moreover, computational tools to rapidly compile and update such databases to accommodate the rapid increase in new reference genomes do not exist. As a result, database-guided analyses often fail to profile a substantial fraction of metagenomic shotgun sequencing reads from complex microbiomes. We report pipelines that efficiently traverse all open access microbial genomes and assemble non-redundant genomic information. The pipelines result in two species-resolution microbial reference databases of relatively small sizes: reprDB, which assembles microbial representative or reference genomes, and panDB, for which we developed a novel iterative alignment algorithm to identify and assemble non-redundant genomic regions in multiple sequenced strains. With the databases, we managed to assign taxonomic labels and genome positions to the majority of metagenomic reads from human skin and gut microbiomes, demonstrating a significant improvement over a previous database-guided analysis on the same datasets. reprDB and panDB leverage the rapid increases in the number of open access microbial genomes to more fully profile metagenomic samples. Additionally, the databases exclude redundant sequence information to avoid inflated storage or memory space and indexing or analyzing time. Finally, the novel iterative alignment algorithm significantly increases efficiency in pan-genome identification and can be useful in comparative genomic analyses.
Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A
2016-07-01
Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4).
Huntemann, Marcel; Ivanova, Natalia N; Mavromatis, Konstantinos; Tripp, H James; Paez-Espino, David; Palaniappan, Krishnaveni; Szeto, Ernest; Pillay, Manoj; Chen, I-Min A; Pati, Amrita; Nielsen, Torben; Markowitz, Victor M; Kyrpides, Nikos C
2015-01-01
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. Structural annotation is followed by assignment of protein product names and functions.
Treangen, Todd J; Ondov, Brian D; Koren, Sergey; Phillippy, Adam M
2014-01-01
Whole-genome sequences are now available for many microbial species and clades, however existing whole-genome alignment methods are limited in their ability to perform sequence comparisons of multiple sequences simultaneously. Here we present the Harvest suite of core-genome alignment and visualization tools for the rapid and simultaneous analysis of thousands of intraspecific microbial strains. Harvest includes Parsnp, a fast core-genome multi-aligner, and Gingr, a dynamic visual platform. Together they provide interactive core-genome alignments, variant calls, recombination detection, and phylogenetic trees. Using simulated and real data we demonstrate that our approach exhibits unrivaled speed while maintaining the accuracy of existing methods. The Harvest suite is open-source and freely available from: http://github.com/marbl/harvest.
Large-scale contamination of microbial isolate genomes by Illumina PhiX control.
Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia; Kyrpides, Nikos C; Pati, Amrita
2015-01-01
With the rapid growth and development of sequencing technologies, genomes have become the new go-to for exploring solutions to some of the world's biggest challenges such as searching for alternative energy sources and exploration of genomic dark matter. However, progress in sequencing has been accompanied by its share of errors that can occur during template or library preparation, sequencing, imaging or data analysis. In this study we screened over 18,000 publicly available microbial isolate genome sequences in the Integrated Microbial Genomes database and identified more than 1000 genomes that are contaminated with PhiX, a control frequently used during Illumina sequencing runs. Approximately 10% of these genomes have been published in literature and 129 contaminated genomes were sequenced under the Human Microbiome Project. Raw sequence reads are prone to contamination from various sources and are usually eliminated during downstream quality control steps. Detection of PhiX contaminated genomes indicates a lapse in either the application or effectiveness of proper quality control measures. The presence of PhiX contamination in several publicly available isolate genomes can result in additional errors when such data are used in comparative genomics analyses. Such contamination of public databases have far-reaching consequences in the form of erroneous data interpretation and analyses, and necessitates better measures to proofread raw sequences before releasing them to the broader scientific community.
Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian
2011-01-01
The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.
Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian
2011-01-01
Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggests that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928
Zhang, Huimin; He, Hongkui; Yu, Xiujuan; Xu, Zhaohui; Zhang, Zhizhou
2016-11-01
It remains an unsolved problem to quantify a natural microbial community by rapidly and conveniently measuring multiple species with functional significance. Most widely used high throughput next-generation sequencing methods can only generate information mainly for genus-level taxonomic identification and quantification, and detection of multiple species in a complex microbial community is still heavily dependent on approaches based on near full-length ribosome RNA gene or genome sequence information. In this study, we used near full-length rRNA gene library sequencing plus Primer-Blast to design species-specific primers based on whole microbial genome sequences. The primers were intended to be specific at the species level within relevant microbial communities, i.e., a defined genomics background. The primers were tested with samples collected from the Daqu (also called fermentation starters) and pit mud of a traditional Chinese liquor production plant. Sixteen pairs of primers were found to be suitable for identification of individual species. Among them, seven pairs were chosen to measure the abundance of microbial species through quantitative PCR. The combination of near full-length ribosome RNA gene library sequencing and Primer-Blast may represent a broadly useful protocol to quantify multiple species in complex microbial population samples with species-specific primers.
Ten years of maintaining and expanding a microbial genome and metagenome analysis system.
Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C
2015-11-01
Launched in March 2005, the Integrated Microbial Genomes (IMG) system is a comprehensive data management system that supports multidimensional comparative analysis of genomic data. At the core of the IMG system is a data warehouse that contains genome and metagenome datasets sequenced at the Joint Genome Institute or provided by scientific users, as well as public genome datasets available at the National Center for Biotechnology Information Genbank sequence data archive. Genomes and metagenome datasets are processed using IMG's microbial genome and metagenome sequence data processing pipelines and are integrated into the data warehouse using IMG's data integration toolkits. Microbial genome and metagenome application specific data marts and user interfaces provide access to different subsets of IMG's data and analysis toolkits. This review article revisits IMG's original aims, highlights key milestones reached by the system during the past 10 years, and discusses the main challenges faced by a rapidly expanding system, in particular the complexity of maintaining such a system in an academic setting with limited budgets and computing and data management infrastructure. Copyright © 2015 Elsevier Ltd. All rights reserved.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos; ...
2015-10-26
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.
The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huntemann, Marcel; Ivanova, Natalia N.; Mavromatis, Konstantinos
The DOE-JGI Microbial Genome Annotation Pipeline performs structural and functional annotation of microbial genomes that are further included into the Integrated Microbial Genome comparative analysis system. MGAP is applied to assembled nucleotide sequence datasets that are provided via the IMG submission site. Dataset submission for annotation first requires project and associated metadata description in GOLD. The MGAP sequence data processing consists of feature prediction including identification of protein-coding genes, non-coding RNAs and regulatory RNA features, as well as CRISPR elements. In conclusion, structural annotation is followed by assignment of protein product names and functions.
Kamada, Mayumi; Hase, Sumitaka; Sato, Kengo; Toyoda, Atsushi; Fujiyama, Asao; Sakakibara, Yasubumi
2014-01-01
De novo microbial genome sequencing reached a turning point with third-generation sequencing (TGS) platforms, and several microbial genomes have been improved by TGS long reads. Bacillus subtilis natto is closely related to the laboratory standard strain B. subtilis Marburg 168, and it has a function in the production of the traditional Japanese fermented food “natto.” The B. subtilis natto BEST195 genome was previously sequenced with short reads, but it included some incomplete regions. We resequenced the BEST195 genome using a PacBio RS sequencer, and we successfully obtained a complete genome sequence from one scaffold without any gaps, and we also applied Illumina MiSeq short reads to enhance quality. Compared with the previous BEST195 draft genome and Marburg 168 genome, we found that incomplete regions in the previous genome sequence were attributed to GC-bias and repetitive sequences, and we also identified some novel genes that are found only in the new genome. PMID:25329997
Strain/species identification in metagenomes using genome-specific markers
Tu, Qichao; He, Zhili; Zhou, Jizhong
2014-01-01
Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which were then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 50-mer strain-specific and 11 736 360 species-specific GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their targeting genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ≥0.25× coverage, and 10% of selected GSMs in a database should be detected for confident positive callings. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals from corresponding gastrointestinal tract metagenomes, respectively. Our result agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing. PMID:24523352
Improved bacteriophage genome data is necessary for integrating viral and bacterial ecology.
Bibby, Kyle
2014-02-01
The recent rise in "omics"-enabled approaches has lead to improved understanding in many areas of microbial ecology. However, despite the importance that viruses play in a broad microbial ecology context, viral ecology remains largely not integrated into high-throughput microbial ecology studies. A fundamental hindrance to the integration of viral ecology into omics-enabled microbial ecology studies is the lack of suitable reference bacteriophage genomes in reference databases-currently, only 0.001% of bacteriophage diversity is represented in genome sequence databases. This commentary serves to highlight this issue and to promote bacteriophage genome sequencing as a valuable scientific undertaking to both better understand bacteriophage diversity and move towards a more holistic view of microbial ecology.
Signatures of natural selection and ecological differentiation in microbial genomes.
Shapiro, B Jesse
2014-01-01
We live in a microbial world. Most of the genetic and metabolic diversity that exists on earth - and has existed for billions of years - is microbial. Making sense of this vast diversity is a daunting task, but one that can be approached systematically by analyzing microbial genome sequences. This chapter explores how the evolutionary forces of recombination and selection act to shape microbial genome sequences, leaving signatures that can be detected using comparative genomics and population-genetic tests for selection. I describe the major classes of tests, paying special attention to their relative strengths and weaknesses when applied to microbes. Specifically, I apply a suite of tests for selection to a set of closely-related bacterial genomes with different microhabitat preferences within the marine water column, shedding light on the genomic mechanisms of ecological differentiation in the wild. I will focus on the joint problem of simultaneously inferring the boundaries between microbial populations, and the selective forces operating within and between populations.
Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu
2015-01-01
The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
VirSorter: mining viral signal from microbial genomic data.
Roux, Simon; Enault, Francois; Hurwitz, Bonnie L; Sullivan, Matthew B
2015-01-01
Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter's prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in "reverse" to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made available through the iPlant Cyberinfrastructure that provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction softwares to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.
VirSorter: mining viral signal from microbial genomic data
Roux, Simon; Enault, Francois; Hurwitz, Bonnie L.
2015-01-01
Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability compares to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequence (contigs) as short as 3kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA whether derived from gene transfer agents, generalized transduction or contamination. Finally, VirSorter is made available through the iPlant Cyberinfrastructure that provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction softwares to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems. PMID:26038737
Microbial genome-wide association studies: lessons from human GWAS.
Power, Robert A; Parkhill, Julian; de Oliveira, Tulio
2017-01-01
The reduced costs of sequencing have led to whole-genome sequences for a large number of microorganisms, enabling the application of microbial genome-wide association studies (GWAS). Given the successes of human GWAS in understanding disease aetiology and identifying potential drug targets, microbial GWAS are likely to further advance our understanding of infectious diseases. These advances include insights into pressing global health problems, such as antibiotic resistance and disease transmission. In this Review, we outline the methodologies of GWAS, the current state of the field of microbial GWAS, and how lessons from human GWAS can direct the future of the field.
MicroScope: a platform for microbial genome annotation and comparative genomics
Vallenet, D.; Engelen, S.; Mornico, D.; Cruveiller, S.; Fleury, L.; Lajus, A.; Rouy, Z.; Roche, D.; Salvignol, G.; Scarpelli, C.; Médigue, C.
2009-01-01
The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope’s rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone. Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc PMID:20157493
MicroScope: a platform for microbial genome annotation and comparative genomics.
Vallenet, D; Engelen, S; Mornico, D; Cruveiller, S; Fleury, L; Lajus, A; Rouy, Z; Roche, D; Salvignol, G; Scarpelli, C; Médigue, C
2009-01-01
The initial outcome of genome sequencing is the creation of long text strings written in a four letter alphabet. The role of in silico sequence analysis is to assist biologists in the act of associating biological knowledge with these sequences, allowing investigators to make inferences and predictions that can be tested experimentally. A wide variety of software is available to the scientific community, and can be used to identify genomic objects, before predicting their biological functions. However, only a limited number of biologically interesting features can be revealed from an isolated sequence. Comparative genomics tools, on the other hand, by bringing together the information contained in numerous genomes simultaneously, allow annotators to make inferences based on the idea that evolution and natural selection are central to the definition of all biological processes. We have developed the MicroScope platform in order to offer a web-based framework for the systematic and efficient revision of microbial genome annotation and comparative analysis (http://www.genoscope.cns.fr/agc/microscope). Starting with the description of the flow chart of the annotation processes implemented in the MicroScope pipeline, and the development of traditional and novel microbial annotation and comparative analysis tools, this article emphasizes the essential role of expert annotation as a complement of automatic annotation. Several examples illustrate the use of implemented tools for the review and curation of annotations of both new and publicly available microbial genomes within MicroScope's rich integrated genome framework. The platform is used as a viewer in order to browse updated annotation information of available microbial genomes (more than 440 organisms to date), and in the context of new annotation projects (117 bacterial genomes). The human expertise gathered in the MicroScope database (about 280,000 independent annotations) contributes to improve the quality of microbial genome annotation, especially for genomes initially analyzed by automatic procedures alone.Database URLs: http://www.genoscope.cns.fr/agc/mage and http://www.genoscope.cns.fr/agc/microcyc.
Fourteenth-Sixteenth Microbial Genomics Conference-2006-2008
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Jeffrey H
2011-04-18
The concept of an annual meeting on the E. coli genome was formulated at the Banbury Center Conference on the Genome of E. coli in October, 1991. The first meeting was held on September 10-14, 1992 at the University of Wisconsin, and this was followed by a yearly series of meetings, and by an expansion to include The fourteenth meeting took place September 24-28, 2006 at Lake Arrowhead, CA, the fifteenth September 16-20, 2007 at the University of Maryland, College Park, MD, and the sixteenth September 14-18, 2008 at Lake Arrowhead. The full program for the 16th meeting is attached.more » There have been rapid and exciting advances in microbial genomics that now make possible comparing large data sets of sequences from a wide variety of microbial genomes, and from whole microbial communities. Examining the “microbiomes”, the living microbial communities in different host organisms opens up many possibilities for understanding the landscape presented to pathogenic microorganisms. For quite some time there has been a shifting emphasis from pure sequence data to trying to understand how to use that information to solve biological problems. Towards this end new technologies are being developed and improved. Using genetics, functional genomics, and proteomics has been the recent focus of many different laboratories. A key element is the integration of different aspects of microbiology, sequencing technology, analysis techniques, and bioinformatics. The goal of these conference is to provide a regular forum for these interactions to occur. While there have been a number of genome conferences, what distinguishes the Microbial Genomics Conference is its emphasis on bringing together biology and genetics with sequencing and bioinformatics. Also, this conference is the longest continuing meeting, now established as a major regular annual meeting. In addition to its coverage of microbial genomes and biodiversity, the meetings also highlight microbial communities and the use of genomic information to aid in the understanding of pathogens and biothreats. An additional focus cover s“bioenergetics. The meetings have a mix of invited and participant-initiated presentations and poster sessions during which investigators from different disciplines become familiar with available data bases and new tools facilitating coordination of information. The fields are moving very fast both in the acquisition of new knowledge of genome contents and also in the management and analysis of the information. The key is connecting bodies of knowledge on sequences, genetic organization and regulation to be able to relate the significance of this information to understanding cellular processes. To our knowledge, no other meeting synthesizes the biology of organisms, sequence information and database analysis, as well as the comparison with other completed genome sequences.« less
Ross, Daniel E.; Marshall, Christopher W.; May, Harold D.; ...
2015-01-15
A draft genome of Sulfurospirillum sp. strain MES was isolated through taxonomic binning of a metagenome sequenced from a microbial electrosynthesis system (MES) actively producing acetate and hydrogen. The genome contains the nosZDFLY genes, which are involved in nitrous oxide reduction, suggesting the potential role of this strain in denitrification.
Ross, Daniel E; Marshall, Christopher W; May, Harold D; Norman, R Sean
2017-09-07
Draft genome sequences of Acetobacterium sp. strain MES1 and Desulfovibrio sp. strain MES5 were obtained from the metagenome of a cathode-associated community enriched within a microbial electrosynthesis system (MES). The draft genome sequences provide insight into the functional potential of these microorganisms within an MES and a foundation for future comparative analyses. Copyright © 2017 Ross et al.
Pang, Changlong; Li, Ang; Cui, Di; Yang, Jixian; Ma, Fang; Guo, Haijuan
2016-02-20
Klebsiella pneumoniae J1 is a Gram-negative strain, which belongs to a protein-based microbial flocculant-producing bacterium. However, little genetic information is known about this species. Here we carried out a whole-genome sequence analysis of this strain and report the complete genome sequence of this organism and its genetic basis for carbohydrate metabolism, capsule biosynthesis and transport system. Copyright © 2016 Elsevier B.V. All rights reserved.
Mining of Microbial Genomes for the Novel Sources of Nitrilases.
Sharma, Nikhil; Thakur, Neerja; Raj, Tilak; Savitri; Bhalla, Tek Chand
2017-01-01
Next-generation DNA sequencing (NGS) has made it feasible to sequence large number of microbial genomes and advancements in computational biology have opened enormous opportunities to mine genome sequence data for novel genes and enzymes or their sources. In the present communication in silico mining of microbial genomes has been carried out to find novel sources of nitrilases. The sequences selected were analyzed for homology and considered for designing motifs. The manually designed motifs based on amino acid sequences of nitrilases were used to screen 2000 microbial genomes (translated to proteomes). This resulted in identification of one hundred thirty-eight putative/hypothetical sequences which could potentially code for nitrilase activity. In vitro validation of nine predicted sources of nitrilases was done for nitrile/cyanide hydrolyzing activity. Out of nine predicted nitrilases, Gluconacetobacter diazotrophicus , Sphingopyxis alaskensis , Saccharomonospora viridis , and Shimwellia blattae were specific for aliphatic nitriles, whereas nitrilases from Geodermatophilus obscurus , Nocardiopsis dassonvillei , Runella slithyformis , and Streptomyces albus possessed activity for aromatic nitriles. Flavobacterium indicum was specific towards potassium cyanide (KCN) which revealed the presence of nitrilase homolog, that is, cyanide dihydratase with no activity for either aliphatic, aromatic, or aryl nitriles. The present study reports the novel sources of nitrilases and cyanide dihydratase which were not reported hitherto by in silico or in vitro studies.
Microbial bioinformatics 2020.
Pallen, Mark J
2016-09-01
Microbial bioinformatics in 2020 will remain a vibrant, creative discipline, adding value to the ever-growing flood of new sequence data, while embracing novel technologies and fresh approaches. Databases and search strategies will struggle to cope and manual curation will not be sustainable during the scale-up to the million-microbial-genome era. Microbial taxonomy will have to adapt to a situation in which most microorganisms are discovered and characterised through the analysis of sequences. Genome sequencing will become a routine approach in clinical and research laboratories, with fresh demands for interpretable user-friendly outputs. The "internet of things" will penetrate healthcare systems, so that even a piece of hospital plumbing might have its own IP address that can be integrated with pathogen genome sequences. Microbiome mania will continue, but the tide will turn from molecular barcoding towards metagenomics. Crowd-sourced analyses will collide with cloud computing, but eternal vigilance will be the price of preventing the misinterpretation and overselling of microbial sequence data. Output from hand-held sequencers will be analysed on mobile devices. Open-source training materials will address the need for the development of a skilled labour force. As we boldly go into the third decade of the twenty-first century, microbial sequence space will remain the final frontier! © 2016 The Author. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.
Analysis of Illumina Microbial Assemblies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clum, Alicia; Foster, Brian; Froula, Jeff
2010-05-28
Since the emerging of second generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaken. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as wellmore » as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.« less
Comparative Genomics of Bacillus species and its Relevance in Industrial Microbiology.
Sharma, Archana; Satyanarayana, T
2013-01-01
With the advent of high throughput sequencing platforms and relevant analytical tools, the rate of microbial genome sequencing has accelerated which has in turn led to better understanding of microbial molecular biology and genetics. The complete genome sequences of important industrial organisms provide opportunities for human health, industry, and the environment. Bacillus species are the dominant workhorses in industrial fermentations. Today, genome sequences of several Bacillus species are available, and comparative genomics of this genus helps in understanding their physiology, biochemistry, and genetics. The genomes of these bacterial species are the sources of many industrially important enzymes and antibiotics and, therefore, provide an opportunity to tailor enzymes with desired properties to suit a wide range of applications. A comparative account of strengths and weaknesses of the different sequencing platforms are also highlighted in the review.
USDA-ARS?s Scientific Manuscript database
The ARS Microbial Genome Sequence Database (http://199.133.98.43), a web-based database server, was established utilizing the BIGSdb (Bacterial Isolate Genomics Sequence Database) software package, developed at Oxford University, as a tool to manage multi-locus sequence data for the family Streptomy...
Large-Scale Sequencing: The Future of Genomic Sciences Colloquium
DOE Office of Scientific and Technical Information (OSTI.GOV)
Margaret Riley; Merry Buckley
2009-01-01
Genetic sequencing and the various molecular techniques it has enabled have revolutionized the field of microbiology. Examining and comparing the genetic sequences borne by microbes - including bacteria, archaea, viruses, and microbial eukaryotes - provides researchers insights into the processes microbes carry out, their pathogenic traits, and new ways to use microorganisms in medicine and manufacturing. Until recently, sequencing entire microbial genomes has been laborious and expensive, and the decision to sequence the genome of an organism was made on a case-by-case basis by individual researchers and funding agencies. Now, thanks to new technologies, the cost and effort of sequencingmore » is within reach for even the smallest facilities, and the ability to sequence the genomes of a significant fraction of microbial life may be possible. The availability of numerous microbial genomes will enable unprecedented insights into microbial evolution, function, and physiology. However, the current ad hoc approach to gathering sequence data has resulted in an unbalanced and highly biased sampling of microbial diversity. A well-coordinated, large-scale effort to target the breadth and depth of microbial diversity would result in the greatest impact. The American Academy of Microbiology convened a colloquium to discuss the scientific benefits of engaging in a large-scale, taxonomically-based sequencing project. A group of individuals with expertise in microbiology, genomics, informatics, ecology, and evolution deliberated on the issues inherent in such an effort and generated a set of specific recommendations for how best to proceed. The vast majority of microbes are presently uncultured and, thus, pose significant challenges to such a taxonomically-based approach to sampling genome diversity. However, we have yet to even scratch the surface of the genomic diversity among cultured microbes. A coordinated sequencing effort of cultured organisms is an appropriate place to begin, since not only are their genomes available, but they are also accompanied by data on environment and physiology that can be used to understand the resulting data. As single cell isolation methods improve, there should be a shift toward incorporating uncultured organisms and communities into this effort. Efforts to sequence cultivated isolates should target characterized isolates from culture collections for which biochemical data are available, as well as other cultures of lasting value from personal collections. The genomes of type strains should be among the first targets for sequencing, but creative culture methods, novel cell isolation, and sorting methods would all be helpful in obtaining organisms we have not yet been able to cultivate for sequencing. The data that should be provided for strains targeted for sequencing will depend on the phylogenetic context of the organism and the amount of information available about its nearest relatives. Annotation is an important part of transforming genome sequences into useful resources, but it represents the most significant bottleneck to the field of comparative genomics right now and must be addressed. Furthermore, there is a need for more consistency in both annotation and achieving annotation data. As new annotation tools become available over time, re-annotation of genomes should be implemented, taking advantage of advancements in annotation techniques in order to capitalize on the genome sequences and increase both the societal and scientific benefit of genomics work. Given the proper resources, the knowledge and ability exist to be able to select model systems, some simple, some less so, and dissect them so that we may understand the processes and interactions at work in them. Colloquium participants suggest a five-pronged, coordinated initiative to exhaustively describe six different microbial ecosystems, designed to describe all the gene diversity, across genomes. In this effort, sequencing should be complemented by other experimental data, particularly transcriptomics and metabolomics data, all of which should be gathered and curated continuously. Systematic genomics efforts like the ones outlined in this document would significantly broaden our view of biological diversity and have major effects on science. This has to be backed up with examples. Considering these potential impacts and the need for acquiescence from both the public and scientists to get such projects funded and functioning, education and training will be crucial. New collaborations within the scientific community will also be necessary.« less
Stepanauskas, Ramunas; Fergusson, Elizabeth A; Brown, Joseph; Poulton, Nicole J; Tupper, Ben; Labonté, Jessica M; Becraft, Eric D; Brown, Julia M; Pachiadaki, Maria G; Povilaitis, Tadas; Thompson, Brian P; Mascena, Corianna J; Bellows, Wendy K; Lubys, Arvydas
2017-07-20
Microbial single-cell genomics can be used to provide insights into the metabolic potential, interactions, and evolution of uncultured microorganisms. Here we present WGA-X, a method based on multiple displacement amplification of DNA that utilizes a thermostable mutant of the phi29 polymerase. WGA-X enhances genome recovery from individual microbial cells and viral particles while maintaining ease of use and scalability. The greatest improvements are observed when amplifying high G+C content templates, such as those belonging to the predominant bacteria in agricultural soils. By integrating WGA-X with calibrated index-cell sorting and high-throughput genomic sequencing, we are able to analyze genomic sequences and cell sizes of hundreds of individual, uncultured bacteria, archaea, protists, and viral particles, obtained directly from marine and soil samples, in a single experiment. This approach may find diverse applications in microbiology and in biomedical and forensic studies of humans and other multicellular organisms.Single-cell genomics can be used to study uncultured microorganisms. Here, Stepanauskas et al. present a method combining improved multiple displacement amplification and FACS, to obtain genomic sequences and cell size information from uncultivated microbial cells and viral particles in environmental samples.
Lin, Hsin-Hung; Liao, Yu-Chieh
2015-01-01
Despite the ever-increasing output of next-generation sequencing data along with developing assemblers, dozens to hundreds of gaps still exist in de novo microbial assemblies due to uneven coverage and large genomic repeats. Third-generation single-molecule, real-time (SMRT) sequencing technology avoids amplification artifacts and generates kilobase-long reads with the potential to complete microbial genome assembly. However, due to the low accuracy (~85%) of third-generation sequences, a considerable amount of long reads (>50X) are required for self-correction and for subsequent de novo assembly. Recently-developed hybrid approaches, using next-generation sequencing data and as few as 5X long reads, have been proposed to improve the completeness of microbial assembly. In this study we have evaluated the contemporary hybrid approaches and demonstrated that assembling corrected long reads (by runCA) produced the best assembly compared to long-read scaffolding (e.g., AHA, Cerulean and SSPACE-LongRead) and gap-filling (SPAdes). For generating corrected long reads, we further examined long-read correction tools, such as ECTools, LSC, LoRDEC, PBcR pipeline and proovread. We have demonstrated that three microbial genomes including Escherichia coli K12 MG1655, Meiothermus ruber DSM1279 and Pdeobacter heparinus DSM2366 were successfully hybrid assembled by runCA into near-perfect assemblies using ECTools-corrected long reads. In addition, we developed a tool, Patch, which implements corrected long reads and pre-assembled contigs as inputs, to enhance microbial genome assemblies. With the additional 20X long reads, short reads of S. cerevisiae W303 were hybrid assembled into 115 contigs using the verified strategy, ECTools + runCA. Patch was subsequently applied to upgrade the assembly to a 35-contig draft genome. Our evaluation of the hybrid approaches shows that assembling the ECTools-corrected long reads via runCA generates near complete microbial genomes, suggesting that genome assembly could benefit from re-analyzing the available hybrid datasets that were not assembled in an optimal fashion.
Rapid Threat Organism Recognition Pipeline
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Kelly P.; Solberg, Owen D.; Schoeniger, Joseph S.
2013-05-07
The RAPTOR computational pipeline identifies microbial nucleic acid sequences present in sequence data from clinical samples. It takes as input raw short-read genomic sequence data (in particular, the type generated by the Illumina sequencing platforms) and outputs taxonomic evaluation of detected microbes in various human-readable formats. This software was designed to assist in the diagnosis or characterization of infectious disease, by detecting pathogen sequences in nucleic acid sequence data from clinical samples. It has also been applied in the detection of algal pathogens, when algal biofuel ponds became unproductive. RAPTOR first trims and filters genomic sequence reads based on qualitymore » and related considerations, then performs a quick alignment to the human (or other host) genome to filter out host sequences, then performs a deeper search against microbial genomes. Alignment to a protein sequence database is optional. Alignment results are summarized and placed in a taxonomic framework using the Lowest Common Ancestor algorithm.« less
Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D
2016-01-04
The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
One chromosome, one contig: complete microbial genomes from long-read sequencing and assembly.
Koren, Sergey; Phillippy, Adam M
2015-02-01
Like a jigsaw puzzle with large pieces, a genome sequenced with long reads is easier to assemble. However, recent sequencing technologies have favored lowering per-base cost at the expense of read length. This has dramatically reduced sequencing cost, but resulted in fragmented assemblies, which negatively affect downstream analyses and hinder the creation of finished (gapless, high-quality) genomes. In contrast, emerging long-read sequencing technologies can now produce reads tens of kilobases in length, enabling the automated finishing of microbial genomes for under $1000. This promises to improve the quality of reference databases and facilitate new studies of chromosomal structure and variation. We present an overview of these new technologies and the methods used to assemble long reads into complete genomes. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.
Rodriguez-R, Luis M; Gunturu, Santosh; Harvey, William T; Rosselló-Mora, Ramon; Tiedje, James M; Cole, James R; Konstantinidis, Konstantinos T
2018-06-14
The small subunit ribosomal RNA gene (16S rRNA) has been successfully used to catalogue and study the diversity of prokaryotic species and communities but it offers limited resolution at the species and finer levels, and cannot represent the whole-genome diversity and fluidity. To overcome these limitations, we introduced the Microbial Genomes Atlas (MiGA), a webserver that allows the classification of an unknown query genomic sequence, complete or partial, against all taxonomically classified taxa with available genome sequences, as well as comparisons to other related genomes including uncultivated ones, based on the genome-aggregate Average Nucleotide and Amino Acid Identity (ANI/AAI) concepts. MiGA integrates best practices in sequence quality trimming and assembly and allows input to be raw reads or assemblies from isolate genomes, single-cell sequences, and metagenome-assembled genomes (MAGs). Further, MiGA can take as input hundreds of closely related genomes of the same or closely related species (a so-called 'Clade Project') to assess their gene content diversity and evolutionary relationships, and calculate important clade properties such as the pangenome and core gene sets. Therefore, MiGA is expected to facilitate a range of genome-based taxonomic and diversity studies, and quality assessment across environmental and clinical settings. MiGA is available at http://microbial-genomes.org/.
The integrated microbial genome resource of analysis.
Checcucci, Alice; Mengoni, Alessio
2015-01-01
Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that allows to provide information and support for annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE)-Joint Genome Institute (JGI). IMG platform contains both draft and complete genomes, sequenced by Joint Genome Institute and other public and available genomes. Genomes of strains belonging to Archaea, Bacteria, and Eukarya domains are present as well as those of viruses and plasmids. Here, we provide some essential features of IMG system and case study for pangenome analysis.
Genomes of diverse isolates of the marine cyanobacterium Prochlorococcus
Biller, Steven J.; Berube, Paul M.; Berta-Thompson, Jessie W.; Kelly, Libusha; Roggensack, Sara E.; Awad, Lana; Roache-Johnson, Kathryn H.; Ding, Huiming; Giovannoni, Stephen J.; Rocap, Gabrielle; Moore, Lisa R.; Chisholm, Sallie W.
2014-01-01
The marine cyanobacterium Prochlorococcus is the numerically dominant photosynthetic organism in the oligotrophic oceans, and a model system in marine microbial ecology. Here we report 27 new whole genome sequences (2 complete and closed; 25 of draft quality) of cultured isolates, representing five major phylogenetic clades of Prochlorococcus. The sequenced strains were isolated from diverse regions of the oceans, facilitating studies of the drivers of microbial diversity—both in the lab and in the field. To improve the utility of these genomes for comparative genomics, we also define pre-computed clusters of orthologous groups of proteins (COGs), indicating how genes are distributed among these and other publicly available Prochlorococcus genomes. These data represent a significant expansion of Prochlorococcus reference genomes that are useful for numerous applications in microbial ecology, evolution and oceanography. PMID:25977791
Although recent technological advances in DNA sequencing and computational biology now allow scientists to compare entire microbial genomes, the use of these approaches to discern key genomic differences between natural microbial communities remains prohibitively expensive for mo...
Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.
2013-01-01
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
Viral dark matter and virus–host interactions resolved from publicly available microbial genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roux, Simon; Hallam, Steven J.; Woyke, Tanja
The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus–host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7–38% of ‘unknown’ sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 newmore » viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus–host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.« less
Viral dark matter and virus-host interactions resolved from publicly available microbial genomes.
Roux, Simon; Hallam, Steven J; Woyke, Tanja; Sullivan, Matthew B
2015-07-22
The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus-host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7-38% of 'unknown' sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 new viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus-host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.
Viral dark matter and virus–host interactions resolved from publicly available microbial genomes
Roux, Simon; Hallam, Steven J.; Woyke, Tanja; ...
2015-07-22
The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus–host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7–38% of ‘unknown’ sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 newmore » viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus–host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes.« less
From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems.
Garza, Daniel R; Dutilh, Bas E
2015-11-01
Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies.
Utturkar, Sagar M; Klingeman, Dawn M; Hurt, Richard A; Brown, Steven D
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.
Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas
2013-06-01
We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
Food Safety in the Age of Next Generation Sequencing, Bioinformatics, and Open Data Access.
Taboada, Eduardo N; Graham, Morag R; Carriço, João A; Van Domselaar, Gary
2017-01-01
Public health labs and food regulatory agencies globally are embracing whole genome sequencing (WGS) as a revolutionary new method that is positioned to replace numerous existing diagnostic and microbial typing technologies with a single new target: the microbial draft genome. The ability to cheaply generate large amounts of microbial genome sequence data, combined with emerging policies of food regulatory and public health institutions making their microbial sequences increasingly available and public, has served to open up the field to the general scientific community. This open data access policy shift has resulted in a proliferation of data being deposited into sequence repositories and of novel bioinformatics software designed to analyze these vast datasets. There also has been a more recent drive for improved data sharing to achieve more effective global surveillance, public health and food safety. Such developments have heightened the need for enhanced analytical systems in order to process and interpret this new type of data in a timely fashion. In this review we outline the emergence of genomics, bioinformatics and open data in the context of food safety. We also survey major efforts to translate genomics and bioinformatics technologies out of the research lab and into routine use in modern food safety labs. We conclude by discussing the challenges and opportunities that remain, including those expected to play a major role in the future of food safety science.
Genomic investigations of evolutionary dynamics and epistasis in microbial evolution experiments.
Jerison, Elizabeth R; Desai, Michael M
2015-12-01
Microbial evolution experiments enable us to watch adaptation in real time, and to quantify the repeatability and predictability of evolution by comparing identical replicate populations. Further, we can resurrect ancestral types to examine changes over evolutionary time. Until recently, experimental evolution has been limited to measuring phenotypic changes, or to tracking a few genetic markers over time. However, recent advances in sequencing technology now make it possible to extensively sequence clones or whole-population samples from microbial evolution experiments. Here, we review recent work exploiting these techniques to understand the genomic basis of evolutionary change in experimental systems. We first focus on studies that analyze the dynamics of genome evolution in microbial systems. We then survey work that uses observations of sequence evolution to infer aspects of the underlying fitness landscape, concentrating on the epistatic interactions between mutations and the constraints these interactions impose on adaptation. Copyright © 2015 Elsevier Ltd. All rights reserved.
Fumoto, Masaki; Miyazaki, Satoru; Sugawara, Hideaki
2002-01-01
Genome Information Broker (GIB) is a powerful tool for the study of comparative genomics. GIB allows users to retrieve and display partial and/or whole genome sequences together with the relevant biological annotation. GIB has accumulated all the completed microbial genome and has recently been expanded to include Arabidopsis thaliana genome data from DDBJ/EMBL/GenBank. In the near future, hundreds of genome sequences will be determined. In order to handle such huge data, we have enhanced the GIB architecture by using XML, CORBA and distributed RDBs. We introduce the new GIB here. GIB is freely accessible at http://gib.genes.nig.ac.jp/. PMID:11752256
Desiderato, Joana G; Alvarenga, Danillo O; Constancio, Milena T L; Alves, Lucia M C; Varani, Alessandro M
2018-05-14
Cellulose and its associated polymers are structural components of the plant cell wall, constituting one of the major sources of carbon and energy in nature. The carbon cycle is dependent on cellulose- and lignin-decomposing microbial communities and their enzymatic systems acting as consortia. These microbial consortia are under constant exploration for their potential biotechnological use. Herein, we describe the characterization of the genome of Dyella jiangningensis FCAV SCS01, recovered from the metagenome of a lignocellulose-degrading microbial consortium, which was isolated from a sugarcane crop soil under mechanical harvesting and covered by decomposing straw. The 4.7 Mbp genome encodes 4,194 proteins, including 36 glycoside hydrolases (GH), supporting the hypothesis that this bacterium may contribute to lignocellulose decomposition. Comparative analysis among fully sequenced Dyella species indicate that the genome synteny is not conserved, and that D. jiangningensis FCAV SCS01 carries 372 unique genes, including an alpha-glucosidase and maltodextrin glucosidase coding genes, and other potential biomass degradation related genes. Additional genomic features, such as prophage-like, genomic islands and putative new biosynthetic clusters were also uncovered. Overall, D. jiangningensis FCAV SCS01 represents the first South American Dyella genome sequenced and shows an exclusive feature among its genus, related to biomass degradation.
Leichty, Aaron R; Brisson, Dustin
2014-10-01
Population genomic analyses have demonstrated power to address major questions in evolutionary and molecular microbiology. Collecting populations of genomes is hindered in many microbial species by the absence of a cost effective and practical method to collect ample quantities of sufficiently pure genomic DNA for next-generation sequencing. Here we present a simple method to amplify genomes of a target microbial species present in a complex, natural sample. The selective whole genome amplification (SWGA) technique amplifies target genomes using nucleotide sequence motifs that are common in the target microbe genome, but rare in the background genomes, to prime the highly processive phi29 polymerase. SWGA thus selectively amplifies the target genome from samples in which it originally represented a minor fraction of the total DNA. The post-SWGA samples are enriched in target genomic DNA, which are ideal for population resequencing. We demonstrate the efficacy of SWGA using both laboratory-prepared mixtures of cultured microbes as well as a natural host-microbe association. Targeted amplification of Borrelia burgdorferi mixed with Escherichia coli at genome ratios of 1:2000 resulted in >10(5)-fold amplification of the target genomes with <6.7-fold amplification of the background. SWGA-treated genomic extracts from Wolbachia pipientis-infected Drosophila melanogaster resulted in up to 70% of high-throughput resequencing reads mapping to the W. pipientis genome. By contrast, 2-9% of sequencing reads were derived from W. pipientis without prior amplification. The SWGA technique results in high sequencing coverage at a fraction of the sequencing effort, thus allowing population genomic studies at affordable costs. Copyright © 2014 by the Genetics Society of America.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buhay, Christian
Christian Buhay from Baylor College of Medicine's Human Genome Sequencing Center discusses microbial genome finishing strategies on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
Recovering complete and draft population genomes from metagenome datasets
Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.
2016-03-08
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem ofmore » chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.« less
Recovering complete and draft population genomes from metagenome datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improves the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem ofmore » chimeric genome bins i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.« less
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity
Hurst, Gregory D.D.
2017-01-01
High throughput (or ‘next generation’) sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and ‘contaminating’ material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these ‘contaminations’ provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee (Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo. We conclude that ‘contamination’ in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses. PMID:28717593
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity.
Gerth, Michael; Hurst, Gregory D D
2017-01-01
High throughput (or 'next generation') sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and 'contaminating' material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these 'contaminations' provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee ( Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo . We conclude that 'contamination' in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses.
Whole-genome random sequencing and assembly of Haemophilus influenzae Rd
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fleischmann, R.D.; Adams, M.D.; White, O.
1995-07-28
An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.
Angly, Florent E; Willner, Dana; Prieto-Davó, Alejandra; Edwards, Robert A; Schmieder, Robert; Vega-Thurber, Rebecca; Antonopoulos, Dionysios A; Barott, Katie; Cottrell, Matthew T; Desnues, Christelle; Dinsdale, Elizabeth A; Furlan, Mike; Haynes, Matthew; Henn, Matthew R; Hu, Yongfei; Kirchman, David L; McDole, Tracey; McPherson, John D; Meyer, Folker; Miller, R Michael; Mundt, Egbert; Naviaux, Robert K; Rodriguez-Mueller, Beltran; Stevens, Rick; Wegley, Linda; Zhang, Lixin; Zhu, Baoli; Rohwer, Forest
2009-12-01
Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites) suggested that average genome length is a fundamental property of environments driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other, but indicative of community responses to stressors and environmental conditions.
Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis G; De Francisci, Davide; Valle, Giorgio; Angelidaki, Irini
2016-01-01
Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which different members have distinct roles in the establishment of a collective organization. Deciphering the complex microbial community engaged in this process is interesting both for unraveling the network of bacterial interactions and for applicability potential to the derived knowledge. In this study, we dissect the bioma involved in anaerobic digestion by means of high throughput Illumina sequencing (~51 gigabases of sequence data), disclosing nearly one million genes and extracting 106 microbial genomes by a novel strategy combining two binning processes. Microbial phylogeny and putative taxonomy performed using >400 proteins revealed that the biogas community is a trove of new species. A new approach based on functional properties as per network representation was developed to assign roles to the microbial species. The organization of the anaerobic digestion microbiome is resembled by a funnel concept, in which the microbial consortium presents a progressive functional specialization while reaching the final step of the process (i.e., methanogenesis). Key microbial genomes encoding enzymes involved in specific metabolic pathways, such as carbohydrates utilization, fatty acids degradation, amino acids fermentation, and syntrophic acetate oxidation, were identified. Additionally, the analysis identified a new uncultured archaeon that was putatively related to Methanomassiliicoccales but surprisingly having a methylotrophic methanogenic pathway. This study is a pioneer research on the phylogenetic and functional characterization of the microbial community populating biogas reactors. By applying for the first time high-throughput sequencing and a novel binning strategy, the identified genes were anchored to single genomes providing a clear understanding of their metabolic pathways and highlighting their involvement in anaerobic digestion. The overall research established a reference catalog of biogas microbial genomes that will greatly simplify future genomic studies.
Sanz, Yolanda
2017-01-01
Abstract The miniaturized and portable DNA sequencer MinION™ has demonstrated great potential in different analyses such as genome-wide sequencing, pathogen outbreak detection and surveillance, human genome variability, and microbial diversity. In this study, we tested the ability of the MinION™ platform to perform long amplicon sequencing in order to design new approaches to study microbial diversity using a multi-locus approach. After compiling a robust database by parsing and extracting the rrn bacterial region from more than 67000 complete or draft bacterial genomes, we demonstrated that the data obtained during sequencing of the long amplicon in the MinION™ device using R9 and R9.4 chemistries were sufficient to study 2 mock microbial communities in a multiplex manner and to almost completely reconstruct the microbial diversity contained in the HM782D and D6305 mock communities. Although nanopore-based sequencing produces reads with lower per-base accuracy compared with other platforms, we presented a novel approach consisting of multi-locus and long amplicon sequencing using the MinION™ MkIb DNA sequencer and R9 and R9.4 chemistries that help to overcome the main disadvantage of this portable sequencing platform. Furthermore, the nanopore sequencing library, constructed with the last releases of pore chemistry (R9.4) and sequencing kit (SQK-LSK108), permitted the retrieval of the higher level of 1D read accuracy sufficient to characterize the microbial species present in each mock community analysed. Improvements in nanopore chemistry, such as minimizing base-calling errors and new library protocols able to produce rapid 1D libraries, will provide more reliable information in the near future. Such data will be useful for more comprehensive and faster specific detection of microbial species and strains in complex ecosystems. PMID:28605506
Microbial taxonomy in the post-genomic era: Rebuilding from scratch?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thompson, Cristiane C.; Amaral, Gilda R.; Campeão, Mariana
2014-12-23
Microbial taxonomy should provide adequate descriptions of bacterial, archaeal, and eukaryotic microbial diversity in ecological, clinical, and industrial environments. We re-evaluated the prokaryote species twice. It is time to revisit polyphasic taxonomy, its principles, and its practice, including its underlying pragmatic species concept. We will be able to realize an old dream of our predecessor taxonomists and build a genomic-based microbial taxonomy, using standardized and automated curation of high-quality complete genome sequences as the new gold standard.
Viral dark matter and virus–host interactions resolved from publicly available microbial genomes
Roux, Simon; Hallam, Steven J; Woyke, Tanja; Sullivan, Matthew B
2015-01-01
The ecological importance of viruses is now widely recognized, yet our limited knowledge of viral sequence space and virus–host interactions precludes accurate prediction of their roles and impacts. In this study, we mined publicly available bacterial and archaeal genomic data sets to identify 12,498 high-confidence viral genomes linked to their microbial hosts. These data augment public data sets 10-fold, provide first viral sequences for 13 new bacterial phyla including ecologically abundant phyla, and help taxonomically identify 7–38% of ‘unknown’ sequence space in viromes. Genome- and network-based classification was largely consistent with accepted viral taxonomy and suggested that (i) 264 new viral genera were identified (doubling known genera) and (ii) cross-taxon genomic recombination is limited. Further analyses provided empirical data on extrachromosomal prophages and coinfection prevalences, as well as evaluation of in silico virus–host linkage predictions. Together these findings illustrate the value of mining viral signal from microbial genomes. DOI: http://dx.doi.org/10.7554/eLife.08490.001 PMID:26200428
Microbial genome sequencing using optical mapping and Illumina sequencing
USDA-ARS?s Scientific Manuscript database
Introduction Optical mapping is a technique in which strands of genomic DNA are digested with one or more restriction enzymes, and a physical map of the genome constructed from the resulting image. In outline, genomic DNA is extracted from a pure culture, linearly arrayed on a specialized glass sli...
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted.more » PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.« less
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Jr., Richard A.; ...
2017-07-18
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted.more » PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Furthermore, our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences.« less
A Case Study into Microbial Genome Assembly Gap Sequences and Finishing Strategies
Utturkar, Sagar M.; Klingeman, Dawn M.; Hurt, Richard A.; Brown, Steven D.
2017-01-01
This study characterized regions of DNA which remained unassembled by either PacBio and Illumina sequencing technologies for seven bacterial genomes. Two genomes were manually finished using bioinformatics and PCR/Sanger sequencing approaches and regions not assembled by automated software were analyzed. Gaps present within Illumina assemblies mostly correspond to repetitive DNA regions such as multiple rRNA operon sequences. PacBio gap sequences were evaluated for several properties such as GC content, read coverage, gap length, ability to form strong secondary structures, and corresponding annotations. Our hypothesis that strong secondary DNA structures blocked DNA polymerases and contributed to gap sequences was not accepted. PacBio assemblies had few limitations overall and gaps were explained as cumulative effect of lower than average sequence coverage and repetitive sequences at contig termini. An important aspect of the present study is the compilation of biological features that interfered with assembly and included active transposons, multiple plasmid sequences, phage DNA integration, and large sequence duplication. Our targeted genome finishing approach and systematic evaluation of the unassembled DNA will be useful for others looking to close, finish, and polish microbial genome sequences. PMID:28769883
DOE Office of Scientific and Technical Information (OSTI.GOV)
With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs or raw, unassembled reads. The method is fast to compute, finding SNPs and building a SNP phylogeny in minutes to hours, depending on the size and diversity of the input sequences. The SNP-based trees that result are consistent with known taxonomy and treesmore » determined in other studies. The approach we describe can handle many gigabases of sequence in a single run. The algorithm is based on k-mer analysis.« less
Almeida, Mathieu; Hébert, Agnès; Abraham, Anne-Laure; Rasmussen, Simon; Monnet, Christophe; Pons, Nicolas; Delbès, Céline; Loux, Valentin; Batto, Jean-Michel; Leonard, Pierre; Kennedy, Sean; Ehrlich, Stanislas Dusko; Pop, Mihai; Montel, Marie-Christine; Irlinger, Françoise; Renault, Pierre
2014-12-13
Microbial communities of traditional cheeses are complex and insufficiently characterized. The origin, safety and functional role in cheese making of these microbial communities are still not well understood. Metagenomic analysis of these communities by high throughput shotgun sequencing is a promising approach to characterize their genomic and functional profiles. Such analyses, however, critically depend on the availability of appropriate reference genome databases against which the sequencing reads can be aligned. We built a reference genome catalog suitable for short read metagenomic analysis using a low-cost sequencing strategy. We selected 142 bacteria isolated from dairy products belonging to 137 different species and 67 genera, and succeeded to reconstruct the draft genome of 117 of them at a standard or high quality level, including isolates from the genera Kluyvera, Luteococcus and Marinilactibacillus, still missing from public database. To demonstrate the potential of this catalog, we analysed the microbial composition of the surface of two smear cheeses and one blue-veined cheese, and showed that a significant part of the microbiota of these traditional cheeses was composed of microorganisms newly sequenced in our study. Our study provides data, which combined with publicly available genome references, represents the most expansive catalog to date of cheese-associated bacteria. Using this extended dairy catalog, we revealed the presence in traditional cheese of dominant microorganisms not deliberately inoculated, mainly Gram-negative genera such as Pseudoalteromonas haloplanktis or Psychrobacter immobilis, that may contribute to the characteristics of cheese produced through traditional methods.
GWFASTA: server for FASTA search in eukaryotic and microbial genomes.
Issac, Biju; Raghava, G P S
2002-09-01
Similarity searches are a powerful method for solving important biological problems such as database scanning, evolutionary studies, gene prediction, and protein structure prediction. FASTA is a widely used sequence comparison tool for rapid database scanning. Here we describe the GWFASTA server that was developed to assist the FASTA user in similarity searches against partially and/or completely sequenced genomes. GWFASTA consists of more than 60 microbial genomes, eight eukaryote genomes, and proteomes of annotatedgenomes. Infact, it provides the maximum number of databases for similarity searching from a single platform. GWFASTA allows the submission of more than one sequence as a single query for a FASTA search. It also provides integrated post-processing of FASTA output, including compositional analysis of proteins, multiple sequences alignment, and phylogenetic analysis. Furthermore, it summarizes the search results organism-wise for prokaryotes and chromosome-wise for eukaryotes. Thus, the integration of different tools for sequence analyses makes GWFASTA a powerful toolfor biologists.
Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes.
King, Paula; Pham, Long K; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca; Forsyth, R Allyn
2016-01-01
We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens and determine whether they are atypical to the hospital airborne metagenome. Air samples were collected from eight locations which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the most abundant 50 microbial orders generated three major nodes which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic for an otherwise consistent hospital air microbial metagenomic profile.
Marcy, Yann; Ouverney, Cleber; Bik, Elisabeth M.; Lösekann, Tina; Ivanova, Natalia; Martin, Hector Garcia; Szeto, Ernest; Platt, Darren; Hugenholtz, Philip; Relman, David A.; Quake, Stephen R.
2007-01-01
We have developed a microfluidic device that allows the isolation and genome amplification of individual microbial cells, thereby enabling organism-level genomic analysis of complex microbial ecosystems without the need for culture. This device was used to perform a directed survey of the human subgingival crevice and to isolate bacteria having rod-like morphology. Several isolated microbes had a 16S rRNA sequence that placed them in candidate phylum TM7, which has no cultivated or sequenced members. Genome amplification from individual TM7 cells allowed us to sequence and assemble >1,000 genes, providing insight into the physiology of members of this phylum. This approach enables single-cell genetic analysis of any uncultivated minority member of a microbial community. PMID:17620602
Microbial genomic island discovery, visualization and analysis.
Bertelli, Claire; Tilley, Keith E; Brinkman, Fiona S L
2018-06-03
Horizontal gene transfer (also called lateral gene transfer) is a major mechanism for microbial genome evolution, enabling rapid adaptation and survival in specific niches. Genomic islands (GIs), commonly defined as clusters of bacterial or archaeal genes of probable horizontal origin, are of particular medical, environmental and/or industrial interest, as they disproportionately encode virulence factors and some antimicrobial resistance genes and may harbor entire metabolic pathways that confer a specific adaptation (solvent resistance, symbiosis properties, etc). As large-scale analyses of microbial genomes increases, such as for genomic epidemiology investigations of infectious disease outbreaks in public health, there is increased appreciation of the need to accurately predict and track GIs. Over the past decade, numerous computational tools have been developed to tackle the challenges inherent in accurate GI prediction. We review here the main types of GI prediction methods and discuss their advantages and limitations for a routine analysis of microbial genomes in this era of rapid whole-genome sequencing. An assessment is provided of 20 GI prediction software methods that use sequence-composition bias to identify the GIs, using a reference GI data set from 104 genomes obtained using an independent comparative genomics approach. Finally, we present guidelines to assist researchers in effectively identifying these key genomic regions.
Draft Genome Sequence of a Bacillus Bacterium from the Atacama Desert Wetlands Metagenome
Vilo, Claudia; Galetovic, Alexandra; Araya, Jorge E.; Dong, Qunfeng
2015-01-01
We report here the draft genome sequence of a Bacillus bacterium isolated from the microflora of Nostoc colonies grown at the Andean wetlands in northern Chile. We consider this genome sequence to be a molecular tool for exploring microbial relationships and adaptation strategies to the prevailing extreme conditions at the Atacama Desert. PMID:26294639
New Technology Drafts: Production and Improvements
Lapidus, Alla
2018-01-22
Alla Lapidus, head of the DOE Joint Genome Institute's Finishing group, gives a talk on how the DOE JGI's microbial genome sequencing pipeline has been adapted to accommodate next generation sequencing platforms at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
Although recent technological advances in DNA sequencing and computational biology now allow scientists to compare entire microbial genomes, comparisons of closely related bacterial species and individual isolates by whole-genome sequencing approaches remains prohibitively expens...
Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes.
Papudeshi, Bhavya; Haggerty, J Matthew; Doane, Michael; Morris, Megan M; Walsh, Kevin; Beattie, Douglas T; Pande, Dnyanada; Zaeri, Parisa; Silva, Genivaldo G Z; Thompson, Fabiano; Edwards, Robert A; Dinsdale, Elizabeth A
2017-11-28
Microbiome/host interactions describe characteristics that affect the host's health. Shotgun metagenomics includes sequencing a random subset of the microbiome to analyze its taxonomic and metabolic potential. Reconstruction of DNA fragments into genomes from metagenomes (called metagenome-assembled genomes) assigns unknown fragments to taxa/function and facilitates discovery of novel organisms. Genome reconstruction incorporates sequence assembly and sorting of assembled sequences into bins, characteristic of a genome. However, the microbial community composition, including taxonomic and phylogenetic diversity may influence genome reconstruction. We determine the optimal reconstruction method for four microbiome projects that had variable sequencing platforms (IonTorrent and Illumina), diversity (high or low), and environment (coral reefs and kelp forests), using a set of parameters to select for optimal assembly and binning tools. We tested the effects of the assembly and binning processes on population genome reconstruction using 105 marine metagenomes from 4 projects. Reconstructed genomes were obtained from each project using 3 assemblers (IDBA, MetaVelvet, and SPAdes) and 2 binning tools (GroopM and MetaBat). We assessed the efficiency of assemblers using statistics that including contig continuity and contig chimerism and the effectiveness of binning tools using genome completeness and taxonomic identification. We concluded that SPAdes, assembled more contigs (143,718 ± 124 contigs) of longer length (N50 = 1632 ± 108 bp), and incorporated the most sequences (sequences-assembled = 19.65%). The microbial richness and evenness were maintained across the assembly, suggesting low contig chimeras. SPAdes assembly was responsive to the biological and technological variations within the project, compared with other assemblers. Among binning tools, we conclude that MetaBat produced bins with less variation in GC content (average standard deviation: 1.49), low species richness (4.91 ± 0.66), and higher genome completeness (40.92 ± 1.75) across all projects. MetaBat extracted 115 bins from the 4 projects of which 66 bins were identified as reconstructed metagenome-assembled genomes with sequences belonging to a specific genus. We identified 13 novel genomes, some of which were 100% complete, but show low similarity to genomes within databases. In conclusion, we present a set of biologically relevant parameters for evaluation to select for optimal assembly and binning tools. For the tools we tested, SPAdes assembler and MetaBat binning tools reconstructed quality metagenome-assembled genomes for the four projects. We also conclude that metagenomes from microbial communities that have high coverage of phylogenetically distinct, and low taxonomic diversity results in highest quality metagenome-assembled genomes.
Deciphering the distance to antibiotic resistance for the pneumococcus using genome sequencing data
Mobegi, Fredrick M.; Cremers, Amelieke J. H.; de Jonge, Marien I.; Bentley, Stephen D.; van Hijum, Sacha A. F. T.; Zomer, Aldert
2017-01-01
Advances in genome sequencing technologies and genome-wide association studies (GWAS) have provided unprecedented insights into the molecular basis of microbial phenotypes and enabled the identification of the underlying genetic variants in real populations. However, utilization of genome sequencing in clinical phenotyping of bacteria is challenging due to the lack of reliable and accurate approaches. Here, we report a method for predicting microbial resistance patterns using genome sequencing data. We analyzed whole genome sequences of 1,680 Streptococcus pneumoniae isolates from four independent populations using GWAS and identified probable hotspots of genetic variation which correlate with phenotypes of resistance to essential classes of antibiotics. With the premise that accumulation of putative resistance-conferring SNPs, potentially in combination with specific resistance genes, precedes full resistance, we retrogressively surveyed the hotspot loci and quantified the number of SNPs and/or genes, which if accumulated would confer full resistance to an otherwise susceptible strain. We name this approach the ‘distance to resistance’. It can be used to identify the creep towards complete antibiotics resistance in bacteria using genome sequencing. This approach serves as a basis for the development of future sequencing-based methods for predicting resistance profiles of bacterial strains in hospital microbiology and public health settings. PMID:28205635
MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation.
Lugli, Gabriele Andrea; Milani, Christian; Mancabelli, Leonardo; van Sinderen, Douwe; Ventura, Marco
2016-04-01
Genome annotation is one of the key actions that must be undertaken in order to decipher the genetic blueprint of organisms. Thus, a correct and reliable annotation is essential in rendering genomic data valuable. Here, we describe a bioinformatics pipeline based on freely available software programs coordinated by a multithreaded script named MEGAnnotator (Multithreaded Enhanced prokaryotic Genome Annotator). This pipeline allows the generation of multiple annotated formats fulfilling the NCBI guidelines for assembled microbial genome submission, based on DNA shotgun sequencing reads, and minimizes manual intervention, while also reducing waiting times between software program executions and improving final quality of both assembly and annotation outputs. MEGAnnotator provides an efficient way to pre-arrange the assembly and annotation work required to process NGS genome sequence data. The script improves the final quality of microbial genome annotation by reducing ambiguous annotations. Moreover, the MEGAnnotator platform allows the user to perform a partial annotation of pre-assembled genomes and includes an option to accomplish metagenomic data set assemblies. MEGAnnotator platform will be useful for microbiologists interested in genome analyses of bacteria as well as those investigating the complexity of microbial communities that do not possess the necessary skills to prepare their own bioinformatics pipeline. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Technological advances in DNA sequencing and computational biology allow scientists to compare entire microbial genomes. However, the use of these approaches to discern key genomic differences between natural microbial communities remains prohibitively expensive for most laborato...
IDENTIFICATION OF BACTERIAL DNA MARKERS FOR THE DETECTION OF HUMAN AND CATTLE FECAL POLLUTION
Technological advances in DNA sequencing and computational biology allow scientists to compare entire microbial genomes. However, the use of these approaches to discern key genomic differences between natural microbial communities remains prohibitively expensive for most laborato...
Approaches for in silico finishing of microbial genome sequences
Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva
2017-01-01
Abstract The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as “drafts”, incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing. PMID:28898352
Approaches for in silico finishing of microbial genome sequences.
Kremer, Frederico Schmitt; McBride, Alan John Alexander; Pinto, Luciano da Silva
The introduction of next-generation sequencing (NGS) had a significant effect on the availability of genomic information, leading to an increase in the number of sequenced genomes from a large spectrum of organisms. Unfortunately, due to the limitations implied by the short-read sequencing platforms, most of these newly sequenced genomes remained as "drafts", incomplete representations of the whole genetic content. The previous genome sequencing studies indicated that finishing a genome sequenced by NGS, even bacteria, may require additional sequencing to fill the gaps, making the entire process very expensive. As such, several in silico approaches have been developed to optimize the genome assemblies and facilitate the finishing process. The present review aims to explore some free (open source, in many cases) tools that are available to facilitate genome finishing.
Novel approaches in function-driven single-cell genomics.
Doud, Devin F R; Woyke, Tanja
2017-07-01
Deeper sequencing and improved bioinformatics in conjunction with single-cell and metagenomic approaches continue to illuminate undercharacterized environmental microbial communities. This has propelled the 'who is there, and what might they be doing' paradigm to the uncultivated and has already radically changed the topology of the tree of life and provided key insights into the microbial contribution to biogeochemistry. While characterization of 'who' based on marker genes can describe a large fraction of the community, answering 'what are they doing' remains the elusive pinnacle for microbiology. Function-driven single-cell genomics provides a solution by using a function-based screen to subsample complex microbial communities in a targeted manner for the isolation and genome sequencing of single cells. This enables single-cell sequencing to be focused on cells with specific phenotypic or metabolic characteristics of interest. Recovered genomes are conclusively implicated for both encoding and exhibiting the feature of interest, improving downstream annotation and revealing activity levels within that environment. This emerging approach has already improved our understanding of microbial community functioning and facilitated the experimental analysis of uncharacterized gene product space. Here we provide a comprehensive review of strategies that have been applied for function-driven single-cell genomics and the future directions we envision. © FEMS 2017.
Marine Protists Are Not Just Big Bacteria.
Keeling, Patrick J; Campo, Javier Del
2017-06-05
The study of marine microbial ecology has been completely transformed by molecular and genomic data: after centuries of relative neglect, genomics has revealed the surprising extent of microbial diversity and how microbial processes transform ocean and global ecosystems. But the revolution is not complete: major gaps in our understanding remain, and one obvious example is that microbial eukaryotes, or protists, are still largely neglected. Here we examine various ways in which protists might be better integrated into models of marine microbial ecology, what challenges this will present, and why understanding the limitations of our tools is a significant concern. In part this is a technical challenge - eukaryotic genomes are more difficult to characterize - but eukaryotic adaptations are also more dependent on morphology and behaviour than they are on the metabolic diversity that typifies bacteria, and these cannot be inferred from genomic data as readily as metabolism can be. We therefore cannot simply follow in the methodological footsteps of bacterial ecology and hope for similar success. Understanding microbial eukaryotes will require different approaches, including greater emphasis on taxonomically and trophically diverse model systems. Molecular sequencing will continue to play a role, and advances in environmental sequence tag studies and single-cell methods for genomic and transcriptomics offer particular promise. Copyright © 2017 Elsevier Ltd. All rights reserved.
Allen, Jonathan E.; Brown, Trevor S.; Gardner, Shea N.; McLoughlin, Kevin S.; Forsberg, Jonathan A.; Kirkup, Benjamin C.; Chromy, Brett A.; Luciw, Paul A.; Elster, Eric A.
2014-01-01
Combat wound healing and resolution are highly affected by the resident microbial flora. We therefore sought to achieve comprehensive detection of microbial populations in wounds using novel genomic technologies and bioinformatics analyses. We employed a microarray capable of detecting all sequenced pathogens for interrogation of 124 wound samples from extremity injuries in combat-injured U.S. service members. A subset of samples was also processed via next-generation sequencing and metagenomic analysis. Array analysis detected microbial targets in 51% of all wound samples, with Acinetobacter baumannii being the most frequently detected species. Multiple Pseudomonas species were also detected in tissue biopsy specimens. Detection of the Acinetobacter plasmid pRAY correlated significantly with wound failure, while detection of enteric-associated bacteria was associated significantly with successful healing. Whole-genome sequencing revealed broad microbial biodiversity between samples. The total wound bioburden did not associate significantly with wound outcome, although temporal shifts were observed over the course of treatment. Given that standard microbiological methods do not detect the full range of microbes in each wound, these data emphasize the importance of supplementation with molecular techniques for thorough characterization of wound-associated microbes. Future application of genomic protocols for assessing microbial content could allow application of specialized care through early and rapid identification and management of critical patterns in wound bioburden. PMID:24829242
Draft Genome of Janthinobacterium sp. RA13 Isolated from Lake Washington Sediment
McTaggart, Tami L.; Shapiro, Nicole; Woyke, Tanja
2015-01-01
Sequencing the genome of Janthinobacterium sp. RA13 from Lake Washington sediment is announced. From the genome content, a versatile life-style is predicted, but not bona fide methylotrophy. With the availability of its genomic sequence, Janthinobacterium sp. RA13 presents a prospective model for studying microbial communities in lake sediments. PMID:25676775
IMG ER: a system for microbial genome annotation expert review and curation.
Markowitz, Victor M; Mavromatis, Konstantinos; Ivanova, Natalia N; Chen, I-Min A; Chu, Ken; Kyrpides, Nikos C
2009-09-01
A rapidly increasing number of microbial genomes are sequenced by organizations worldwide and are eventually included into various public genome data resources. The quality of the annotations depends largely on the original dataset providers, with erroneous or incomplete annotations often carried over into the public resources and difficult to correct. We have developed an Expert Review (ER) version of the Integrated Microbial Genomes (IMG) system, with the goal of supporting systematic and efficient revision of microbial genome annotations. IMG ER provides tools for the review and curation of annotations of both new and publicly available microbial genomes within IMG's rich integrated genome framework. New genome datasets are included into IMG ER prior to their public release either with their native annotations or with annotations generated by IMG ER's annotation pipeline. IMG ER tools allow addressing annotation problems detected with IMG's comparative analysis tools, such as genes missed by gene prediction pipelines or genes without an associated function. Over the past year, IMG ER was used for improving the annotations of about 150 microbial genomes.
Contemporary molecular tools in microbial ecology and their application to advancing biotechnology.
Rashid, Mamoon; Stingl, Ulrich
2015-12-01
Novel methods in microbial ecology are revolutionizing our understanding of the structure and function of microbes in the environment, but concomitant advances in applications of these tools to biotechnology are mostly lagging behind. After more than a century of efforts to improve microbial culturing techniques, about 70-80% of microbial diversity - recently called the "microbial dark matter" - remains uncultured. In early attempts to identify and sample these so far uncultured taxonomic lineages, methods that amplify and sequence ribosomal RNA genes were extensively used. Recent developments in cell separation techniques, DNA amplification, and high-throughput DNA sequencing platforms have now made the discovery of genes/genomes of uncultured microorganisms from different environments possible through the use of metagenomic techniques and single-cell genomics. When used synergistically, these metagenomic and single-cell techniques create a powerful tool to study microbial diversity. These genomics techniques have already been successfully exploited to identify sources for i) novel enzymes or natural products for biotechnology applications, ii) novel genes from extremophiles, and iii) whole genomes or operons from uncultured microbes. More can be done to utilize these tools more efficiently in biotechnology. Copyright © 2015 Elsevier Inc. All rights reserved.
Ross, Daniel E.; Gulliver, Djuna
2016-10-06
The draft genome sequence ofPseudomonas stutzeristrain K35 was separated from a metagenome derived from a produced water microbial community of a coalbed methane well. The genome encodes a complete nitrogen fixation pathway and the upper and lower naphthalene degradation pathways.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ross, Daniel E.; Gulliver, Djuna
The draft genome sequence ofPseudomonas stutzeristrain K35 was separated from a metagenome derived from a produced water microbial community of a coalbed methane well. The genome encodes a complete nitrogen fixation pathway and the upper and lower naphthalene degradation pathways.
2013-01-01
Background Understanding the biological mechanisms used by microorganisms for plant biomass degradation is of considerable biotechnological interest. Despite of the growing number of sequenced (meta)genomes of plant biomass-degrading microbes, there is currently no technique for the systematic determination of the genomic components of this process from these data. Results We describe a computational method for the discovery of the protein domains and CAZy families involved in microbial plant biomass degradation. Our method furthermore accurately predicts the capability to degrade plant biomass for microbial species from their genome sequences. Application to a large, manually curated data set of microbial degraders and non-degraders identified gene families of enzymes known by physiological and biochemical tests to be implicated in cellulose degradation, such as GH5 and GH6. Additionally, genes of enzymes that degrade other plant polysaccharides, such as hemicellulose, pectins and oligosaccharides, were found, as well as gene families which have not previously been related to the process. For draft genomes reconstructed from a cow rumen metagenome our method predicted Bacteroidetes-affiliated species and a relative to a known plant biomass degrader to be plant biomass degraders. This was supported by the presence of genes encoding enzymatically active glycoside hydrolases in these genomes. Conclusions Our results show the potential of the method for generating novel insights into microbial plant biomass degradation from (meta-)genome data, where there is an increasing production of genome assemblages for uncultured microbes. PMID:23414703
Microbial Metagenomics: Beyond the Genome
NASA Astrophysics Data System (ADS)
Gilbert, Jack A.; Dupont, Christopher L.
2011-01-01
Metagenomics literally means “beyond the genome.” Marine microbial metagenomic databases presently comprise ˜400 billion base pairs of DNA, only ˜3% of that found in 1 ml of seawater. Very soon a trillion-base-pair sequence run will be feasible, so it is time to reflect on what we have learned from metagenomics. We review the impact of metagenomics on our understanding of marine microbial communities. We consider the studies facilitated by data generated through the Global Ocean Sampling expedition, as well as the revolution wrought at the individual laboratory level through next generation sequencing technologies. We review recent studies and discoveries since 2008, provide a discussion of bioinformatic analyses, including conceptual pipelines and sequence annotation and predict the future of metagenomics, with suggestions of collaborative community studies tailored toward answering some of the fundamental questions in marine microbial ecology.
The future is now: single-cell genomics of bacteria and archaea
Blainey, Paul C.
2013-01-01
Interest in the expanding catalog of uncultivated microorganisms, increasing recognition of heterogeneity among seemingly similar cells, and technological advances in whole-genome amplification and single-cell manipulation are driving considerable progress in single-cell genomics. Here, the spectrum of applications for single-cell genomics, key advances in the development of the field, and emerging methodology for single-cell genome sequencing are reviewed by example with attention to the diversity of approaches and their unique characteristics. Experimental strategies transcending specific methodologies are identified and organized as a road map for future studies in single-cell genomics of environmental microorganisms. Over the next decade, increasingly powerful tools for single-cell genome sequencing and analysis will play key roles in accessing the genomes of uncultivated organisms, determining the basis of microbial community functions, and fundamental aspects of microbial population biology. PMID:23298390
Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing.
Thoendel, Matthew; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Yao, Janet Z; Chia, Nicholas; Hanssen, Arlen D; Abdel, Matthew P; Patel, Robin
2016-08-01
Metagenomic whole genome sequencing for detection of pathogens in clinical samples is an exciting new area for discovery and clinical testing. A major barrier to this approach is the overwhelming ratio of human to pathogen DNA in samples with low pathogen abundance, which is typical of most clinical specimens. Microbial DNA enrichment methods offer the potential to relieve this limitation by improving this ratio. Two commercially available enrichment kits, the NEBNext Microbiome DNA Enrichment Kit and the Molzym MolYsis Basic kit, were tested for their ability to enrich for microbial DNA from resected arthroplasty component sonicate fluids from prosthetic joint infections or uninfected sonicate fluids spiked with Staphylococcus aureus. Using spiked uninfected sonicate fluid there was a 6-fold enrichment of bacterial DNA with the NEBNext kit and 76-fold enrichment with the MolYsis kit. Metagenomic whole genome sequencing of sonicate fluid revealed 13- to 85-fold enrichment of bacterial DNA using the NEBNext enrichment kit. The MolYsis approach achieved 481- to 9580-fold enrichment, resulting in 7 to 59% of sequencing reads being from the pathogens known to be present in the samples. These results demonstrate the usefulness of these tools when testing clinical samples with low microbial burden using next generation sequencing. Copyright © 2016 Elsevier B.V. All rights reserved.
Complete genome sequence of Defluviimonas alba cai42T, a microbial exopolysaccharides producer.
Zhao, Jie-Yu; Geng, Shuang; Xu, Lian; Hu, Bing; Sun, Ji-Quan; Nie, Yong; Tang, Yue-Qin; Wu, Xiao-Lei
2016-12-10
Defluviimonas alba cai42 T , isolated from the oil-production water in Xinjiang Oilfield in China, has a strong ability to produce exopolysaccharides (EPS). We hereby present its complete genome sequence information which consists of a circular chromosome and three plasmids. The strain characteristically contains various genes encoding for enzymes involved in EPS biosynthesis, modification, and export. According to the genomic and physiochemical data, it is predicted that the strain has the potential to be utilized in industrial production of microbial EPS. Copyright © 2016 Elsevier B.V. All rights reserved.
Using populations of human and microbial genomes for organism detection in metagenomes
Ames, Sasha K.; Gardner, Shea N.; Marti, Jose Manuel; ...
2015-04-29
Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-freemore » human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. In conclusion, left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected.« less
Using populations of human and microbial genomes for organism detection in metagenomes.
Ames, Sasha K; Gardner, Shea N; Marti, Jose Manuel; Slezak, Tom R; Gokhale, Maya B; Allen, Jonathan E
2015-07-01
Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-free human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. Left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected. © 2015 Ames et al.; Published by Cold Spring Harbor Laboratory Press.
Using populations of human and microbial genomes for organism detection in metagenomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ames, Sasha K.; Gardner, Shea N.; Marti, Jose Manuel
Identifying causative disease agents in human patients from shotgun metagenomic sequencing (SMS) presents a powerful tool to apply when other targeted diagnostics fail. Numerous technical challenges remain, however, before SMS can move beyond the role of research tool. Accurately separating the known and unknown organism content remains difficult, particularly when SMS is applied as a last resort. The true amount of human DNA that remains in a sample after screening against the human reference genome and filtering nonbiological components left from library preparation has previously been underreported. In this study, we create the most comprehensive collection of microbial and reference-freemore » human genetic variation available in a database optimized for efficient metagenomic search by extracting sequences from GenBank and the 1000 Genomes Project. The results reveal new human sequences found in individual Human Microbiome Project (HMP) samples. Individual samples contain up to 95% human sequence, and 4% of the individual HMP samples contain 10% or more human reads. In conclusion, left unidentified, human reads can complicate and slow down further analysis and lead to inaccurately labeled microbial taxa and ultimately lead to privacy concerns as more human genome data is collected.« less
Elkins, James G; Hamilton-Brehm, Scott D; Lucas, Susan; Han, James; Lapidus, Alla; Cheng, Jan-Fang; Goodwin, Lynne A; Pitluck, Sam; Peters, Lin; Mikhailova, Natalia; Davenport, Karen W; Detter, John C; Han, Cliff S; Tapia, Roxanne; Land, Miriam L; Hauser, Loren; Kyrpides, Nikos C; Ivanova, Natalia N; Pagani, Ioanna; Bruce, David; Woyke, Tanja; Cottingham, Robert W
2013-04-11
Thermodesulfobacterium geofontis OPF15(T) (ATCC BAA-2454, JCM 18567) was isolated from Obsidian Pool, Yellowstone National Park, and grows optimally at 83°C. The 1.6-Mb genome sequence was finished at the Joint Genome Institute and has been deposited for future genomic studies pertaining to microbial processes and nutrient cycles in high-temperature environments.
Lee, Kevin C; Stott, Matthew B; Dunfield, Peter F; Huttenhower, Curtis; McDonald, Ian R; Morgan, Xochitl C
2016-06-15
Chthonomonas calidirosea T49(T) is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. Copyright © 2016 Lee et al.
Lee, Kevin C.; Stott, Matthew B.; Dunfield, Peter F.; Huttenhower, Curtis; McDonald, Ian R.
2016-01-01
ABSTRACT Chthonomonas calidirosea T49T is a low-abundance, carbohydrate-scavenging, and thermophilic soil bacterium with a seemingly disorganized genome. We hypothesized that the C. calidirosea genome would be highly responsive to local selection pressure, resulting in the divergence of its genomic content, genome organization, and carbohydrate utilization phenotype across environments. We tested this hypothesis by sequencing the genomes of four C. calidirosea isolates obtained from four separate geothermal fields in the Taupō Volcanic Zone, New Zealand. For each isolation site, we measured physicochemical attributes and defined the associated microbial community by 16S rRNA gene sequencing. Despite their ecological and geographical isolation, the genome sequences showed low divergence (maximum, 1.17%). Isolate-specific variations included single-nucleotide polymorphisms (SNPs), restriction-modification systems, and mobile elements but few major deletions and no major rearrangements. The 50-fold variation in C. calidirosea relative abundance among the four sites correlated with site environmental characteristics but not with differences in genomic content. Conversely, the carbohydrate utilization profiles of the C. calidirosea isolates corresponded to the inferred isolate phylogenies, which only partially paralleled the geographical relationships among the sample sites. Genomic sequence conservation does not entirely parallel geographic distance, suggesting that stochastic dispersal and localized extinction, which allow for rapid population homogenization with little restriction by geographical barriers, are possible mechanisms of C. calidirosea distribution. This dispersal and extinction mechanism is likely not limited to C. calidirosea but may shape the populations and genomes of many other low-abundance free-living taxa. IMPORTANCE This study compares the genomic sequence variations and metabolisms of four strains of Chthonomonas calidirosea, a rare thermophilic bacterium from the phylum Armatimonadetes. It additionally compares the microbial communities and chemistry of each of the geographically distinct sites from which the four C. calidirosea strains were isolated. C. calidirosea was previously reported to possess a highly disorganized genome, but it was unclear whether this reflected rapid evolution. Here, we show that each isolation site has a distinct chemistry and microbial community, but despite this, the C. calidirosea genome is highly conserved across all isolation sites. Furthermore, genomic sequence differences only partially paralleled geographic distance, suggesting that C. calidirosea genotypes are not primarily determined by adaptive evolution. Instead, the presence of C. calidirosea may be driven by stochastic dispersal and localized extinction. This ecological mechanism may apply to many other low-abundance taxa. PMID:27060125
Signal Processing for Metagenomics: Extracting Information from the Soup
Rosen, Gail L.; Sokhansanj, Bahrad A.; Polikar, Robi; Bruns, Mary Ann; Russell, Jacob; Garbarine, Elaine; Essinger, Steve; Yok, Non
2009-01-01
Traditionally, studies in microbial genomics have focused on single-genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. Current tools and techniques are reviewed in this paper which address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology. PMID:20436876
Theory of microbial genome evolution
NASA Astrophysics Data System (ADS)
Koonin, Eugene
Bacteria and archaea have small genomes tightly packed with protein-coding genes. This compactness is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. By fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. Thus, the number of genes in prokaryotic genomes seems to reflect the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias. New genes acquired by microbial genomes, on average, appear to be adaptive. Evolution of bacterial and archaeal genomes involves extensive horizontal gene transfer and gene loss. Many microbes have open pangenomes, where each newly sequenced genome contains more than 10% `ORFans', genes without detectable homologues in other species. A simple, steady-state evolutionary model reveals two sharply distinct classes of microbial genes, one of which (ORFans) is characterized by effectively instantaneous gene replacement, whereas the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of at least a billion distinct genes in the prokaryotic genomic universe.
A fungal mock community control for amplicon sequencing experiments
USDA-ARS?s Scientific Manuscript database
The field of microbial ecology has been profoundly advanced by the ability to profile the composition of complex microbial communities by means of high throughput amplicon sequencing of marker genes amplified directly from environmental genomic DNA extracts. However, it has become increasingly clear...
Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system.
Speth, Daan R; In 't Zandt, Michiel H; Guerrero-Cruz, Simon; Dutilh, Bas E; Jetten, Mike S M
2016-03-31
Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date.
Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system
Speth, Daan R.; in 't Zandt, Michiel H.; Guerrero-Cruz, Simon; Dutilh, Bas E.; Jetten, Mike S. M.
2016-01-01
Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date. PMID:27029554
Homogeneous versus heterogeneous probes for microbial ecological microarrays.
Bae, Jin-Woo; Park, Yong-Ha
2006-07-01
Microbial ecological microarrays have been developed for investigating the composition and functions of microorganism communities in environmental niches. These arrays include microbial identification microarrays, which use oligonucleotides, gene fragments or microbial genomes as probes. In this article, the advantages and disadvantages of each type of probe are reviewed. Oligonucleotide probes are currently useful for probing uncultivated bacteria that are not amenable to gene fragment probing, whereas the functional gene fragments amplified randomly from microbial genomes require phylogenetic and hierarchical categorization before use as microbial identification probes, despite their high resolution for both specificity and sensitivity. Until more bacteria are sequenced and gene fragment probes are thoroughly validated, heterogeneous bacterial genome probes will provide a simple, sensitive and quantitative tool for exploring the ecosystem structure.
Kang, Hahk-Soo
2017-02-01
Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes in search of new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for their metabolites characterization. An increase in the use of this approach has been observed for the last couple of years along with the emergence of low cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach, and also to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.
Brown, Christopher T; Sharon, Itai; Thomas, Brian C; Castelle, Cindy J; Morowitz, Michael J; Banfield, Jillian F
2013-12-17
The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community.We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation.During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration. Genome comparisons within the family Actinomycetaceae reveal important differences related to respiratory metabolism and motility. Genome-based analysis provided direct insight into strain-specific potential for anaerobic respiration and yielded the first genome for the genus Varibaculum. Importantly, comparison of these de novo assembled genomes with closely related isolate genomes supported the accuracy of the metagenomic methodology. Over a one-week period, the early gut microbial community transitioned to a community with a higher representation of obligate anaerobes, emphasizing both taxonomic and metabolic instability during colonization.
2013-01-01
Background The premature infant gut has low individual but high inter-individual microbial diversity compared with adults. Based on prior 16S rRNA gene surveys, many species from this environment are expected to be similar to those previously detected in the human microbiota. However, the level of genomic novelty and metabolic variation of strains found in the infant gut remains relatively unexplored. Results To study the stability and function of early microbial colonizers of the premature infant gut, nine stool samples were taken during the third week of life of a premature male infant delivered via Caesarean section. Metagenomic sequences were assembled and binned into near-complete and partial genomes, enabling strain-level genomic analysis of the microbial community. We reconstructed eleven near-complete and six partial bacterial genomes representative of the key members of the microbial community. Twelve of these genomes share >90% putative ortholog amino acid identity with reference genomes. Manual curation of the assembly of one particularly novel genome resulted in the first essentially complete genome sequence (in three pieces, the order of which could not be determined due to a repeat) for Varibaculum cambriense (strain Dora), a medically relevant species that has been implicated in abscess formation. During the period studied, the microbial community undergoes a compositional shift, in which obligate anaerobes (fermenters) overtake Escherichia coli as the most abundant species. Other species remain stable, probably due to their ability to either respire anaerobically or grow by fermentation, and their capacity to tolerate fluctuating levels of oxygen. Metabolic predictions for V. cambriense suggest that, like other members of the microbial community, this organism is able to process various sugar substrates and make use of multiple different electron acceptors during anaerobic respiration. Genome comparisons within the family Actinomycetaceae reveal important differences related to respiratory metabolism and motility. Conclusions Genome-based analysis provided direct insight into strain-specific potential for anaerobic respiration and yielded the first genome for the genus Varibaculum. Importantly, comparison of these de novo assembled genomes with closely related isolate genomes supported the accuracy of the metagenomic methodology. Over a one-week period, the early gut microbial community transitioned to a community with a higher representation of obligate anaerobes, emphasizing both taxonomic and metabolic instability during colonization. PMID:24451181
Pyrosequencing for Microbial Identification and Characterization
Cummings, Patrick J.; Ahmed, Ray; Durocher, Jeffrey A.; Jessen, Adam; Vardi, Tamar; Obom, Kristina M.
2013-01-01
Pyrosequencing is a versatile technique that facilitates microbial genome sequencing that can be used to identify bacterial species, discriminate bacterial strains and detect genetic mutations that confer resistance to anti-microbial agents. The advantages of pyrosequencing for microbiology applications include rapid and reliable high-throughput screening and accurate identification of microbes and microbial genome mutations. Pyrosequencing involves sequencing of DNA by synthesizing the complementary strand a single base at a time, while determining the specific nucleotide being incorporated during the synthesis reaction. The reaction occurs on immobilized single stranded template DNA where the four deoxyribonucleotides (dNTP) are added sequentially and the unincorporated dNTPs are enzymatically degraded before addition of the next dNTP to the synthesis reaction. Detection of the specific base incorporated into the template is monitored by generation of chemiluminescent signals. The order of dNTPs that produce the chemiluminescent signals determines the DNA sequence of the template. The real-time sequencing capability of pyrosequencing technology enables rapid microbial identification in a single assay. In addition, the pyrosequencing instrument, can analyze the full genetic diversity of anti-microbial drug resistance, including typing of SNPs, point mutations, insertions, and deletions, as well as quantification of multiple gene copies that may occur in some anti-microbial resistance patterns. PMID:23995536
Pyrosequencing for microbial identification and characterization.
Cummings, Patrick J; Ahmed, Ray; Durocher, Jeffrey A; Jessen, Adam; Vardi, Tamar; Obom, Kristina M
2013-08-22
Pyrosequencing is a versatile technique that facilitates microbial genome sequencing that can be used to identify bacterial species, discriminate bacterial strains and detect genetic mutations that confer resistance to anti-microbial agents. The advantages of pyrosequencing for microbiology applications include rapid and reliable high-throughput screening and accurate identification of microbes and microbial genome mutations. Pyrosequencing involves sequencing of DNA by synthesizing the complementary strand a single base at a time, while determining the specific nucleotide being incorporated during the synthesis reaction. The reaction occurs on immobilized single stranded template DNA where the four deoxyribonucleotides (dNTP) are added sequentially and the unincorporated dNTPs are enzymatically degraded before addition of the next dNTP to the synthesis reaction. Detection of the specific base incorporated into the template is monitored by generation of chemiluminescent signals. The order of dNTPs that produce the chemiluminescent signals determines the DNA sequence of the template. The real-time sequencing capability of pyrosequencing technology enables rapid microbial identification in a single assay. In addition, the pyrosequencing instrument, can analyze the full genetic diversity of anti-microbial drug resistance, including typing of SNPs, point mutations, insertions, and deletions, as well as quantification of multiple gene copies that may occur in some anti-microbial resistance patterns.
Millstone: software for multiplex microbial genome analysis and engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering.
Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M
2017-05-25
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Millstone: software for multiplex microbial genome analysis and engineering
Goodman, Daniel B.; Kuznetsov, Gleb; Lajoie, Marc J.; ...
2017-05-25
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. Here, we describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Reads2Type: a web application for rapid microbial taxonomy identification.
Saputra, Dhany; Rasmussen, Simon; Larsen, Mette V; Haddad, Nizar; Sperotto, Maria Maddalena; Aarestrup, Frank M; Lund, Ole; Sicheritz-Pontén, Thomas
2015-11-25
Identification of bacteria may be based on sequencing and molecular analysis of a specific locus such as 16S rRNA, or a set of loci such as in multilocus sequence typing. In the near future, healthcare institutions and routine diagnostic microbiology laboratories may need to sequence the entire genome of microbial isolates. Therefore we have developed Reads2Type, a web-based tool for taxonomy identification based on whole bacterial genome sequence data. Raw sequencing data provided by the user are mapped against a set of marker probes that are derived from currently available bacteria complete genomes. Using a dataset of 1003 whole genome sequenced bacteria from various sequencing platforms, Reads2Type was able to identify the species with 99.5 % accuracy and on the minutes time scale. In comparison with other tools, Reads2Type offers the advantage of not needing to transfer sequencing files, as the entire computational analysis is done on the computer of whom utilizes the web application. This also prevents data privacy issues to arise. The Reads2Type tool is available at http://www.cbs.dtu.dk/~dhany/reads2type.html.
Hamilton-Brehm, Scott D.; Lucas, Susan; Han, James; Lapidus, Alla; Cheng, Jan-Fang; Goodwin, Lynne A.; Pitluck, Sam; Peters, Lin; Mikhailova, Natalia; Davenport, Karen W.; Detter, John C.; Han, Cliff S.; Tapia, Roxanne; Land, Miriam L.; Hauser, Loren; Kyrpides, Nikos C.; Ivanova, Natalia N.; Pagani, Ioanna; Bruce, David; Woyke, Tanja; Cottingham, Robert W.
2013-01-01
Thermodesulfobacterium geofontis OPF15T (ATCC BAA-2454, JCM 18567) was isolated from Obsidian Pool, Yellowstone National Park, and grows optimally at 83°C. The 1.6-Mb genome sequence was finished at the Joint Genome Institute and has been deposited for future genomic studies pertaining to microbial processes and nutrient cycles in high-temperature environments. PMID:23580711
Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.
Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R
2017-07-01
The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.
Olson, Nathan D; Zook, Justin M; Morrow, Jayne B; Lin, Nancy J
2017-01-01
High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus , Escherichia , and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.
Baldrian, Petr; López-Mondéjar, Rubén
2014-02-01
Molecular methods for the analysis of biomolecules have undergone rapid technological development in the last decade. The advent of next-generation sequencing methods and improvements in instrumental resolution enabled the analysis of complex transcriptome, proteome and metabolome data, as well as a detailed annotation of microbial genomes. The mechanisms of decomposition by model fungi have been described in unprecedented detail by the combination of genome sequencing, transcriptomics and proteomics. The increasing number of available genomes for fungi and bacteria shows that the genetic potential for decomposition of organic matter is widespread among taxonomically diverse microbial taxa, while expression studies document the importance of the regulation of expression in decomposition efficiency. Importantly, high-throughput methods of nucleic acid analysis used for the analysis of metagenomes and metatranscriptomes indicate the high diversity of decomposer communities in natural habitats and their taxonomic composition. Today, the metaproteomics of natural habitats is of interest. In combination with advanced analytical techniques to explore the products of decomposition and the accumulation of information on the genomes of environmentally relevant microorganisms, advanced methods in microbial ecophysiology should increase our understanding of the complex processes of organic matter transformation.
From Genomes to Phenotypes: Traitar, the Microbial Trait Analyzer.
Weimann, Aaron; Mooren, Kyra; Frank, Jeremy; Pope, Phillip B; Bremges, Andreas; McHardy, Alice C
2016-01-01
The number of sequenced genomes is growing exponentially, profoundly shifting the bottleneck from data generation to genome interpretation. Traits are often used to characterize and distinguish bacteria and are likely a driving factor in microbial community composition, yet little is known about the traits of most microbes. We describe Traitar, the microbial trait analyzer, which is a fully automated software package for deriving phenotypes from a genome sequence. Traitar provides phenotype classifiers to predict 67 traits related to the use of various substrates as carbon and energy sources, oxygen requirement, morphology, antibiotic susceptibility, proteolysis, and enzymatic activities. Furthermore, it suggests protein families associated with the presence of particular phenotypes. Our method uses L1-regularized L2-loss support vector machines for phenotype assignments based on phyletic patterns of protein families and their evolutionary histories across a diverse set of microbial species. We demonstrate reliable phenotype assignment for Traitar to bacterial genomes from 572 species of eight phyla, also based on incomplete single-cell genomes and simulated draft genomes. We also showcase its application in metagenomics by verifying and complementing a manual metabolic reconstruction of two novel Clostridiales species based on draft genomes recovered from commercial biogas reactors. Traitar is available at https://github.com/hzi-bifo/traitar. IMPORTANCE Bacteria are ubiquitous in our ecosystem and have a major impact on human health, e.g., by supporting digestion in the human gut. Bacterial communities can also aid in biotechnological processes such as wastewater treatment or decontamination of polluted soils. Diverse bacteria contribute with their unique capabilities to the functioning of such ecosystems, but lab experiments to investigate those capabilities are labor-intensive. Major advances in sequencing techniques open up the opportunity to study bacteria by their genome sequences. For this purpose, we have developed Traitar, software that predicts traits of bacteria on the basis of their genomes. It is applicable to studies with tens or hundreds of bacterial genomes. Traitar may help researchers in microbiology to pinpoint the traits of interest, reducing the amount of wet lab work required.
Ortholog Identification and Comparative Analysis of Microbial Genomes Using MBGD and RECOG.
Uchiyama, Ikuo
2017-01-01
Comparative genomics is becoming an essential approach for identification of genes associated with a specific function or phenotype. Here, we introduce the microbial genome database for comparative analysis (MBGD), which is a comprehensive ortholog database among the microbial genomes available so far. MBGD contains several precomputed ortholog tables including the standard ortholog table covering the entire taxonomic range and taxon-specific ortholog tables for various major taxa. In addition, MBGD allows the users to create an ortholog table within any specified set of genomes through dynamic calculations. In particular, MBGD has a "My MBGD" mode where users can upload their original genome sequences and incorporate them into orthology analysis. The created ortholog table can serve as the basis for various comparative analyses. Here, we describe the use of MBGD and briefly explain how to utilize the orthology information during comparative genome analysis in combination with the stand-alone comparative genomics software RECOG, focusing on the application to comparison of closely related microbial genomes.
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia
Maezato, Yukari; Wu, Yu-Wei; Romine, Margaret F.; Lindemann, Stephen R.
2015-01-01
To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled the de novo reconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 of the 20 detected member species. Two Halomonas spp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of the Halomonas populations, one of the Rhodobacteraceae populations, and the Rhizobiales population. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set. PMID:26497460
Identification and Resolution of Microdiversity through Metagenomic Sequencing of Parallel Consortia
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nelson, William C.; Maezato, Yukari; Wu, Yu-Wei
2015-10-23
To gain a predictive understanding of the interspecies interactions within microbial communities that govern community function, the genomic complement of every member population must be determined. Although metagenomic sequencing has enabled thede novoreconstruction of some microbial genomes from environmental communities, microdiversity confounds current genome reconstruction techniques. To overcome this issue, we performed short-read metagenomic sequencing on parallel consortia, defined as consortia cultivated under the same conditions from the same natural community with overlapping species composition. The differences in species abundance between the two consortia allowed reconstruction of near-complete (at an estimated >85% of gene complement) genome sequences for 17 ofmore » the 20 detected member species. TwoHalomonasspp. indistinguishable by amplicon analysis were found to be present within the community. In addition, comparison of metagenomic reads against the consensus scaffolds revealed within-species variation for one of theHalomonaspopulations, one of theRhodobacteraceaepopulations, and theRhizobialespopulation. Genomic comparison of these representative instances of inter- and intraspecies microdiversity suggests differences in functional potential that may result in the expression of distinct roles in the community. In addition, isolation and complete genome sequence determination of six member species allowed an investigation into the sensitivity and specificity of genome reconstruction processes, demonstrating robustness across a wide range of sequence coverage (9× to 2,700×) within the metagenomic data set.« less
Marine, Rachel; McCarren, Coleen; Vorrasane, Vansay; Nasko, Dan; Crowgey, Erin; Polson, Shawn W; Wommack, K Eric
2014-01-30
Shotgun metagenomics has become an important tool for investigating the ecology of microorganisms. Underlying these investigations is the assumption that metagenome sequence data accurately estimates the census of microbial populations. Multiple displacement amplification (MDA) of microbial community DNA is often used in cases where it is difficult to obtain enough DNA for sequencing; however, MDA can result in amplification biases that may impact subsequent estimates of population census from metagenome data. Some have posited that pooling replicate MDA reactions negates these biases and restores the accuracy of population analyses. This assumption has not been empirically tested. Using mock viral communities, we examined the influence of pooling on population-scale analyses. In pooled and single reaction MDA treatments, sequence coverage of viral populations was highly variable and coverage patterns across viral genomes were nearly identical, indicating that initial priming biases were reproducible and that pooling did not alleviate biases. In contrast, control unamplified sequence libraries showed relatively even coverage across phage genomes. MDA should be avoided for metagenomic investigations that require quantitative estimates of microbial taxa and gene functional groups. While MDA is an indispensable technique in applications such as single-cell genomics, amplification biases cannot be overcome by combining replicate MDA reactions. Alternative library preparation techniques should be utilized for quantitative microbial ecology studies utilizing metagenomic sequencing approaches.
High-resolution phylogenetic microbial community profiling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singer, Esther; Coleman-Derr, Devin; Bowman, Brett
2014-03-17
The representation of bacterial and archaeal genome sequences is strongly biased towards cultivated organisms, which belong to merely four phylogenetic groups. Functional information and inter-phylum level relationships are still largely underexplored for candidate phyla, which are often referred to as microbial dark matter. Furthermore, a large portion of the 16S rRNA gene records in the GenBank database are labeled as environmental samples and unclassified, which is in part due to low read accuracy, potential chimeric sequences produced during PCR amplifications and the low resolution of short amplicons. In order to improve the phylogenetic classification of novel species and advance ourmore » knowledge of the ecosystem function of uncultivated microorganisms, high-throughput full length 16S rRNA gene sequencing methodologies with reduced biases are needed. We evaluated the performance of PacBio single-molecule real-time (SMRT) sequencing in high-resolution phylogenetic microbial community profiling. For this purpose, we compared PacBio and Illumina metagenomic shotgun and 16S rRNA gene sequencing of a mock community as well as of an environmental sample from Sakinaw Lake, British Columbia. Sakinaw Lake is known to contain a large age of microbial species from candidate phyla. Sequencing results show that community structure based on PacBio shotgun and 16S rRNA gene sequences is highly similar in both the mock and the environmental communities. Resolution power and community representation accuracy from SMRT sequencing data appeared to be independent of GC content of microbial genomes and was higher when compared to Illumina-based metagenome shotgun and 16S rRNA gene (iTag) sequences, e.g. full-length sequencing resolved all 23 OTUs in the mock community, while iTags did not resolve closely related species. SMRT sequencing hence offers various potential benefits when characterizing uncharted microbial communities.« less
Hiessl, Sebastian; Schuldes, Jörg; Thürmer, Andrea; Halbsguth, Tobias; Bröker, Daniel; Angelov, Angel; Liebl, Wolfgang; Daniel, Rolf
2012-01-01
The increasing production of synthetic and natural poly(cis-1,4-isoprene) rubber leads to huge challenges in waste management. Only a few bacteria are known to degrade rubber, and little is known about the mechanism of microbial rubber degradation. The genome of Gordonia polyisoprenivorans strain VH2, which is one of the most effective rubber-degrading bacteria, was sequenced and annotated to elucidate the degradation pathway and other features of this actinomycete. The genome consists of a circular chromosome of 5,669,805 bp and a circular plasmid of 174,494 bp with average GC contents of 67.0% and 65.7%, respectively. It contains 5,110 putative protein-coding sequences, including many candidate genes responsible for rubber degradation and other biotechnically relevant pathways. Furthermore, we detected two homologues of a latex-clearing protein, which is supposed to be a key enzyme in rubber degradation. The deletion of these two genes for the first time revealed clear evidence that latex-clearing protein is essential for the microbial utilization of rubber. Based on the genome sequence, we predict a pathway for the microbial degradation of rubber which is supported by previous and current data on transposon mutagenesis, deletion mutants, applied comparative genomics, and literature search. PMID:22327575
Lim, Yan Wei; Cuevas, Daniel A.; Silva, Genivaldo Gueiros Z.; Aguinaldo, Kristen; Dinsdale, Elizabeth A.; Haas, Andreas F.; Hatay, Mark; Sanchez, Savannah E.; Wegley-Kelly, Linda; Dutilh, Bas E.; Harkins, Timothy T.; Lee, Clarence C.; Tom, Warren; Sandin, Stuart A.; Smith, Jennifer E.; Zgliczynski, Brian; Vermeij, Mark J.A.; Rohwer, Forest
2014-01-01
Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty six marine microbial genomes, and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorous source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines. PMID:25177534
Lim, Yan Wei; Cuevas, Daniel A; Silva, Genivaldo Gueiros Z; Aguinaldo, Kristen; Dinsdale, Elizabeth A; Haas, Andreas F; Hatay, Mark; Sanchez, Savannah E; Wegley-Kelly, Linda; Dutilh, Bas E; Harkins, Timothy T; Lee, Clarence C; Tom, Warren; Sandin, Stuart A; Smith, Jennifer E; Zgliczynski, Brian; Vermeij, Mark J A; Rohwer, Forest; Edwards, Robert A
2014-01-01
Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty six marine microbial genomes, and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorous source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines.
Microbial species delineation using whole genome sequences
Varghese, Neha J.; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T.; Mavrommatis, Kostas; Kyrpides, Nikos C.; Pati, Amrita
2015-01-01
Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. PMID:26150420
A Statistical Framework for the Functional Analysis of Metagenomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharon, Itai; Pati, Amrita; Markowitz, Victor
2008-10-01
Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. They present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements.more » They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.« less
The Comprehensive Microbial Resource.
Peterson, J D; Umayam, L A; Dickinson, T; Hickey, E K; White, O
2001-01-01
One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes.
The draft genome sequence and annotation of the desert woodrat Neotoma lepida.
Campbell, Michael; Oakeson, Kelly F; Yandell, Mark; Halpert, James R; Dearing, Denise
2016-09-01
We present the de novo draft genome sequence for a vertebrate mammalian herbivore, the desert woodrat (Neotoma lepida). This species is of ecological and evolutionary interest with respect to ingestion, microbial detoxification and hepatic metabolism of toxic plant secondary compounds from the highly toxic creosote bush (Larrea tridentata) and the juniper shrub (Juniperus monosperma). The draft genome sequence and annotation have been deposited at GenBank under the accession LZPO01000000.
MetaSort untangles metagenome assembly by reducing microbial community complexity
Ji, Peifeng; Zhang, Yanming; Wang, Jinfeng; Zhao, Fangqing
2017-01-01
Most current approaches to analyse metagenomic data rely on reference genomes. Novel microbial communities extend far beyond the coverage of reference databases and de novo metagenome assembly from complex microbial communities remains a great challenge. Here we present a novel experimental and bioinformatic framework, metaSort, for effective construction of bacterial genomes from metagenomic samples. MetaSort provides a sorted mini-metagenome approach based on flow cytometry and single-cell sequencing methodologies, and employs new computational algorithms to efficiently recover high-quality genomes from the sorted mini-metagenome by the complementary of the original metagenome. Through extensive evaluations, we demonstrated that metaSort has an excellent and unbiased performance on genome recovery and assembly. Furthermore, we applied metaSort to an unexplored microflora colonized on the surface of marine kelp and successfully recovered 75 high-quality genomes at one time. This approach will greatly improve access to microbial genomes from complex or novel communities. PMID:28112173
Genome assembly reborn: recent computational challenges
2009-01-01
Research into genome assembly algorithms has experienced a resurgence due to new challenges created by the development of next generation sequencing technologies. Several genome assemblers have been published in recent years specifically targeted at the new sequence data; however, the ever-changing technological landscape leads to the need for continued research. In addition, the low cost of next generation sequencing data has led to an increased use of sequencing in new settings. For example, the new field of metagenomics relies on large-scale sequencing of entire microbial communities instead of isolate genomes, leading to new computational challenges. In this article, we outline the major algorithmic approaches for genome assembly and describe recent developments in this domain. PMID:19482960
Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project
Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.; ...
2014-06-15
The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both ofmore » the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.« less
Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kyrpides, Nikos C.; Woyke, Tanja; Eisen, Jonathan A.
The Genomic Encyclopedia of Bacteria and Archaea (GEBA) project was launched by the JGI in 2007 as a pilot project with the objective of sequencing 250 bacterial and archaeal genomes. The two major goals of that project were (a) to test the hypothesis that there are many benefits to the use the phylogenetic diversity of organisms in the tree of life as a primary criterion for generating their genome sequence and (b) to develop the necessary framework, technology and organization for large-scale sequencing of microbial isolate genomes. While the GEBA pilot project has not yet been entirely completed, both ofmore » the original goals have already been successfully accomplished, leading the way for the next phase of the project. Here we propose taking the GEBA project to the next level, by generating high quality draft genomes for 1,000 bacterial and archaeal strains. This represents a combined 16-fold increase in both scale and speed as compared to the GEBA pilot project (250 isolate genomes in 4+ years). We will follow a similar approach for organism selection and sequencing prioritization as was done for the GEBA pilot project (i.e. phylogenetic novelty, availability and growth of cultures of type strains and DNA extraction capability), focusing on type strains as this ensures reproducibility of our results and provides the strongest linkage between genome sequences and other knowledge about each strain. In turn, this project will constitute a pilot phase of a larger effort that will target the genome sequences of all available type strains of the Bacteria and Archaea.« less
Microbial genome analysis: the COG approach.
Galperin, Michael Y; Kristensen, David M; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
2017-09-14
For the past 20 years, the Clusters of Orthologous Genes (COG) database had been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COG have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Complete genome sequence of Streptosporangium roseum type strain (NI 9100T)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nolan, Matt; Sikorski, Johannes; Jando, Marlen
2010-01-01
Streptosporangium roseum Crauch 1955 is the type strain of the species which is the type species of the genus Streptosporangium. The pinkish coiled Streptomyces-like organism with a spore case was isolated from vegetable garden soil in 1955. Here we describe the features of this organism, together with the complete genome sequence and annotation. This is the first completed genome sequence of a member of the family Streptosporangiaceae, and the second largest microbial genome sequence ever deciphered. The 10,369,518 bp long genome with its 9421 protein-coding and 80 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaeamore » project.« less
Genome and Transcriptome Sequencing of the Ostreid herpesvirus 1 From Tomales Bay, California
NASA Astrophysics Data System (ADS)
Burge, C. A.; Langevin, S.; Closek, C. J.; Roberts, S. B.; Friedman, C. S.
2016-02-01
Mass mortalities of larval and seed bivalve molluscs attributed to the Ostreid herpesvirus 1 (OsHV-1) occur globally. OsHV-1 was fully sequenced and characterized as a member of the Family Malacoherpesviridae. Multiple strains of OsHV-1 exist and may vary in virulence, i.e. OsHV-1 µvar. For most global variants of OsHV-1, sequence data is limited to PCR-based sequencing of segments, including two recent genomes. In the United States, OsHV-1 is limited to detection in adjacent embayments in California, Tomales and Drakes bays. Limited DNA sequence data of OsHV-1 infecting oysters in Tomales Bay indicates the virus detected in Tomales Bay is similar but not identical to any one global variant of OsHV-1. In order to better understand both strain variation and virulence of OsHV-1 infecting oysters in Tomales Bay, we used genomic and transcriptomic sequencing. Meta-genomic sequencing (Illumina MiSeq) was conducted from infected oysters (n=4 per year) collected in 2003, 2007, and 2014, where full OsHV-1 genome sequences and low overall microbial diversity were achieved from highly infected oysters. Increased microbial diversity was detected in three of four samples sequenced from 2003, where qPCR based genome copy numbers of OsHV-1 were lower. Expression analysis (SOLiD RNA sequencing) of OsHV-1 genes expressed in oyster larvae at 24 hours post exposure revealed a nearly complete transcriptome, with several highly expressed genes, which are similar to recent transcriptomic analyses of other OsHV-1 variants. Taken together, our results indicate that genome and transcriptome sequencing may be powerful tools in understanding both strain variation and virulence of non-culturable marine viruses.
Comparative genomic analysis by microbial COGs self-attraction rate.
Santoni, Daniele; Romano-Spica, Vincenzo
2009-06-21
Whole genome analysis provides new perspectives to determine phylogenetic relationships among microorganisms. The availability of whole nucleotide sequences allows different levels of comparison among genomes by several approaches. In this work, self-attraction rates were considered for each cluster of orthologous groups of proteins (COGs) class in order to analyse gene aggregation levels in physical maps. Phylogenetic relationships among microorganisms were obtained by comparing self-attraction coefficients. Eighteen-dimensional vectors were computed for a set of 168 completely sequenced microbial genomes (19 archea, 149 bacteria). The components of the vector represent the aggregation rate of the genes belonging to each of 18 COGs classes. Genes involved in nonessential functions or related to environmental conditions showed the highest aggregation rates. On the contrary genes involved in basic cellular tasks showed a more uniform distribution along the genome, except for translation genes. Self-attraction clustering approach allowed classification of Proteobacteria, Bacilli and other species belonging to Firmicutes. Rearrangement and Lateral Gene Transfer events may influence divergences from classical taxonomy. Each set of COG classes' aggregation values represents an intrinsic property of the microbial genome. This novel approach provides a new point of view for whole genome analysis and bacterial characterization.
Caboche, Ségolène; Even, Gaël; Loywick, Alexandre; Audebert, Christophe; Hot, David
2017-12-19
The increase in available sequence data has advanced the field of microbiology; however, making sense of these data without bioinformatics skills is still problematic. We describe MICRA, an automatic pipeline, available as a web interface, for microbial identification and characterization through reads analysis. MICRA uses iterative mapping against reference genomes to identify genes and variations. Additional modules allow prediction of antibiotic susceptibility and resistance and comparing the results of several samples. MICRA is fast, producing few false-positive annotations and variant calls compared to current methods, making it a tool of great interest for fully exploiting sequencing data.
Single sample resolution of rare microbial dark matter in a marine invertebrate metagenome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Ian J.; Weyna, Theodore R.; Fong, Stephen S.
Direct, untargeted sequencing of environmental samples (metagenomics) and de novo genome assembly enable the study of uncultured and phylogenetically divergent organisms. However, separating individual genomes from a mixed community has often relied on the differential-coverage analysis of multiple, deeply sequenced samples. In the metagenomic investigation of the marine bryozoan Bugula neritina, we uncovered seven bacterial genomes associated with a single B. neritina individual that appeared to be transient associates, two of which were unique to one individual and undetectable using certain “universal” 16S rRNA primers and probes. We recovered high quality genome assemblies for several rare instances of “microbial darkmore » matter,” or phylogenetically divergent bacteria lacking genomes in reference databases, from a single tissue sample that was not subjected to any physical or chemical pre-treatment. One of these rare, divergent organisms has a small (593 kbp), poorly annotated genome with low GC content (20.9%) and a 16S rRNA gene with just 65% sequence similarity to the closest reference sequence. Lastly, our findings illustrate the importance of sampling strategy and de novo assembly of metagenomic reads to understand the extent and function of bacterial biodiversity.« less
Single sample resolution of rare microbial dark matter in a marine invertebrate metagenome
Miller, Ian J.; Weyna, Theodore R.; Fong, Stephen S.; ...
2016-09-29
Direct, untargeted sequencing of environmental samples (metagenomics) and de novo genome assembly enable the study of uncultured and phylogenetically divergent organisms. However, separating individual genomes from a mixed community has often relied on the differential-coverage analysis of multiple, deeply sequenced samples. In the metagenomic investigation of the marine bryozoan Bugula neritina, we uncovered seven bacterial genomes associated with a single B. neritina individual that appeared to be transient associates, two of which were unique to one individual and undetectable using certain “universal” 16S rRNA primers and probes. We recovered high quality genome assemblies for several rare instances of “microbial darkmore » matter,” or phylogenetically divergent bacteria lacking genomes in reference databases, from a single tissue sample that was not subjected to any physical or chemical pre-treatment. One of these rare, divergent organisms has a small (593 kbp), poorly annotated genome with low GC content (20.9%) and a 16S rRNA gene with just 65% sequence similarity to the closest reference sequence. Lastly, our findings illustrate the importance of sampling strategy and de novo assembly of metagenomic reads to understand the extent and function of bacterial biodiversity.« less
Gene finding in metatranscriptomic sequences.
Ismail, Wazim Mohammed; Ye, Yuzhen; Tang, Haixu
2014-01-01
Metatranscriptomic sequencing is a highly sensitive bioassay of functional activity in a microbial community, providing complementary information to the metagenomic sequencing of the community. The acquisition of the metatranscriptomic sequences will enable us to refine the annotations of the metagenomes, and to study the gene activities and their regulation in complex microbial communities and their dynamics. In this paper, we present TransGeneScan, a software tool for finding genes in assembled transcripts from metatranscriptomic sequences. By incorporating several features of metatranscriptomic sequencing, including strand-specificity, short intergenic regions, and putative antisense transcripts into a Hidden Markov Model, TranGeneScan can predict a sense transcript containing one or multiple genes (in an operon) or an antisense transcript. We tested TransGeneScan on a mock metatranscriptomic data set containing three known bacterial genomes. The results showed that TranGeneScan performs better than metagenomic gene finders (MetaGeneMark and FragGeneScan) on predicting protein coding genes in assembled transcripts, and achieves comparable or even higher accuracy than gene finders for microbial genomes (Glimmer and GeneMark). These results imply, with the assistance of metatranscriptomic sequencing, we can obtain a broad and precise picture about the genes (and their functions) in a microbial community. TransGeneScan is available as open-source software on SourceForge at https://sourceforge.net/projects/transgenescan/.
NASA Astrophysics Data System (ADS)
Anantharaman, K.; Brown, C. T.; Hug, L. A.; Sharon, I.; Castelle, C. J.; Shelton, A.; Bonet, B.; Probst, A. J.; Thomas, B. C.; Singh, A.; Wilkins, M.; Williams, K. H.; Tringe, S. G.; Beller, H. R.; Brodie, E.; Hubbard, S. S.; Banfield, J. F.
2015-12-01
Microorganisms drive the transformations of carbon compounds in the terrestrial subsurface, a key reservoir of carbon on earth, and impact other linked biogeochemical cycles. Our current knowledge of the microbial ecology in this environment is primarily based on 16S rRNA gene sequences that paint a biased picture of microbial community composition and provide no reliable information on microbial metabolism. Consequently, little is known about the identity and metabolic roles of the uncultivated microbial majority in the subsurface. In turn, this lack of understanding of the microbial processes that impact the turnover of carbon in the subsurface has restricted the scope and ability of biogeochemical models to capture key aspects of the carbon cycle. In this study, we used a culture-independent, genome-resolved metagenomic approach to decipher the metabolic capabilities of microorganisms in an aquifer adjacent to the Colorado River, near Rifle, CO, USA. We sequenced groundwater and sediment samples collected across fifteen different geochemical regimes. Sequence assembly, binning and manual curation resulted in the recovery of 2,542 high-quality genomes, 27 of which are complete. These genomes represent 1,300 non-redundant organisms comprising both abundant and rare community members. Phylogenetic analyses involving ribosomal proteins and 16S rRNA genes revealed the presence of up to 34 new phyla that were hitherto unknown. Less than 11% of all genomes belonged to the 4 most commonly represented phyla that constitute 93% of all currently available genomes. Genome-specific analyses of metabolic potential revealed the co-occurrence of important functional traits such as carbon fixation, nitrogen fixation and use of electron donors and electron acceptors. Finally, we predict that multiple organisms are often required to complete redox pathways through a complex network of metabolic handoffs that extensively cross-link subsurface biogeochemical cycles.
Novel approaches in function-driven single-cell genomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doud, Devin F. R.; Woyke, Tanja
Deeper sequencing and improved bioinformatics in conjunction with single-cell and metagenomic approaches continue to illuminate undercharacterized environmental microbial communities. This has propelled the 'who is there, and what might they be doing' paradigm to the uncultivated and has already radically changed the topology of the tree of life and provided key insights into the microbial contribution to biogeochemistry. While characterization of 'who' based on marker genes can describe a large fraction of the community, answering 'what are they doing' remains the elusive pinnacle for microbiology. Function-driven single-cell genomics provides a solution by using a function-based screen to subsample complex microbialmore » communities in a targeted manner for the isolation and genome sequencing of single cells. This enables single-cell sequencing to be focused on cells with specific phenotypic or metabolic characteristics of interest. Recovered genomes are conclusively implicated for both encoding and exhibiting the feature of interest, improving downstream annotation and revealing activity levels within that environment. This emerging approach has already improved our understanding of microbial community functioning and facilitated the experimental analysis of uncharacterized gene product space. Here we provide a comprehensive review of strategies that have been applied for function-driven single-cell genomics and the future directions we envision.« less
Novel approaches in function-driven single-cell genomics
Doud, Devin F. R.; Woyke, Tanja
2017-06-07
Deeper sequencing and improved bioinformatics in conjunction with single-cell and metagenomic approaches continue to illuminate undercharacterized environmental microbial communities. This has propelled the 'who is there, and what might they be doing' paradigm to the uncultivated and has already radically changed the topology of the tree of life and provided key insights into the microbial contribution to biogeochemistry. While characterization of 'who' based on marker genes can describe a large fraction of the community, answering 'what are they doing' remains the elusive pinnacle for microbiology. Function-driven single-cell genomics provides a solution by using a function-based screen to subsample complex microbialmore » communities in a targeted manner for the isolation and genome sequencing of single cells. This enables single-cell sequencing to be focused on cells with specific phenotypic or metabolic characteristics of interest. Recovered genomes are conclusively implicated for both encoding and exhibiting the feature of interest, improving downstream annotation and revealing activity levels within that environment. This emerging approach has already improved our understanding of microbial community functioning and facilitated the experimental analysis of uncharacterized gene product space. Here we provide a comprehensive review of strategies that have been applied for function-driven single-cell genomics and the future directions we envision.« less
Whole-Genome Sequencing in Microbial Forensic Analysis of Gamma-Irradiated Microbial Materials.
Broomall, Stacey M; Ait Ichou, Mohamed; Krepps, Michael D; Johnsky, Lauren A; Karavis, Mark A; Hubbard, Kyle S; Insalaco, Joseph M; Betters, Janet L; Redmond, Brady W; Rivers, Bryan A; Liem, Alvin T; Hill, Jessica M; Fochler, Edward T; Roth, Pierce A; Rosenzweig, C Nicole; Skowronski, Evan W; Gibbons, Henry S
2016-01-15
Effective microbial forensic analysis of materials used in a potential biological attack requires robust methods of morphological and genetic characterization of the attack materials in order to enable the attribution of the materials to potential sources and to exclude other potential sources. The genetic homogeneity and potential intersample variability of many of the category A to C bioterrorism agents offer a particular challenge to the generation of attributive signatures, potentially requiring whole-genome or proteomic approaches to be utilized. Currently, irradiation of mail is standard practice at several government facilities judged to be at particularly high risk. Thus, initial forensic signatures would need to be recovered from inactivated (nonviable) material. In the study described in this report, we determined the effects of high-dose gamma irradiation on forensic markers of bacterial biothreat agent surrogate organisms with a particular emphasis on the suitability of genomic DNA (gDNA) recovered from such sources as a template for whole-genome analysis. While irradiation of spores and vegetative cells affected the retention of Gram and spore stains and sheared gDNA into small fragments, we found that irradiated material could be utilized to generate accurate whole-genome sequence data on the Illumina and Roche 454 sequencing platforms. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
NGSPanPipe: A Pipeline for Pan-genome Identification in Microbial Strains from Experimental Reads.
Kulsum, Umay; Kapil, Arti; Singh, Harpreet; Kaur, Punit
2018-01-01
Recent advancements in sequencing technologies have decreased both time span and cost for sequencing the whole bacterial genome. High-throughput Next-Generation Sequencing (NGS) technology has led to the generation of enormous data concerning microbial populations publically available across various repositories. As a consequence, it has become possible to study and compare the genomes of different bacterial strains within a species or genus in terms of evolution, ecology and diversity. Studying the pan-genome provides insights into deciphering microevolution, global composition and diversity in virulence and pathogenesis of a species. It can also assist in identifying drug targets and proposing vaccine candidates. The effective analysis of these large genome datasets necessitates the development of robust tools. Current methods to develop pan-genome do not support direct input of raw reads from the sequencer machine but require preprocessing of reads as an assembled protein/gene sequence file or the binary matrix of orthologous genes/proteins. We have designed an easy-to-use integrated pipeline, NGSPanPipe, which can directly identify the pan-genome from short reads. The output from the pipeline is compatible with other pan-genome analysis tools. We evaluated our pipeline with other methods for developing pan-genome, i.e. reference-based assembly and de novo assembly using simulated reads of Mycobacterium tuberculosis. The single script pipeline (pipeline.pl) is applicable for all bacterial strains. It integrates multiple in-house Perl scripts and is freely accessible from https://github.com/Biomedinformatics/NGSPanPipe .
Genome sequence of a microbial lipid producing fungus Cryptococcus albidus NT2002.
Yong, Xiaoyu; Yan, Zhiying; Xu, Lin; Zhou, Jun; Wu, Xiayuan; Wu, Yuandong; Li, Yang; Chen, Zugeng; Zhou, Hua; Wei, Ping; Jia, Honghua
2016-04-10
Cryptococcus albidus NT2002, isolated from the soil in Xinjiang, China, appeared to have the ability to accumulate microbial lipid by utilizing various carbon sources. The predominant properties make it as a potential bio-platform for biodiesel production. Here, we report the complete genome sequence of C. albidus NT2002, which might provide a basis for further elucidation of the genetic background of this promising strain for developing metabolic engineering strategies to produce biodiesel in a green and sustainable manner. Copyright © 2016 Elsevier B.V. All rights reserved.
Mapping genomic features to functional traits through microbial whole genome sequences.
Zhang, Wei; Zeng, Erliang; Liu, Dan; Jones, Stuart E; Emrich, Scott
2014-01-01
Recently, the utility of trait-based approaches for microbial communities has been identified. Increasing availability of whole genome sequences provide the opportunity to explore the genetic foundations of a variety of functional traits. We proposed a machine learning framework to quantitatively link the genomic features with functional traits. Genes from bacteria genomes belonging to different functional traits were grouped to Cluster of Orthologs (COGs), and were used as features. Then, TF-IDF technique from the text mining domain was applied to transform the data to accommodate the abundance and importance of each COG. After TF-IDF processing, COGs were ranked using feature selection methods to identify their relevance to the functional trait of interest. Extensive experimental results demonstrated that functional trait related genes can be detected using our method. Further, the method has the potential to provide novel biological insights.
Zook, Justin M.; Morrow, Jayne B.; Lin, Nancy J.
2017-01-01
High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods. PMID:28924496
MPD: a pathogen genome and metagenome database
Zhang, Tingting; Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Wen
2018-01-01
Abstract Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which provides access to users for searching, downloading, storing and sharing bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL: http://data.mypathogen.org PMID:29917040
High-throughput automated microfluidic sample preparation for accurate microbial genomics
Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B.; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P.; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C.
2017-01-01
Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in cells to sequence library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully-integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications. PMID:28128213
Microbial Genome Analysis and Comparisons: Web-based Protocols and Resources
USDA-ARS?s Scientific Manuscript database
Fully annotated genome sequences of many microorganisms are publicly available as a resource. However, in-depth analysis of these genomes using specialized tools is required to derive meaningful information. We describe here the utility of three powerful publicly available genome databases and ana...
Gene context analysis in the Integrated Microbial Genomes (IMG) data management system.
Mavromatis, Konstantinos; Chu, Ken; Ivanova, Natalia; Hooper, Sean D; Markowitz, Victor M; Kyrpides, Nikos C
2009-11-24
Computational methods for determining the function of genes in newly sequenced genomes have been traditionally based on sequence similarity to genes whose function has been identified experimentally. Function prediction methods can be extended using gene context analysis approaches such as examining the conservation of chromosomal gene clusters, gene fusion events and co-occurrence profiles across genomes. Context analysis is based on the observation that functionally related genes are often having similar gene context and relies on the identification of such events across phylogenetically diverse collection of genomes. We have used the data management system of the Integrated Microbial Genomes (IMG) as the framework to implement and explore the power of gene context analysis methods because it provides one of the largest available genome integrations. Visualization and search tools to facilitate gene context analysis have been developed and applied across all publicly available archaeal and bacterial genomes in IMG. These computations are now maintained as part of IMG's regular genome content update cycle. IMG is available at: http://img.jgi.doe.gov.
The Comprehensive Microbial Resource
Peterson, Jeremy D.; Umayam, Lowell A.; Dickinson, Tanja; Hickey, Erin K.; White, Owen
2001-01-01
One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes. PMID:11125067
Kim, Byung Kwon; Lee, Seong Hyuk; Kim, Seon-Young; Jeong, Haeyoung; Kwon, Soon-Kyeong; Lee, Choong Hoon; Song, Ju Yeon; Yu, Dong Su
2012-01-01
Thermococcus zilligii, a thermophilic anaerobe in freshwater, is useful for physiological research and biotechnological applications. Here we report the high-quality draft genome sequence of T. zilligii AN1T. The genome contains a number of genes for an immune system and adaptation to a microbial biomass-rich environment as well as hydrogenase genes. PMID:22740682
Tetu, Sasha G; Breakwell, Katy; Elbourne, Liam D H; Holmes, Andrew J; Gillings, Michael R; Paulsen, Ian T
2013-06-01
Beneath Australia's large, dry Nullarbor Plain lies an extensive underwater cave system, where dense microbial communities known as 'slime curtains' are found. These communities exist in isolation from photosynthetically derived carbon and are presumed to be chemoautotrophic. Earlier work found high levels of nitrite and nitrate in the cave waters and a high relative abundance of Nitrospirae in bacterial 16S rRNA clone libraries. This suggested that these communities may be supported by nitrite oxidation, however, details of the inorganic nitrogen cycling in these communities remained unclear. Here we report analysis of 16S rRNA amplicon and metagenomic sequence data from the Weebubbie cave slime curtain community. The microbial community is comprised of a diverse assortment of bacterial and archaeal genera, including an abundant population of Thaumarchaeota. Sufficient thaumarchaeotal sequence was recovered to enable a partial genome sequence to be assembled, which showed considerable synteny with the corresponding regions in the genome of the autotrophic ammonia oxidiser Nitrosopumilus maritimus SCM1. This partial genome sequence, contained regions with high sequence identity to the ammonia mono-oxygenase operon and carbon fixing 3-hydroxypropionate/4-hydroxybutyrate cycle genes of N. maritimus SCM1. Additionally, the community, as a whole, included genes encoding key enzymes for inorganic nitrogen transformations, including nitrification and denitrification. We propose that the Weebubbie slime curtain community represents a distinctive microbial ecosystem, in which primary productivity is due to the combined activity of archaeal ammonia-oxidisers and bacterial nitrite oxidisers.
Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, Richard Allen; Bottos, Eric M.; Roy Chowdhury, Taniya
ABSTRACT Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600more » reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “CandidatusPseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundanceAcidobacteriawere highly transcriptionally active, whereas bins corresponding to high-relative-abundanceVerrucomicrobiawere not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities. IMPORTANCESoil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their functional roles in ecosystem stability and responses to environmental perturbations. This knowledge gap is largely due to the difficulty in culturing the majority of soil microbes. Thus, use of culture-independent approaches, such as metagenomics, promises the direct assessment of the functional potential of soil microbiomes. Soil is, however, a challenge for metagenomic assembly due to its high microbial diversity and variable evenness, resulting in low coverage and uneven sampling of microbial genomes. Despite increasingly large soil metagenome data volumes (>200 Gbp), the majority of the data do not assemble. Here, we used the cutting-edge approach of synthetic long-read sequencing technology (Moleculo) to assemble soil metagenome sequence data into long contigs and used the assemblies for binning of genomes. Author Video: Anauthor video summaryof this article is available.« less
Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes
White, Richard Allen; Bottos, Eric M.; Roy Chowdhury, Taniya; Zucker, Jeremy D.; Brislawn, Colin J.; Nicora, Carrie D.; Fansler, Sarah J.; Glaesemann, Kurt R.; Glass, Kevin
2016-01-01
ABSTRACT Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “Candidatus Pseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundance Acidobacteria were highly transcriptionally active, whereas bins corresponding to high-relative-abundance Verrucomicrobia were not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities. IMPORTANCE Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their functional roles in ecosystem stability and responses to environmental perturbations. This knowledge gap is largely due to the difficulty in culturing the majority of soil microbes. Thus, use of culture-independent approaches, such as metagenomics, promises the direct assessment of the functional potential of soil microbiomes. Soil is, however, a challenge for metagenomic assembly due to its high microbial diversity and variable evenness, resulting in low coverage and uneven sampling of microbial genomes. Despite increasingly large soil metagenome data volumes (>200 Gbp), the majority of the data do not assemble. Here, we used the cutting-edge approach of synthetic long-read sequencing technology (Moleculo) to assemble soil metagenome sequence data into long contigs and used the assemblies for binning of genomes. Author Video: An author video summary of this article is available. PMID:27822530
Single-cell genomics-based analysis of virus–host interactions in marine surface bacterioplankton
Labonté, Jessica M.; Swan, Brandon K.; Poulos, Bonnie; ...
2015-04-07
Viral infections dynamically alter the composition and metabolic potential of marine microbial communities and the evolutionary trajectories of host populations with resulting feedback on biogeochemical cycles. It is quite possible that all microbial populations in the ocean are impacted by viral infections. Our knowledge of virus–host relationships, however, has been limited to a minute fraction of cultivated host groups. Here, we utilized single-cell sequencing to obtain genomic blueprints of viruses inside or attached to individual bacterial and archaeal cells captured in their native environment, circumventing the need for host and virus cultivation. Furthermore, a combination of comparative genomics, metagenomic fragmentmore » recruitment, sequence anomalies and irregularities in sequence coverage depth and genome recovery were utilized to detect viruses and to decipher modes of virus–host interactions. Members of all three tailed phage families were identified in 20 out of 58 phylogenetically and geographically diverse single amplified genomes (SAGs) of marine bacteria and archaea. At least four phage–host interactions had the characteristics of late lytic infections, all of which were found in metabolically active cells. One virus had genetic potential for lysogeny. Our findings include first known viruses of Thaumarchaeota, Marinimicrobia, Verrucomicrobia and Gammaproteobacteria clusters SAR86 and SAR92. Viruses were also found in SAGs of Alphaproteobacteria and Bacteroidetes. A high fragment recruitment of viral metagenomic reads confirmed that most of the SAG-associated viruses are abundant in the ocean. This study demonstrates that single-cell genomics, in conjunction with sequence-based computational tools, enable in situ, cultivation-independent insights into host–virus interactions in complex microbial communities.« less
Reconstructing each cell's genome within complex microbial communities-dream or reality?
Clingenpeel, Scott; Clum, Alicia; Schwientek, Patrick; Rinke, Christian; Woyke, Tanja
2014-01-01
As the vast majority of microorganisms have yet to be cultivated in a laboratory setting, access to their genetic makeup has largely been limited to cultivation-independent methods. These methods, namely metagenomics and more recently single-cell genomics, have become cornerstones for microbial ecology and environmental microbiology. One ultimate goal is the recovery of genome sequences from each cell within an environment to move toward a better understanding of community metabolic potential and to provide substrate for experimental work. As single-cell sequencing has the ability to decipher all sequence information contained in an individual cell, this method holds great promise in tackling such challenge. Methodological limitations and inherent biases however do exist, which will be discussed here based on environmental and benchmark data, to assess how far we are from reaching this goal.
Microbial species delineation using whole genome sequences.
Varghese, Neha J; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T; Mavrommatis, Kostas; Kyrpides, Nikos C; Pati, Amrita
2015-08-18
Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Theory of prokaryotic genome evolution.
Sela, Itamar; Wolf, Yuri I; Koonin, Eugene V
2016-10-11
Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.
CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline.
Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V; Mahurkar, Anup; Fricke, W Florian
2017-04-27
The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. CloVR-Comparative runs reference-free multiple whole-genome alignments to determine unique, shared and core coding sequences (CDSs) and single nucleotide polymorphisms (SNPs). Output includes short summary reports and detailed text-based results files, graphical visualizations (phylogenetic trees, circular figures), and a database file linked to the Sybil comparative genome browser. Data up- and download, pipeline configuration and monitoring, and access to Sybil are managed through CloVR-Comparative web interface. CloVR-Comparative and Sybil are distributed as part of the CloVR virtual appliance, which runs on local computers or the Amazon EC2 cloud. Representative datasets (e.g. 40 draft and complete Escherichia coli genomes) are processed in <36 h on a local desktop or at a cost of <$20 on EC2. CloVR-Comparative allows anybody with Internet access to run comparative genomics projects, while eliminating the need for on-site computational resources and expertise.
Complete Genome Sequences of Two Bacillus pumilus Strains from Cuatrociénegas, Coahuila, Mexico
Alcaraz, Luis D.; Aguilar-Salinas, Bernardo; Islas, Africa
2018-01-01
ABSTRACT We assembled the complete genome sequences of Bacillus pumilus strains 145 and 150a from Cuatrociénegas, Mexico. We detected genes codifying for proteins potentially involved in antagonism (bacteriocins) and defense mechanisms (abortive infection bacteriophage proteins and 4-azaleucine resistance). Both strains harbored prophage sequences. Our results provide insights into understanding the establishment of microbial interactions. PMID:29700165
Update on RefSeq microbial genomes resources.
Tatusova, Tatiana; Ciufo, Stacy; Federhen, Scott; Fedorov, Boris; McVeigh, Richard; O'Neill, Kathleen; Tolstoy, Igor; Zaslavsky, Leonid
2015-01-01
NCBI RefSeq genome collection http://www.ncbi.nlm.nih.gov/genome represents all three major domains of life: Eukarya, Bacteria and Archaea as well as Viruses. Prokaryotic genome sequences are the most rapidly growing part of the collection. During the year of 2014 more than 10,000 microbial genome assemblies have been publicly released bringing the total number of prokaryotic genomes close to 30,000. We continue to improve the quality and usability of the microbial genome resources by providing easy access to the data and the results of the pre-computed analysis, and improving analysis and visualization tools. A number of improvements have been incorporated into the Prokaryotic Genome Annotation Pipeline. Several new features have been added to RefSeq prokaryotic genomes data processing pipeline including the calculation of genome groups (clades) and the optimization of protein clusters generation using pan-genome approach. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.
Single-cell genomic sequencing using Multiple Displacement Amplification.
Lasken, Roger S
2007-10-01
Single microbial cells can now be sequenced using DNA amplified by the Multiple Displacement Amplification (MDA) reaction. The few femtograms of DNA in a bacterium are amplified into micrograms of high molecular weight DNA suitable for DNA library construction and Sanger sequencing. The MDA-generated DNA also performs well when used directly as template for pyrosequencing by the 454 Life Sciences method. While MDA from single cells loses some of the genomic sequence, this approach will greatly accelerate the pace of sequencing from uncultured microbes. The genetically linked sequences from single cells are also a powerful tool to be used in guiding genomic assembly of shotgun sequences of multiple organisms from environmental DNA extracts (metagenomic sequences).
Recruiting Human Microbiome Shotgun Data to Site-Specific Reference Genomes
Xie, Gary; Lo, Chien-Chi; Scholz, Matthew; Chain, Patrick S. G.
2014-01-01
The human body consists of innumerable multifaceted environments that predispose colonization by a number of distinct microbial communities, which play fundamental roles in human health and disease. In addition to community surveys and shotgun metagenomes that seek to explore the composition and diversity of these microbiomes, there are significant efforts to sequence reference microbial genomes from many body sites of healthy adults. To illustrate the utility of reference genomes when studying more complex metagenomes, we present a reference-based analysis of sequence reads generated from 55 shotgun metagenomes, selected from 5 major body sites, including 16 sub-sites. Interestingly, between 13% and 92% (62.3% average) of these shotgun reads were aligned to a then-complete list of 2780 reference genomes, including 1583 references for the human microbiome. However, no reference genome was universally found in all body sites. For any given metagenome, the body site-specific reference genomes, derived from the same body site as the sample, accounted for an average of 58.8% of the mapped reads. While different body sites did differ in abundant genera, proximal or symmetrical body sites were found to be most similar to one another. The extent of variation observed, both between individuals sampled within the same microenvironment, or at the same site within the same individual over time, calls into question comparative studies across individuals even if sampled at the same body site. This study illustrates the high utility of reference genomes and the need for further site-specific reference microbial genome sequencing, even within the already well-sampled human microbiome. PMID:24454771
Complete genome sequence of a carbon monoxide-utilizing acetogen, Eubacterium limosum KIST612.
Roh, Hanseong; Ko, Hyeok-Jin; Kim, Daehee; Choi, Dong Geon; Park, Shinyoung; Kim, Sujin; Chang, In Seop; Choi, In-Geol
2011-01-01
Eubacterium limosum KIST612 is an anaerobic acetogenic bacterium that uses CO as the sole carbon/energy source and produces acetate, butyrate, and ethanol. To evaluate its potential as a syngas microbial catalyst, we have sequenced the complete 4.3-Mb genome of E. limosum KIST612.
Two fundamentally different classes of microbial genes.
Wolf, Yuri I; Makarova, Kira S; Lobkovsky, Alexander E; Koonin, Eugene V
2016-11-07
The evolution of bacterial and archaeal genomes is highly dynamic and involves extensive horizontal gene transfer and gene loss 1-4 . Furthermore, many microbial species appear to have open pangenomes, where each newly sequenced genome contains more than 10% ORFans, that is, genes without detectable homologues in other species 5,6 . Here, we report a quantitative analysis of microbial genome evolution by fitting the parameters of a simple, steady-state evolutionary model to the comparative genomic data on the gene content and gene order similarity between archaeal genomes. The results reveal two sharply distinct classes of microbial genes, one of which is characterized by effectively instantaneous gene replacement, and the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of the size of the prokaryotic genomic universe, which appears to consist of at least a billion distinct genes. Furthermore, the same distribution of constraints is shown to govern the evolution of gene complement and gene order, without the need to invoke long-range conservation or the selfish operon concept 7 .
Genome-wide screening and identification of antigens for rickettsial vaccine development
USDA-ARS?s Scientific Manuscript database
The capacity to identify immunogens for vaccine development by genome-wide screening has been markedly enhanced by the availability of complete microbial genome sequences coupled to rapid proteomic and bioinformatic analysis. Critical to this genome-wide screening is in vivo testing in the context o...
Barrett, Nolan H.; McCarthy, Peter J.
2017-01-01
ABSTRACT The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. PMID:28153886
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mobberley, J. M.; Romine, M. F.; Cole, J. K.
ABSTRACT The complete genome sequence ofCyanobacteriumsp. strain HL-69 consists of 3,155,247 bp and contains 2,897 predicted genes comprising a chromosome and two plasmids. The genome is consistent with a halophilic nondiazotrophic phototrophic lifestyle, and this organism is able to synthesize most B vitamins and produces several secondary metabolites.
Lateral Gene Transfer in a Heavy Metal-Contaminated-Groundwater Microbial Community
Hemme, Christopher L.; Green, Stefan J.; Rishishwar, Lavanya; Prakash, Om; Pettenato, Angelica; Chakraborty, Romy; Deutschbauer, Adam M.; Van Nostrand, Joy D.; Wu, Liyou; He, Zhili; Jordan, I. King; Arkin, Adam P.; Kostka, Joel E.
2016-01-01
ABSTRACT Unraveling the drivers controlling the response and adaptation of biological communities to environmental change, especially anthropogenic activities, is a central but poorly understood issue in ecology and evolution. Comparative genomics studies suggest that lateral gene transfer (LGT) is a major force driving microbial genome evolution, but its role in the evolution of microbial communities remains elusive. To delineate the importance of LGT in mediating the response of a groundwater microbial community to heavy metal contamination, representative Rhodanobacter reference genomes were sequenced and compared to shotgun metagenome sequences. 16S rRNA gene-based amplicon sequence analysis indicated that Rhodanobacter populations were highly abundant in contaminated wells with low pHs and high levels of nitrate and heavy metals but remained rare in the uncontaminated wells. Sequence comparisons revealed that multiple geochemically important genes, including genes encoding Fe2+/Pb2+ permeases, most denitrification enzymes, and cytochrome c553, were native to Rhodanobacter and not subjected to LGT. In contrast, the Rhodanobacter pangenome contained a recombinational hot spot in which numerous metal resistance genes were subjected to LGT and/or duplication. In particular, Co2+/Zn2+/Cd2+ efflux and mercuric resistance operon genes appeared to be highly mobile within Rhodanobacter populations. Evidence of multiple duplications of a mercuric resistance operon common to most Rhodanobacter strains was also observed. Collectively, our analyses indicated the importance of LGT during the evolution of groundwater microbial communities in response to heavy metal contamination, and a conceptual model was developed to display such adaptive evolutionary processes for explaining the extreme dominance of Rhodanobacter populations in the contaminated groundwater microbiome. PMID:27048805
Hidden weapons of microbial destruction in plant genomes
Manners, John M
2007-01-01
Recent bioinformatic analyses of sequenced plant genomes reveal a previously unrecognized abundance of genes encoding antimicrobial cysteine-rich peptides, representing a formidable and dynamic defense arsenal against plant pests and pathogens. PMID:17903311
Bardet, Lucie; Cimmino, Teresa; Buffet, Clémence; Michelle, Caroline; Rathored, Jaishriram; Tandina, Fatalmoudou; Lagier, Jean-Christophe; Khelaifia, Saber; Abrahão, Jônatas; Raoult, Didier; Rolain, Jean-Marc
2018-02-01
Culturomics is a new postgenomics field that explores the microbial diversity of the human gut coupled with taxono-genomic strategy. Culturomics, and the microbiome science more generally, are anticipated to transform global health diagnostics and inform the ways in which gut microbial diversity contributes to human health and disease, and by extension, to personalized medicine. Using culturomics, we report in this study the description of strain CB1 T ( = CSUR P1334 = DSM 29075), a new species isolated from a stool specimen from a 37-year-old Brazilian woman. This description includes phenotypic characteristics and complete genome sequence and annotation. Strain CB1 T is a gram-negative aerobic and motile bacillus, exhibits neither catalase nor oxidase activities, and presents a 98.3% 16S rRNA sequence similarity with Pseudomonas putida. The 4,723,534 bp long genome contains 4239 protein-coding genes and 74 RNA genes, including 15 rRNA genes (5 16S rRNA, 4 23S rRNA, and 6 5S rRNA) and 59 tRNA genes. Strain CB1 T was named Pseudomonas massiliensis sp. nov. and classified into the family Pseudomonadaceae. This study demonstrates the usefulness of microbial culturomics in exploration of human microbiota in diverse geographies and offers new promise for incorporating new omics technologies for innovation in diagnostic medicine and global health.
Complete Genome Sequences of Two Bacillus pumilus Strains from Cuatrociénegas, Coahuila, Mexico.
Zarza, Eugenia; Alcaraz, Luis D; Aguilar-Salinas, Bernardo; Islas, Africa; Olmedo-Álvarez, Gabriela
2018-04-26
We assembled the complete genome sequences of Bacillus pumilus strains 145 and 150a from Cuatrociénegas, Mexico. We detected genes codifying for proteins potentially involved in antagonism (bacteriocins) and defense mechanisms (abortive infection bacteriophage proteins and 4-azaleucine resistance). Both strains harbored prophage sequences. Our results provide insights into understanding the establishment of microbial interactions. Copyright © 2018 Zarza et al.
GenePRIMP: Improving Microbial Gene Prediction Quality
Pati, Amrita
2018-01-24
Amrita Pati of the DOE Joint Genome Institute's Genome Biology group talks about a computational pipeline that evaluates the accuracy of gene models in genomes and metagenomes at different stages of finishing at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen.
Stewart, Robert D; Auffret, Marc D; Warr, Amanda; Wiser, Andrew H; Press, Maximilian O; Langford, Kyle W; Liachko, Ivan; Snelling, Timothy J; Dewhurst, Richard J; Walker, Alan W; Roehe, Rainer; Watson, Mick
2018-02-28
The cow rumen is adapted for the breakdown of plant material into energy and nutrients, a task largely performed by enzymes encoded by the rumen microbiome. Here we present 913 draft bacterial and archaeal genomes assembled from over 800 Gb of rumen metagenomic sequence data derived from 43 Scottish cattle, using both metagenomic binning and Hi-C-based proximity-guided assembly. Most of these genomes represent previously unsequenced strains and species. The draft genomes contain over 69,000 proteins predicted to be involved in carbohydrate metabolism, over 90% of which do not have a good match in public databases. Inclusion of the 913 genomes presented here improves metagenomic read classification by sevenfold against our own data, and by fivefold against other publicly available rumen datasets. Thus, our dataset substantially improves the coverage of rumen microbial genomes in the public databases and represents a valuable resource for biomass-degrading enzyme discovery and studies of the rumen microbiome.
Barrick, Jeffrey E; Colburn, Geoffrey; Deatherage, Daniel E; Traverse, Charles C; Strand, Matthew D; Borges, Jordan J; Knoester, David B; Reba, Aaron; Meyer, Austin G
2014-11-29
Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for ~25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.
Genomic Insights into Geothermal Spring Community Members using a 16S Agnostic Single-Cell Approach
NASA Astrophysics Data System (ADS)
Bowers, R. M.
2016-12-01
INSTUTIONS (ALL): DOE Joint Genome Institute, Walnut Creek, CA USA. Bigelow Laboratory for Ocean Sciences, East Boothbay, ME USA. Department of Biological Sciences, University of Calgary, Calgary, Alberta, Canada. ABSTRACT BODY: With recent advances in DNA sequencing, rapid and affordable screening of single-cell genomes has become a reality. Single-cell sequencing is a multi-step process that takes advantage of any number of single-cell sorting techniques, whole genome amplification (WGA), and 16S rRNA gene based PCR screening to identify the microbes of interest prior to shotgun sequencing. However, the 16S PCR based screening step is costly and may lead to unanticipated losses of microbial diversity, as cells that do not produce a clean 16S amplicon are typically omitted from downstream shotgun sequencing. While many of the sorted cells that fail the 16S PCR step likely originate from poor quality amplified DNA, some of the cells with good WGA kinetics may instead represent bacteria or archaea with 16S genes that fail to amplify due to primer mis-matches or the presence of intervening sequences. Using cell material from Dewar Creek, a hot spring in British Columbia, we sequenced all sorted cells with good WGA kinetics irrespective of their 16S amplification success. We show that this high-throughput approach to single-cell sequencing (i) can reduce the overall cost of single-cell genome production, and (ii). may lead to the discovery of previously unknown branches on the microbial tree of life.
Quality scores for 32,000 genomes
Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran; ...
2014-12-08
More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes hadmore » quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.« less
Quality scores for 32,000 genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Land, Miriam L.; Hyatt, Doug; Jun, Se-Ran
More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). In this study, we have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences. Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes hadmore » quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes. The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Finally and unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Labonté, Jessica M.; Swan, Brandon K.; Poulos, Bonnie
Viral infections dynamically alter the composition and metabolic potential of marine microbial communities and the evolutionary trajectories of host populations with resulting feedback on biogeochemical cycles. It is quite possible that all microbial populations in the ocean are impacted by viral infections. Our knowledge of virus–host relationships, however, has been limited to a minute fraction of cultivated host groups. Here, we utilized single-cell sequencing to obtain genomic blueprints of viruses inside or attached to individual bacterial and archaeal cells captured in their native environment, circumventing the need for host and virus cultivation. Furthermore, a combination of comparative genomics, metagenomic fragmentmore » recruitment, sequence anomalies and irregularities in sequence coverage depth and genome recovery were utilized to detect viruses and to decipher modes of virus–host interactions. Members of all three tailed phage families were identified in 20 out of 58 phylogenetically and geographically diverse single amplified genomes (SAGs) of marine bacteria and archaea. At least four phage–host interactions had the characteristics of late lytic infections, all of which were found in metabolically active cells. One virus had genetic potential for lysogeny. Our findings include first known viruses of Thaumarchaeota, Marinimicrobia, Verrucomicrobia and Gammaproteobacteria clusters SAR86 and SAR92. Viruses were also found in SAGs of Alphaproteobacteria and Bacteroidetes. A high fragment recruitment of viral metagenomic reads confirmed that most of the SAG-associated viruses are abundant in the ocean. This study demonstrates that single-cell genomics, in conjunction with sequence-based computational tools, enable in situ, cultivation-independent insights into host–virus interactions in complex microbial communities.« less
Wang, Guojun; Barrett, Nolan H; McCarthy, Peter J
2017-02-02
The proteobacterium Alteromonas sp. strain V450 was isolated from the Atlantic deep-sea sponge Leiodermatium sp. Here, we report the draft genome sequence of this strain, with a genome size of approx. 4.39 Mb and a G+C content of 44.01%. The results will aid deep-sea microbial ecology, evolution, and sponge-microbe association studies. Copyright © 2017 Wang et al.
The CRISPR Spacer Space Is Dominated by Sequences from Species-Specific Mobilomes
Shmakov, Sergey A.; Sitnik, Vassilii; Makarova, Kira S.; Wolf, Yuri I.; Severinov, Konstantin V.
2017-01-01
ABSTRACT Clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) systems store the memory of past encounters with foreign DNA in unique spacers that are inserted between direct repeats in CRISPR arrays. For only a small fraction of the spacers, homologous sequences, called protospacers, are detectable in viral, plasmid, and microbial genomes. The rest of the spacers remain the CRISPR “dark matter.” We performed a comprehensive analysis of the spacers from all CRISPR-cas loci identified in bacterial and archaeal genomes, and we found that, depending on the CRISPR-Cas subtype and the prokaryotic phylum, protospacers were detectable for 1% to about 19% of the spacers (~7% global average). Among the detected protospacers, the majority, typically 80 to 90%, originated from viral genomes, including proviruses, and among the rest, the most common source was genes that are integrated into microbial chromosomes but are involved in plasmid conjugation or replication. Thus, almost all spacers with identifiable protospacers target mobile genetic elements (MGE). The GC content, as well as dinucleotide and tetranucleotide compositions, of microbial genomes, their spacer complements, and the cognate viral genomes showed a nearly perfect correlation and were almost identical. Given the near absence of self-targeting spacers, these findings are most compatible with the possibility that the spacers, including the dark matter, are derived almost completely from the species-specific microbial mobilomes. PMID:28928211
Reconstruction of a Bacterial Genome from DNA Cassettes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christopher Dupont; John Glass; Laura Sheahan
2011-12-31
This basic research program comprised two major areas: (1) acquisition and analysis of marine microbial metagenomic data and development of genomic analysis tools for broad, external community use; (2) development of a minimal bacterial genome. Our Marine Metagenomic Diversity effort generated and analyzed shotgun sequencing data from microbial communities sampled from over 250 sites around the world. About 40% of the 26 Gbp of sequence data has been made publicly available to date with a complete release anticipated in six months. Our results and those mining the deposited data have revealed a vast diversity of genes coding for critical metabolicmore » processes whose phylogenetic and geographic distributions will enable a deeper understanding of carbon and nutrient cycling, microbial ecology, and rapid rate evolutionary processes such as horizontal gene transfer by viruses and plasmids. A global assembly of the generated dataset resulted in a massive set (5Gbp) of genome fragments that provide context to the majority of the generated data that originated from uncultivated organisms. Our Synthetic Biology team has made significant progress towards the goal of synthesizing a minimal mycoplasma genome that will have all of the machinery for independent life. This project, once completed, will provide fundamentally new knowledge about requirements for microbial life and help to lay a basic research foundation for developing microbiological approaches to bioenergy.« less
De Groot, Anne S; Rappuoli, Rino
2004-02-01
Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed by novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed by genome-predicted, assembled and engineered T- and Bcell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are shifting genome mining for vaccines onto the forefront of immunology research.
Insights into the phylogeny and coding potential of microbial dark matter
NASA Astrophysics Data System (ADS)
Rinke, Christian; Schwientek, Patrick; Sczyrba, Alexander; Ivanova, Natalia N.; Anderson, Iain J.; Cheng, Jan-Fang; Darling, Aaron; Malfatti, Stephanie; Swan, Brandon K.; Gies, Esther A.; Dodsworth, Jeremy A.; Hedlund, Brian P.; Tsiamis, George; Sievert, Stefan M.; Liu, Wen-Tso; Eisen, Jonathan A.; Hallam, Steven J.; Kyrpides, Nikos C.; Stepanauskas, Ramunas; Rubin, Edward M.; Hugenholtz, Philip; Woyke, Tanja
2013-07-01
Genome sequencing enhances our understanding of the biological world by providing blueprints for the evolutionary and functional diversity that shapes the biosphere. However, microbial genomes that are currently available are of limited phylogenetic breadth, owing to our historical inability to cultivate most microorganisms in the laboratory. We apply single-cell genomics to target and sequence 201 uncultivated archaeal and bacterial cells from nine diverse habitats belonging to 29 major mostly uncharted branches of the tree of life, so-called `microbial dark matter'. With this additional genomic information, we are able to resolve many intra- and inter-phylum-level relationships and to propose two new superphyla. We uncover unexpected metabolic features that extend our understanding of biology and challenge established boundaries between the three domains of life. These include a novel amino acid use for the opal stop codon, an archaeal-type purine synthesis in Bacteria and complete sigma factors in Archaea similar to those in Bacteria. The single-cell genomes also served to phylogenetically anchor up to 20% of metagenomic reads in some habitats, facilitating organism-level interpretation of ecosystem function. This study greatly expands the genomic representation of the tree of life and provides a systematic step towards a better understanding of biological evolution on our planet.
Insights into the phylogeny and coding potential of microbial dark matter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rinke, Christian; Schwientek, Patrick; Sczyrba, Alexander
Genome sequencing enhances our understanding of the biological world by providing blueprints for the evolutionary and functional diversity that shapes the biosphere. However, microbial genomes that are currently available are of limited phylogenetic breadth, owing to our historical inability to cultivate most microorganisms in the laboratory. We apply single-cell genomics to target and sequence 201 uncultivated archaeal and bacterial cells fromnine diverse habitats belonging to 29 major mostly uncharted branches of the tree of life, so-called microbial dark matter. With this additional genomic information, we are able to resolve many intra- and inter-phylum-level relationships and to propose two new superphyla.more » We uncover unexpected metabolic features that extend our understanding of biology and challenge established boundaries between the three domains of life. These include a novel amino acid use for the opal stop codon, an archaeal-type purine synthesis in Bacteria and complete sigma factors in Archaea similar to those in Bacteria. The single-cell genomes also served to phylogenetically anchor up to 20percent of metagenomic reads in some habitats, facilitating organism-level interpretation of ecosystem function. This study greatly expands the genomic representation of the tree of life and provides a systematic step towards a better understanding of biological evolution on our planet.« less
Nielsen, H Bjørn; Almeida, Mathieu; Juncker, Agnieszka Sierakowska; Rasmussen, Simon; Li, Junhua; Sunagawa, Shinichi; Plichta, Damian R; Gautier, Laurent; Pedersen, Anders G; Le Chatelier, Emmanuelle; Pelletier, Eric; Bonde, Ida; Nielsen, Trine; Manichanh, Chaysavanh; Arumugam, Manimozhiyan; Batto, Jean-Michel; Quintanilha Dos Santos, Marcelo B; Blom, Nikolaj; Borruel, Natalia; Burgdorf, Kristoffer S; Boumezbeur, Fouad; Casellas, Francesc; Doré, Joël; Dworzynski, Piotr; Guarner, Francisco; Hansen, Torben; Hildebrand, Falk; Kaas, Rolf S; Kennedy, Sean; Kristiansen, Karsten; Kultima, Jens Roat; Léonard, Pierre; Levenez, Florence; Lund, Ole; Moumen, Bouziane; Le Paslier, Denis; Pons, Nicolas; Pedersen, Oluf; Prifti, Edi; Qin, Junjie; Raes, Jeroen; Sørensen, Søren; Tap, Julien; Tims, Sebastian; Ussery, David W; Yamada, Takuji; Renault, Pierre; Sicheritz-Ponten, Thomas; Bork, Peer; Wang, Jun; Brunak, Søren; Ehrlich, S Dusko
2014-08-01
Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
Expanded microbial genome coverage and improved protein family annotation in the COG database
Galperin, Michael Y.; Makarova, Kira S.; Wolf, Yuri I.; Koonin, Eugene V.
2015-01-01
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics. PMID:25428365
proGenomes: a resource for consistent functional and taxonomic annotations of prokaryotic genomes.
Mende, Daniel R; Letunic, Ivica; Huerta-Cepas, Jaime; Li, Simone S; Forslund, Kristoffer; Sunagawa, Shinichi; Bork, Peer
2017-01-04
The availability of microbial genomes has opened many new avenues of research within microbiology. This has been driven primarily by comparative genomics approaches, which rely on accurate and consistent characterization of genomic sequences. It is nevertheless difficult to obtain consistent taxonomic and integrated functional annotations for defined prokaryotic clades. Thus, we developed proGenomes, a resource that provides user-friendly access to currently 25 038 high-quality genomes whose sequences and consistent annotations can be retrieved individually or by taxonomic clade. These genomes are assigned to 5306 consistent and accurate taxonomic species clusters based on previously established methodology. proGenomes also contains functional information for almost 80 million protein-coding genes, including a comprehensive set of general annotations and more focused annotations for carbohydrate-active enzymes and antibiotic resistance genes. Additionally, broad habitat information is provided for many genomes. All genomes and associated information can be downloaded by user-selected clade or multiple habitat-specific sets of representative genomes. We expect that the availability of high-quality genomes with comprehensive functional annotations will promote advances in clinical microbial genomics, functional evolution and other subfields of microbiology. proGenomes is available at http://progenomes.embl.de. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Diop, Khoudia; Diop, Awa; Levasseur, Anthony; Mediannikov, Oleg; Robert, Catherine; Armstrong, Nicholas; Couderc, Carine; Bretelle, Florence; Raoult, Didier; Fournier, Pierre-Edouard; Fenollar, Florence
2018-03-01
Microbial culturomics is a new subfield of postgenomic medicine and omics biotechnology application that has broadened our awareness on bacterial diversity of the human microbiome, including the human vaginal flora bacterial diversity. Using culturomics, a new obligate anaerobic Gram-stain-negative rod-shaped bacterium designated strain khD1 T was isolated in the vagina of a patient with bacterial vaginosis and characterized using taxonogenomics. The most abundant cellular fatty acids were C 15:0 anteiso (36%), C 16:0 (19%), and C 15:0 iso (10%). Based on an analysis of the full-length 16S rRNA gene sequences, phylogenetic analysis showed that the strain khD1 T exhibited 90% sequence similarity with Prevotella loescheii, the phylogenetically closest validated Prevotella species. With 3,763,057 bp length, the genome of strain khD1 T contained (mol%) 48.7 G + C and 3248 predicted genes, including 3194 protein-coding and 54 RNA genes. Given the phenotypical and biochemical characteristic results as well as genome sequencing, strain khD1 T is considered to represent a novel species within the genus Prevotella, for which the name Prevotella lascolaii sp. nov. is proposed. The type strain is khD1 T ( = CSUR P0109, = DSM 101754). These results show that microbial culturomics greatly improves the characterization of the human microbiome repertoire by isolating potential putative new species. Further studies will certainly clarify the microbial mechanisms of pathogenesis of these new microbes and their role in health and disease. Microbial culturomics is an important new addition to the diagnostic medicine toolbox and warrants attention in future medical, global health, and integrative biology postgraduate teaching curricula.
NASA Astrophysics Data System (ADS)
Morin, Nicolas
The MELGEN activity (MELiSSA Genetic Stability Study) mainly covers the molecular aspects of the regenerative life-support system MELiSSA (Micro-Ecological Life Support System Alternative) of the European Space Agency (ESA). The general objective of MELGEN is to establish and validate methods and the related hardware in order to detect genetic instability and microbial contaminants in the MELISSA compartments. This includes (1) a genetic description of the MELISSA strains, (2) studies of microbial behavior and genetic stability in bioreactors and (3) the detection of chemical, genetical and biological contamination and their effect on microbial metabolism. Selected as oxygen producer and complementary food source, the cyanobacterium Arthrospira sp. PCC8005 plays a major role within the MELiSSA loop. As the genomic information on this organism was insufficient, sequencing of its genome was proposed at the French National Sequencing Center, Genoscope, as a joint effort between ESA and different laboratories. So far, a preliminary assembly of 16 contigs representing circa 6.3 million basepairs was obtained. Even though the finishing of the genome is on its way, automatic annotation of the contigs has already been performed on the MaGe annotation platform, and curation of the sequence is currently being carried out, with a special focus on biosynthesis pathways, photosynthesis, and maintenance processes of the cell. According to the index of repetitiveness described by Haubold and Wiehe (2006), we discovered that the genome of Arthrospira sp. is among the 50 most repeated bacterial genomes sequenced to date. Thanks to the sequencing project, we have identified and catalogued mobile genetics elements (MGEs) dispersed throughout the unique chromosome of this cyanobacterium. They represent a quite large proportion of the genome, as genes identified as putative transposases are indeed found in circa 5 Results : We currently have a first draft of the complete genome of Arthrospira sp. PCC 8005, fully annotated. This genomic information opens the gates to a better understanding of the biology of this cyanobacterium and will be a key to the development of appropriate derivatives that provide enhanced performances (e.g. radiation resistance, genetic stability, photosynthesis and nutritive properties).
Complete Genome Sequence of Bacillus horikoshii Strain 20a from Cuatro Cienegas, Coahuila, Mexico
Alcaraz, Luis D.; Aguilar-Salinas, Bernardo; Islas, Africa
2017-01-01
ABSTRACT We sequenced the Bacillus horikoshii 20a genome, isolated from sediment collected in Cuatro Cienegas, Mexico. We identified genes involved in establishing antagonistic interactions in microbial communities (antibiotic resistance and bacteriocins) and genes related to the metabolism of cyanophycin, a reserve compound and spore matrix material potentially relevant for survival in an oligotrophic environment. PMID:28751383
Tripathi, Charu; Mahato, Nitish K; Rani, Pooja; Singh, Yogendra; Kamra, Komal; Lal, Rup
2016-01-01
Lampropedia cohaerens strain CT6(T), a non-motile, aerobic and coccoid strain was isolated from arsenic rich microbial mats (temperature ~45 °C) of a hot water spring located atop the Himalayan ranges at Manikaran, India. The present study reports the first genome sequence of type strain CT6(T) of genus Lampropedia cohaerens. Sequencing data was generated using the Illumina HiSeq 2000 platform and assembled with ABySS v 1.3.5. The 3,158,922 bp genome was assembled into 41 contigs with a mean GC content of 63.5 % and 2823 coding sequences. Strain CT6(T) was found to harbour genes involved in both the Entner-Duodoroff pathway and non-phosphorylated ED pathway. Strain CT6(T) also contained genes responsible for imparting resistance to arsenic, copper, cobalt, zinc, cadmium and magnesium, providing survival advantages at a thermal location. Additionally, the presence of genes associated with biofilm formation, pyrroloquinoline-quinone production, isoquinoline degradation and mineral phosphate solubilisation in the genome demonstrate the diverse genetic potential for survival at stressed niches.
Microbial minimalism: genome reduction in bacterial pathogens.
Moran, Nancy A
2002-03-08
When bacterial lineages make the transition from free-living or facultatively parasitic life cycles to permanent associations with hosts, they undergo a major loss of genes and DNA. Complete genome sequences are providing an understanding of how extreme genome reduction affects evolutionary directions and metabolic capabilities of obligate pathogens and symbionts.
Integrated Approach to Reconstruction of Microbial Regulatory Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rodionov, Dmitry A; Novichkov, Pavel S
2013-11-04
This project had the goal(s) of development of integrated bioinformatics platform for genome-scale inference and visualization of transcriptional regulatory networks (TRNs) in bacterial genomes. The work was done in Sanford-Burnham Medical Research Institute (SBMRI, P.I. D.A. Rodionov) and Lawrence Berkeley National Laboratory (LBNL, co-P.I. P.S. Novichkov). The developed computational resources include: (1) RegPredict web-platform for TRN inference and regulon reconstruction in microbial genomes, and (2) RegPrecise database for collection, visualization and comparative analysis of transcriptional regulons reconstructed by comparative genomics. These analytical resources were selected as key components in the DOE Systems Biology KnowledgeBase (SBKB). The high-quality data accumulated inmore » RegPrecise will provide essential datasets of reference regulons in diverse microbes to enable automatic reconstruction of draft TRNs in newly sequenced genomes. We outline our progress toward the three aims of this grant proposal, which were: Develop integrated platform for genome-scale regulon reconstruction; Infer regulatory annotations in several groups of bacteria and building of reference collections of microbial regulons; and Develop KnowledgeBase on microbial transcriptional regulation.« less
Genomic sequencing of Pleistocene cave bears
DOE Office of Scientific and Technical Information (OSTI.GOV)
Noonan, James P.; Hofreiter, Michael; Smith, Doug
2005-04-01
Despite the information content of genomic DNA, ancient DNA studies to date have largely been limited to amplification of mitochondrial DNA due to technical hurdles such as contamination and degradation of ancient DNAs. In this study, we describe two metagenomic libraries constructed using unamplified DNA extracted from the bones of two 40,000-year-old extinct cave bears. Analysis of {approx}1 Mb of sequence from each library showed that, despite significant microbial contamination, 5.8 percent and 1.1 percent of clones in the libraries contain cave bear inserts, yielding 26,861 bp of cave bear genome sequence. Alignment of this sequence to the dog genome,more » the closest sequenced genome to cave bear in terms of evolutionary distance, revealed roughly the expected ratio of cave bear exons, repeats and conserved noncoding sequences. Only 0.04 percent of all clones sequenced were derived from contamination with modern human DNA. Comparison of cave bear with orthologous sequences from several modern bear species revealed the evolutionary relationship of these lineages. Using the metagenomic approach described here, we have recovered substantial quantities of mammalian genomic sequence more than twice as old as any previously reported, establishing the feasibility of ancient DNA genomic sequencing programs.« less
Single-cell genomics for the masses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tringe, Susannah G.
In this issue of Nature Biotechnology, Lan et al. describe a new tool in the toolkit for studying uncultivated microbial communities, enabling orders of magnitude higher single cell genome throughput than previous methods. This is achieved by a complex droplet microfluidics workflow encompassing steps from physical cell isolation through genome sequencing, producing tens of thousands of lowcoverage genomes from individual cells.
Single-cell genomics for the masses
Tringe, Susannah G.
2017-07-12
In this issue of Nature Biotechnology, Lan et al. describe a new tool in the toolkit for studying uncultivated microbial communities, enabling orders of magnitude higher single cell genome throughput than previous methods. This is achieved by a complex droplet microfluidics workflow encompassing steps from physical cell isolation through genome sequencing, producing tens of thousands of lowcoverage genomes from individual cells.
Microbes in deep marine sediments viewed through amplicon sequencing and metagenomics
NASA Astrophysics Data System (ADS)
Biddle, J.; Leon, Z. R.; Russell, J. A., III; Martino, A. J.
2016-12-01
Nearly twenty percent of microbial biomass on Earth can be found in the marine subsurface. The majority of this is concentrated on continental margins, which have been investigated by scientific drilling. On the Costa Rica Margin, Iberian Margin and Peru Margins, sediment samples have been investigated through DNA extraction followed by amplicon and metagenomic sequencing. Overall samples show a high degree of microbial diversity, including many lineages of newly defined groups. In this talk, metagenome assembled genomes of unusual lineages will be presented, including their relationships to shallower relatives. From Costa Rica, in particular, we have retrieved deep relatives of Lokiarchaeota and Thorarchaeota, as well as other deeply branching archaeal relatives. We discuss their genome similarities to both other archaea and eukaryotes. From the Iberian Margin, relatives of Atribacteria and Aerophobetes will be discussed. Finally, we will detail the knowledge lost or gained depending on whether samples are studied via amplicon sequencing or total metagenomics, as studies in other environments have shown that up to 15% of microbial diversity is ignored when samples are studied via amplicon sequencing alone.
Genome sequencing of a single tardigrade Hypsibius dujardini individual
Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru
2016-01-01
Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies. PMID:27529330
Genome sequencing of a single tardigrade Hypsibius dujardini individual.
Arakawa, Kazuharu; Yoshida, Yuki; Tomita, Masaru
2016-08-16
Tardigrades are ubiquitous microscopic animals that play an important role in the study of metazoan phylogeny. Most terrestrial tardigrades can withstand extreme environments by entering an ametabolic desiccated state termed anhydrobiosis. Due to their small size and the non-axenic nature of laboratory cultures, molecular studies of tardigrades are prone to contamination. To minimize the possibility of microbial contaminations and to obtain high-quality genomic information, we have developed an ultra-low input library sequencing protocol to enable the genome sequencing of a single tardigrade Hypsibius dujardini individual. Here, we describe the details of our sequencing data and the ultra-low input library preparation methodologies.
Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures
Pride, David T; Schoenfeld, Thomas
2008-01-01
Background Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. Conclusion That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis. PMID:18798991
Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures.
Pride, David T; Schoenfeld, Thomas
2008-09-17
Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis.
The CRISPR Spacer Space Is Dominated by Sequences from Species-Specific Mobilomes.
Shmakov, Sergey A; Sitnik, Vassilii; Makarova, Kira S; Wolf, Yuri I; Severinov, Konstantin V; Koonin, Eugene V
2017-09-19
Clustered regularly interspaced short palindromic repeats and CRISPR-associated protein (CRISPR-Cas) systems store the memory of past encounters with foreign DNA in unique spacers that are inserted between direct repeats in CRISPR arrays. For only a small fraction of the spacers, homologous sequences, called protospacers, are detectable in viral, plasmid, and microbial genomes. The rest of the spacers remain the CRISPR "dark matter." We performed a comprehensive analysis of the spacers from all CRISPR- cas loci identified in bacterial and archaeal genomes, and we found that, depending on the CRISPR-Cas subtype and the prokaryotic phylum, protospacers were detectable for 1% to about 19% of the spacers (~7% global average). Among the detected protospacers, the majority, typically 80 to 90%, originated from viral genomes, including proviruses, and among the rest, the most common source was genes that are integrated into microbial chromosomes but are involved in plasmid conjugation or replication. Thus, almost all spacers with identifiable protospacers target mobile genetic elements (MGE). The GC content, as well as dinucleotide and tetranucleotide compositions, of microbial genomes, their spacer complements, and the cognate viral genomes showed a nearly perfect correlation and were almost identical. Given the near absence of self-targeting spacers, these findings are most compatible with the possibility that the spacers, including the dark matter, are derived almost completely from the species-specific microbial mobilomes. IMPORTANCE The principal function of CRISPR-Cas systems is thought to be protection of bacteria and archaea against viruses and other parasitic genetic elements. The CRISPR defense function is mediated by sequences from parasitic elements, known as spacers, that are inserted into CRISPR arrays and then transcribed and employed as guides to identify and inactivate the cognate parasitic genomes. However, only a small fraction of the CRISPR spacers match any sequences in the current databases, and of these, only a minority correspond to known parasitic elements. We show that nearly all spacers with matches originate from viral or plasmid genomes that are either free or have been integrated into the host genome. We further demonstrate that spacers with no matches have the same properties as those of identifiable origins, strongly suggesting that all spacers originate from mobile elements.
Ye, Chao; Xu, Nan; Dong, Chuan; Ye, Yuannong; Zou, Xuan; Chen, Xiulai; Guo, Fengbiao; Liu, Liming
2017-04-07
Genome-scale metabolic models (GSMMs) constitute a platform that combines genome sequences and detailed biochemical information to quantify microbial physiology at the system level. To improve the unity, integrity, correctness, and format of data in published GSMMs, a consensus IMGMD database was built in the LAMP (Linux + Apache + MySQL + PHP) system by integrating and standardizing 328 GSMMs constructed for 139 microorganisms. The IMGMD database can help microbial researchers download manually curated GSMMs, rapidly reconstruct standard GSMMs, design pathways, and identify metabolic targets for strategies on strain improvement. Moreover, the IMGMD database facilitates the integration of wet-lab and in silico data to gain an additional insight into microbial physiology. The IMGMD database is freely available, without any registration requirements, at http://imgmd.jiangnan.edu.cn/database.
Genomic and metagenomic challenges and opportunities for bioleaching: a mini-review.
Cárdenas, Juan Pablo; Quatrini, Raquel; Holmes, David S
2016-09-01
High-throughput genomic technologies are accelerating progress in understanding the diversity of microbial life in many environments. Here we highlight advances in genomics and metagenomics of microorganisms from bioleaching heaps and related acidic mining environments. Bioleaching heaps used for copper recovery provide significant opportunities to study the processes and mechanisms underlying microbial successions and the influence of community composition on ecosystem functioning. Obtaining quantitative and process-level knowledge of these dynamics is pivotal for understanding how microorganisms contribute to the solubilization of copper for industrial recovery. Advances in DNA sequencing technology provide unprecedented opportunities to obtain information about the genomes of bioleaching microorganisms, allowing predictive models of metabolic potential and ecosystem-level interactions to be constructed. These approaches are enabling predictive phenotyping of organisms many of which are recalcitrant to genetic approaches or are unculturable. This mini-review describes current bioleaching genomic and metagenomic projects and addresses the use of genome information to: (i) build metabolic models; (ii) predict microbial interactions; (iii) estimate genetic diversity; and (iv) study microbial evolution. Key challenges and perspectives of bioleaching genomics/metagenomics are addressed. Copyright © 2016 The Author(s). Published by Elsevier Masson SAS.. All rights reserved.
Xu, Yiyuan; Ren, Chong; Chen, Ruixuan
2016-01-01
Gallaecimonas pentaromativorans has been previously reported to be capable of degrading crude oil and diesel oil. G. pentaromativorans strain YA_1 was isolated from the southwest Indian Ocean and can degrade crude oil. This study reports the draft genome sequence of G. pentaromativorans, which can provide insights into the mechanisms of microbial oil biodegradation. PMID:27491993
Draft genome sequence of a strictly anaerobic dichloromethane-degrading bacterium
Kleindienst, Sara; Higgins, Steven A.; Tsementzi, Despina; ...
2016-03-03
Here, an anaerobic, dichloromethane-degrading bacterium affiliated with novel Peptococcaceae was maintained in a microbial consortium. The organism originated from pristine freshwater sediment collected from Rio Mameyes in Luquillo, Puerto Rico, in October 2009 (latitude 18°21'43.9", longitude –65°46'8.4"). The draft genome sequence is 2.1 Mb and has a G+C content of 43.5%.
Complete Genome Sequence of Bacillus horikoshii Strain 20a from Cuatro Cienegas, Coahuila, Mexico.
Zarza, Eugenia; Alcaraz, Luis D; Aguilar-Salinas, Bernardo; Islas, Africa; Olmedo-Álvarez, Gabriela
2017-07-27
We sequenced the Bacillus horikoshii 20a genome, isolated from sediment collected in Cuatro Cienegas, Mexico. We identified genes involved in establishing antagonistic interactions in microbial communities (antibiotic resistance and bacteriocins) and genes related to the metabolism of cyanophycin, a reserve compound and spore matrix material potentially relevant for survival in an oligotrophic environment. Copyright © 2017 Zarza et al.
Dissecting the human microbiome with single-cell genomics.
Tolonen, Andrew C; Xavier, Ramnik J
2017-06-14
Recent advances in genome sequencing of single microbial cells enable the assignment of functional roles to members of the human microbiome that cannot currently be cultured. This approach can reveal the genomic basis of phenotypic variation between closely related strains and can be applied to the targeted study of immunogenic bacteria in disease.
Draft Genome Sequences of 37 Salmonella enterica Strains Isolated from Poultry Sources in Nigeria
Useh, Nicodemus M.; Ngbede, Emmanuel O.; Akange, Nguavese; Thomas, Milton; Foley, Andrew; Keena, Mitchel Chan; Nelson, Eric; Christopher-Hennings, Jane; Tomita, Masaru
2016-01-01
Here, we report the availability of draft genomes of several Salmonella serotypes, isolated from poultry sources from Nigeria. These genomes will help to further understand the biological diversity of S. enterica and will serve as references in microbial trace-back studies to improve food safety. PMID:27151793
Yu, J; Blom, J; Glaeser, S P; Jaenicke, S; Juhre, T; Rupp, O; Schwengers, O; Spänig, S; Goesmann, A
2017-11-10
The rapid development of next generation sequencing technology has greatly increased the amount of available microbial genomes. As a result of this development, there is a rising demand for fast and automated approaches in analyzing these genomes in a comparative way. Whole genome sequencing also bears a huge potential for obtaining a higher resolution in phylogenetic and taxonomic classification. During the last decade, several software tools and platforms have been developed in the field of comparative genomics. In this manuscript, we review the most commonly used platforms and approaches for ortholog group analyses with a focus on their potential for phylogenetic and taxonomic research. Furthermore, we describe the latest improvements of the EDGAR platform for comparative genome analyses and present recent examples of its application for the phylogenomic analysis of different taxa. Finally, we illustrate the role of the EDGAR platform as part of the BiGi Center for Microbial Bioinformatics within the German network on Bioinformatics Infrastructure (de.NBI). Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Tettelin, Hervé; Masignani, Vega; Cieslewicz, Michael J.; Donati, Claudio; Medini, Duccio; Ward, Naomi L.; Angiuoli, Samuel V.; Crabtree, Jonathan; Jones, Amanda L.; Durkin, A. Scott; DeBoy, Robert T.; Davidsen, Tanja M.; Mora, Marirosa; Scarselli, Maria; Margarit y Ros, Immaculada; Peterson, Jeremy D.; Hauser, Christopher R.; Sundaram, Jaideep P.; Nelson, William C.; Madupu, Ramana; Brinkac, Lauren M.; Dodson, Robert J.; Rosovitz, Mary J.; Sullivan, Steven A.; Daugherty, Sean C.; Haft, Daniel H.; Selengut, Jeremy; Gwinn, Michelle L.; Zhou, Liwei; Zafar, Nikhat; Khouri, Hoda; Radune, Diana; Dimitrov, George; Watkins, Kisha; O'Connor, Kevin J. B.; Smith, Shannon; Utterback, Teresa R.; White, Owen; Rubens, Craig E.; Grandi, Guido; Madoff, Lawrence C.; Kasper, Dennis L.; Telford, John L.; Wessels, Michael R.; Rappuoli, Rino; Fraser, Claire M.
2005-01-01
The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes. PMID:16172379
Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine
2017-01-01
The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. PMID:27899624
Genome-based Modeling and Design of Metabolic Interactions in Microbial Communities
Mahadevan, Radhakrishnan; Henson, Michael A.
2012-01-01
Biotechnology research is traditionally focused on individual microbial strains that are perceived to have the necessary metabolic functions, or the capability to have these functions introduced, to achieve a particular task. For many important applications, the development of such omnipotent microbes is an extremely challenging if not impossible task. By contrast, nature employs a radically different strategy based on synergistic combinations of different microbial species that collectively achieve the desired task. These natural communities have evolved to exploit the native metabolic capabilities of each species and are highly adaptive to changes in their environments. However, microbial communities have proven difficult to study due to a lack of suitable experimental and computational tools. With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research. PMID:24688668
Genome-based Modeling and Design of Metabolic Interactions in Microbial Communities.
Mahadevan, Radhakrishnan; Henson, Michael A
2012-01-01
Biotechnology research is traditionally focused on individual microbial strains that are perceived to have the necessary metabolic functions, or the capability to have these functions introduced, to achieve a particular task. For many important applications, the development of such omnipotent microbes is an extremely challenging if not impossible task. By contrast, nature employs a radically different strategy based on synergistic combinations of different microbial species that collectively achieve the desired task. These natural communities have evolved to exploit the native metabolic capabilities of each species and are highly adaptive to changes in their environments. However, microbial communities have proven difficult to study due to a lack of suitable experimental and computational tools. With the advent of genome sequencing, omics technologies, bioinformatics and genome-scale modeling, researchers now have unprecedented capabilities to analyze and engineer the metabolism of microbial communities. The goal of this review is to summarize recent applications of genome-scale metabolic modeling to microbial communities. A brief introduction to lumped community models is used to motivate the need for genome-level descriptions of individual species and their metabolic interactions. The review of genome-scale models begins with static modeling approaches, which are appropriate for communities where the extracellular environment can be assumed to be time invariant or slowly varying. Dynamic extensions of the static modeling approach are described, and then applications of genome-scale models for design of synthetic microbial communities are reviewed. The review concludes with a summary of metagenomic tools for analyzing community metabolism and an outlook for future research.
Microbial community assembly and evolution in subseafloor sediment.
Starnawski, Piotr; Bataillon, Thomas; Ettema, Thijs J G; Jochum, Lara M; Schreiber, Lars; Chen, Xihan; Lever, Mark A; Polz, Martin F; Jørgensen, Bo B; Schramm, Andreas; Kjeldsen, Kasper U
2017-03-14
Bacterial and archaeal communities inhabiting the subsurface seabed live under strong energy limitation and have growth rates that are orders of magnitude slower than laboratory-grown cultures. It is not understood how subsurface microbial communities are assembled and whether populations undergo adaptive evolution or accumulate mutations as a result of impaired DNA repair under such energy-limited conditions. Here we use amplicon sequencing to explore changes of microbial communities during burial and isolation from the surface to the >5,000-y-old subsurface of marine sediment and identify a small core set of mostly uncultured bacteria and archaea that is present throughout the sediment column. These persisting populations constitute a small fraction of the entire community at the surface but become predominant in the subsurface. We followed patterns of genome diversity with depth in four dominant lineages of the persisting populations by mapping metagenomic sequence reads onto single-cell genomes. Nucleotide sequence diversity was uniformly low and did not change with age and depth of the sediment. Likewise, there was no detectable change in mutation rates and efficacy of selection. Our results indicate that subsurface microbial communities predominantly assemble by selective survival of taxa able to persist under extreme energy limitation.
Low-pass sequencing for microbial comparative genomics
Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy
2004-01-01
Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles are consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics. PMID:14718067
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Robert M.; Kyrpides, Nikos C.; Stepanauskas, Ramunas
The number of genomes from uncultivated microbes will soon surpass the number of isolate genomes in public databases (Hugenholtz, Skarshewski, & Parks, 2016). Technological advancements in high-throughput sequencing and assembly, including single-cell genomics and the computational extraction of genomes from metagenomes (GFMs), are largely responsible. Here we propose community standards for reporting the Minimum Information about a Single-Cell Genome (MIxS-SCG) and Minimum Information about Genomes extracted From Metagenomes (MIxS-GFM) specific for Bacteria and Archaea. The standards have been developed in the context of the International Genomics Standards Consortium (GSC) community (Field et al., 2014) and can be viewed as amore » supplement to other GSC checklists including the Minimum Information about a Genome Sequence (MIGS), Minimum information about a Metagenomic Sequence(s) (MIMS) (Field et al., 2008) and Minimum Information about a Marker Gene Sequence (MIMARKS) (P. Yilmaz et al., 2011). Community-wide acceptance of MIxS-SCG and MIxS-GFM for Bacteria and Archaea will enable broad comparative analyses of genomes from the majority of taxa that remain uncultivated, improving our understanding of microbial function, ecology, and evolution.« less
Application of Sequence-based Methods in Human MicrobialEcology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weng, Li; Rubin, Edward M.; Bristow, James
2005-08-29
Ecologists studying microbial life in the environment have recognized the enormous complexity of microbial diversity for many years, and the development of a variety of culture-independent methods, many of them coupled with high-throughput DNA sequencing, has allowed this diversity to be explored in ever greater detail. Despite the widespread application of these new techniques to the characterization of uncultivated microbes and microbial communities in the environment, their application to human health and disease has lagged behind. Because DNA based-techniques for defining uncultured microbes allow not only cataloging of microbial diversity, but also insight into microbial functions, investigators are beginning tomore » apply these tools to the microbial communities that abound on and within us, in what has aptly been called the second Human Genome Project. In this review we discuss the sequence-based methods for microbial analysis that are currently available and their application to identify novel human pathogens, improve diagnosis of known infectious diseases, and to advance understanding of our relationship with microbial communities that normally reside in and on the human body.« less
Practical Approaches for Detecting Selection in Microbial Genomes.
Hedge, Jessica; Wilson, Daniel J
2016-02-01
Microbial genome evolution is shaped by a variety of selective pressures. Understanding how these processes occur can help to address important problems in microbiology by explaining observed differences in phenotypes, including virulence and resistance to antibiotics. Greater access to whole-genome sequencing provides microbiologists with the opportunity to perform large-scale analyses of selection in novel settings, such as within individual hosts. This tutorial aims to guide researchers through the fundamentals underpinning popular methods for measuring selection in pathogens. These methods are transferable to a wide variety of organisms, and the exercises provided are designed for researchers with any level of programming experience.
Genome-reconstruction for eukaryotes from complex natural microbial communities.
West, Patrick T; Probst, Alexander J; Grigoriev, Igor V; Thomas, Brian C; Banfield, Jillian F
2018-04-01
Microbial eukaryotes are integral components of natural microbial communities, and their inclusion is critical for many ecosystem studies, yet the majority of published metagenome analyses ignore eukaryotes. In order to include eukaryotes in environmental studies, we propose a method to recover eukaryotic genomes from complex metagenomic samples. A key step for genome recovery is separation of eukaryotic and prokaryotic fragments. We developed a k -mer-based strategy, EukRep, for eukaryotic sequence identification and applied it to environmental samples to show that it enables genome recovery, genome completeness evaluation, and prediction of metabolic potential. We used this approach to test the effect of addition of organic carbon on a geyser-associated microbial community and detected a substantial change of the community metabolism, with selection against almost all candidate phyla bacteria and archaea and for eukaryotes. Near complete genomes were reconstructed for three fungi placed within the Eurotiomycetes and an arthropod. While carbon fixation and sulfur oxidation were important functions in the geyser community prior to carbon addition, the organic carbon-impacted community showed enrichment for secreted proteases, secreted lipases, cellulose targeting CAZymes, and methanol oxidation. We demonstrate the broader utility of EukRep by reconstructing and evaluating relatively high-quality fungal, protist, and rotifer genomes from complex environmental samples. This approach opens the way for cultivation-independent analyses of whole microbial communities. © 2018 West et al.; Published by Cold Spring Harbor Laboratory Press.
CMG-biotools, a free workbench for basic comparative microbial genomics.
Vesth, Tammi; Lagesen, Karin; Acar, Öncel; Ussery, David
2013-01-01
Today, there are more than a hundred times as many sequenced prokaryotic genomes than were present in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics that can reveal strain specific characteristics, diversity within species and many other aspects. However, comparative genomics is a field not easily entered into by scientists with few computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source codes for all programs are provided under GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST and graphical analyses of DNA structures. This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a vide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes, from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly shows that users with limited computational experience can perform complicated analysis without much training.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robert DeSalle
2004-09-10
This project seeks to use the genomes of two close relatives, A. actinomycetemcomitans and H. aphrophilus, to understand the evolutionary changes that take place in a genome to make it more or less virulent. Our primary specific aim of this project was to sequence, annotate, and analyze the genomes of Actinobacillus actinomycetemcomitans (CU1000, serotype f) and Haemophilus aphrophilus. With these genome sequences we have then compared the whole genome sequences to each other and to the current Aa (HK1651 www.genome.ou.edu) genome project sequence along with other fully sequenced Pasteurellaceae to determine inter and intra species differences that may account formore » the differences and similarities in disease. We also propose to create and curate a comprehensive database where sequence information and analysis for the Pasteurellaceae (family that includes the genera Actinobacillus and Haemophilus) are readily accessible. And finally we have proposed to develop phylogenetic techniques that can be used to efficiently and accurately examine the evolution of genomes. Below we report on progress we have made on these major specific aims. Progress on the specific aims is reported below under two major headings--experimental approaches and bioinformatics and systematic biology approaches.« less
Microbial Ecology and Evolution in the Acid Mine Drainage Model System.
Huang, Li-Nan; Kuang, Jia-Liang; Shu, Wen-Sheng
2016-07-01
Acid mine drainage (AMD) is a unique ecological niche for acid- and toxic-metals-adapted microorganisms. These low-complexity systems offer a special opportunity for the ecological and evolutionary analyses of natural microbial assemblages. The last decade has witnessed an unprecedented interest in the study of AMD communities using 16S rRNA high-throughput sequencing and community genomic and postgenomic methodologies, significantly advancing our understanding of microbial diversity, community function, and evolution in acidic environments. This review describes new data on AMD microbial ecology and evolution, especially dynamics of microbial diversity, community functions, and population genomes, and further identifies gaps in our current knowledge that future research, with integrated applications of meta-omics technologies, will fill. Copyright © 2016 Elsevier Ltd. All rights reserved.
The microbiomes of blowflies and houseflies as bacterial transmission reservoirs.
Junqueira, Ana Carolina M; Ratan, Aakrosh; Acerbi, Enzo; Drautz-Moses, Daniela I; Premkrishnan, Balakrishnan N V; Costea, Paul I; Linz, Bodo; Purbojati, Rikky W; Paulo, Daniel F; Gaultier, Nicolas E; Subramanian, Poorani; Hasan, Nur A; Colwell, Rita R; Bork, Peer; Azeredo-Espin, Ana Maria L; Bryant, Donald A; Schuster, Stephan C
2017-11-24
Blowflies and houseflies are mechanical vectors inhabiting synanthropic environments around the world. They feed and breed in fecal and decaying organic matter, but the microbiome they harbour and transport is largely uncharacterized. We sampled 116 individual houseflies and blowflies from varying habitats on three continents and subjected them to high-coverage, whole-genome shotgun sequencing. This allowed for genomic and metagenomic analyses of the host-associated microbiome at the species level. Both fly host species segregate based on principal coordinate analysis of their microbial communities, but they also show an overlapping core microbiome. Legs and wings displayed the largest microbial diversity and were shown to be an important route for microbial dispersion. The environmental sequencing approach presented here detected a stochastic distribution of human pathogens, such as Helicobacter pylori, thereby demonstrating the potential of flies as proxies for environmental and public health surveillance.
Li, Dongmei; Greenfield, Paul; Rosewarne, Carly P.
2013-01-01
The draft genome sequence of Thermoanaerobacter sp. strain A7A was reconstructed from a metagenome of a microbial consortium obtained from the Tuna oil field in the Gippsland Basin, Australia. The organism is a strict anaerobe that is predicted to ferment a range of simple sugars and undertake sulfur reduction. PMID:24029756
Liu, Hongwei; Yin, Shuli; An, Likang; Zhang, Genwei; Cheng, Huicai; Xi, Yanhua; Cui, Guanhui; Zhang, Feiyan; Zhang, Liping
2016-07-20
Bacillus subtilis BSD-2, isolated from cotton (Gossypium spp.), had strong antagonistic activity to Verticillium dahlia Kleb and Botrytis cinerea. We sequenced and annotated the BSD-2 complete genome to help us the better use of this strain, which has surfactin, bacilysin, bacillibactin, subtilosin A, Tas A and a potential class IV lanthipeptide biosynthetic pathways. Copyright © 2016 Elsevier B.V. All rights reserved.
Integrative computational approach for genome-based study of microbial lipid-degrading enzymes.
Vorapreeda, Tayvich; Thammarongtham, Chinae; Laoteng, Kobkul
2016-07-01
Lipid-degrading or lipolytic enzymes have gained enormous attention in academic and industrial sectors. Several efforts are underway to discover new lipase enzymes from a variety of microorganisms with particular catalytic properties to be used for extensive applications. In addition, various tools and strategies have been implemented to unravel the functional relevance of the versatile lipid-degrading enzymes for special purposes. This review highlights the study of microbial lipid-degrading enzymes through an integrative computational approach. The identification of putative lipase genes from microbial genomes and metagenomic libraries using homology-based mining is discussed, with an emphasis on sequence analysis of conserved motifs and enzyme topology. Molecular modelling of three-dimensional structure on the basis of sequence similarity is shown to be a potential approach for exploring the structural and functional relationships of candidate lipase enzymes. The perspectives on a discriminative framework of cutting-edge tools and technologies, including bioinformatics, computational biology, functional genomics and functional proteomics, intended to facilitate rapid progress in understanding lipolysis mechanism and to discover novel lipid-degrading enzymes of microorganisms are discussed.
Wang, Mingjie; Ye, Yuzhen; Tang, Haixu
2012-06-01
The wide applications of next-generation sequencing (NGS) technologies in metagenomics have raised many computational challenges. One of the essential problems in metagenomics is to estimate the taxonomic composition of a microbial community, which can be approached by mapping shotgun reads acquired from the community to previously characterized microbial genomes followed by quantity profiling of these species based on the number of mapped reads. This procedure, however, is not as trivial as it appears at first glance. A shotgun metagenomic dataset often contains DNA sequences from many closely-related microbial species (e.g., within the same genus) or strains (e.g., within the same species), thus it is often difficult to determine which species/strain a specific read is sampled from when it can be mapped to a common region shared by multiple genomes at high similarity. Furthermore, high genomic variations are observed among individual genomes within the same species, which are difficult to be differentiated from the inter-species variations during reads mapping. To address these issues, a commonly used approach is to quantify taxonomic distribution only at the genus level, based on the reads mapped to all species belonging to the same genus; alternatively, reads are mapped to a set of representative genomes, each selected to represent a different genus. Here, we introduce a novel approach to the quantity estimation of closely-related species within the same genus by mapping the reads to their genomes represented by a de Bruijn graph, in which the common genomic regions among them are collapsed. Using simulated and real metagenomic datasets, we show the de Bruijn graph approach has several advantages over existing methods, including (1) it avoids redundant mapping of shotgun reads to multiple copies of the common regions in different genomes, and (2) it leads to more accurate quantification for the closely-related species (and even for strains within the same species).
ITEP: an integrated toolkit for exploration of microbial pan-genomes.
Benedict, Matthew N; Henriksen, James R; Metcalf, William W; Whitaker, Rachel J; Price, Nathan D
2014-01-03
Comparative genomics is a powerful approach for studying variation in physiological traits as well as the evolution and ecology of microorganisms. Recent technological advances have enabled sequencing large numbers of related genomes in a single project, requiring computational tools for their integrated analysis. In particular, accurate annotations and identification of gene presence and absence are critical for understanding and modeling the cellular physiology of newly sequenced genomes. Although many tools are available to compare the gene contents of related genomes, new tools are necessary to enable close examination and curation of protein families from large numbers of closely related organisms, to integrate curation with the analysis of gain and loss, and to generate metabolic networks linking the annotations to observed phenotypes. We have developed ITEP, an Integrated Toolkit for Exploration of microbial Pan-genomes, to curate protein families, compute similarities to externally-defined domains, analyze gene gain and loss, and generate draft metabolic networks from one or more curated reference network reconstructions in groups of related microbial species among which the combination of core and variable genes constitute the their "pan-genomes". The ITEP toolkit consists of: (1) a series of modular command-line scripts for identification, comparison, curation, and analysis of protein families and their distribution across many genomes; (2) a set of Python libraries for programmatic access to the same data; and (3) pre-packaged scripts to perform common analysis workflows on a collection of genomes. ITEP's capabilities include de novo protein family prediction, ortholog detection, analysis of functional domains, identification of core and variable genes and gene regions, sequence alignments and tree generation, annotation curation, and the integration of cross-genome analysis and metabolic networks for study of metabolic network evolution. ITEP is a powerful, flexible toolkit for generation and curation of protein families. ITEP's modular design allows for straightforward extension as analysis methods and tools evolve. By integrating comparative genomics with the development of draft metabolic networks, ITEP harnesses the power of comparative genomics to build confidence in links between genotype and phenotype and helps disambiguate gene annotations when they are evaluated in both evolutionary and metabolic network contexts.
Draft Genome Sequences of 37 Salmonella enterica Strains Isolated from Poultry Sources in Nigeria.
Useh, Nicodemus M; Ngbede, Emmanuel O; Akange, Nguavese; Thomas, Milton; Foley, Andrew; Keena, Mitchel Chan; Nelson, Eric; Christopher-Hennings, Jane; Tomita, Masaru; Suzuki, Haruo; Scaria, Joy
2016-05-05
Here, we report the availability of draft genomes of several Salmonella serotypes, isolated from poultry sources from Nigeria. These genomes will help to further understand the biological diversity of S. enterica and will serve as references in microbial trace-back studies to improve food safety. Copyright © 2016 Useh et al.
Collection, Culturing, and Genome Analyses of Tropical Marine Filamentous Benthic Cyanobacteria.
Moss, Nathan A; Leao, Tiago; Glukhov, Evgenia; Gerwick, Lena; Gerwick, William H
2018-01-01
Decreasing sequencing costs has sparked widespread investigation of the use of microbial genomics to accelerate the discovery and development of natural products for therapeutic uses. Tropical marine filamentous cyanobacteria have historically produced many structurally novel natural products, and therefore present an excellent opportunity for the systematic discovery of new metabolites via the information derived from genomics and molecular genetics. Adequate knowledge transfer and institutional know-how are important to maintain the capability for studying filamentous cyanobacteria due to their unusual microbial morphology and characteristics. Here, we describe workflows, procedures, and commentary on sample collection, cultivation, genomic DNA generation, bioinformatics tools, and biosynthetic pathway analysis concerning filamentous cyanobacteria. © 2018 Elsevier Inc. All rights reserved.
Expanded microbial genome coverage and improved protein family annotation in the COG database.
Galperin, Michael Y; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
2015-01-01
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Isanapong, Jantiya; Goodwin, Lynne A.; Bruce, David
Microbial communities in the termite hindgut are essential for degrading plant material. We present the high-quality draft genome sequence of the Opitutaceae bacterium strain TAV1, the first member of the phylum Verrucomicrobia to be isolated from wood-feeding termites. The genomic analysis reveals genes coding for lignocellulosic degradation and nitrogen fixation.
Hyperexpansion of RNA Bacteriophage Diversity
Krishnamurthy, Siddharth R.; Janowski, Andrew B.; Zhao, Guoyan; Barouch, Dan; Wang, David
2016-01-01
Bacteriophage modulation of microbial populations impacts critical processes in ocean, soil, and animal ecosystems. However, the role of bacteriophages with RNA genomes (RNA bacteriophages) in these processes is poorly understood, in part because of the limited number of known RNA bacteriophage species. Here, we identify partial genome sequences of 122 RNA bacteriophage phylotypes that are highly divergent from each other and from previously described RNA bacteriophages. These novel RNA bacteriophage sequences were present in samples collected from a range of ecological niches worldwide, including invertebrates and extreme microbial sediment, demonstrating that they are more widely distributed than previously recognized. Genomic analyses of these novel bacteriophages yielded multiple novel genome organizations. Furthermore, one RNA bacteriophage was detected in the transcriptome of a pure culture of Streptomyces avermitilis, suggesting for the first time that the known tropism of RNA bacteriophages may include gram-positive bacteria. Finally, reverse transcription PCR (RT-PCR)-based screening for two specific RNA bacteriophages in stool samples from a longitudinal cohort of macaques suggested that they are generally acutely present rather than persistent. PMID:27010970
Advances in Cryptococcus genomics: insights into the evolution of pathogenesis.
Cuomo, Christina A; Rhodes, Johanna; Desjardins, Christopher A
2018-01-01
Cryptococcus species are the causative agents of cryptococcal meningitis, a significant source of mortality in immunocompromised individuals. Initial work on the molecular epidemiology of this fungal pathogen utilized genotyping approaches to describe the genetic diversity and biogeography of two species, Cryptococcus neoformans and Cryptococcus gattii. Whole genome sequencing of representatives of both species resulted in reference assemblies enabling a wide array of downstream studies and genomic resources. With the increasing availability of whole genome sequencing, both species have now had hundreds of individual isolates sequenced, providing fine-scale insight into the evolution and diversification of Cryptococcus and allowing for the first genome-wide association studies to identify genetic variants associated with human virulence. Sequencing has also begun to examine the microevolution of isolates during prolonged infection and to identify variants specific to outbreak lineages, highlighting the potential role of hyper-mutation in evolving within short time scales. We can anticipate that further advances in sequencing technology and sequencing microbial genomes at scale, including metagenomics approaches, will continue to refine our view of how the evolution of Cryptococcus drives its success as a pathogen.
Kang, Dongwan D.; Froula, Jeff; Egan, Rob; ...
2015-01-01
Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. Lastly, it automatically formsmore » hundreds of high quality genome bins on a very large assembly consisting millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.« less
Nikolouli, Katerina; Mossialos, Dimitris
2012-08-01
Non-ribosomal peptide synthetases (NRPS) and type-I polyketide synthases (PKS-I) are multimodular enzymes involved in biosynthesis of oligopeptide and polyketide secondary metabolites produced by microorganisms such as bacteria and fungi. New findings regarding the mechanisms underlying NRPS and PKS-I evolution illustrate how microorganisms expand their metabolic potential. During the last decade rapid development of bioinformatics tools as well as improved sequencing and annotation of microbial genomes led to discovery of novel bioactive compounds synthesized by NRPS and PKS-I through genome-mining. Taking advantage of these technological developments metagenomics is a fast growing research field which directly studies microbial genomes or specific gene groups and their products. Discovery of novel bioactive compounds synthesized by NRPS and PKS-I will certainly be accelerated through metagenomics, allowing the exploitation of so far untapped microbial resources in biotechnology and medicine.
Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E
2017-01-01
Abstract The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. PMID:29069412
De Anda, Valerie; Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E; Contreras-Moreira, Bruno; Souza, Valeria
2017-11-01
The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large "omic" datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. © The Author 2017. Published by Oxford University Press.
Genomes OnLine Database (GOLD) v.6: data updates and feature enhancements
Mukherjee, Supratim; Stamatis, Dimitri; Bertsch, Jon; Ovchinnikova, Galina; Verezemska, Olena; Isbandi, Michelle; Thomas, Alex D.; Ali, Rida; Sharma, Kaushal; Kyrpides, Nikos C.; Reddy, T. B. K.
2017-01-01
The Genomes Online Database (GOLD) (https://gold.jgi.doe.gov) is a manually curated data management system that catalogs sequencing projects with associated metadata from around the world. In the current version of GOLD (v.6), all projects are organized based on a four level classification system in the form of a Study, Organism (for isolates) or Biosample (for environmental samples), Sequencing Project and Analysis Project. Currently, GOLD provides information for 26 117 Studies, 239 100 Organisms, 15 887 Biosamples, 97 212 Sequencing Projects and 78 579 Analysis Projects. These are integrated with over 312 metadata fields from which 58 are controlled vocabularies with 2067 terms. The web interface facilitates submission of a diverse range of Sequencing Projects (such as isolate genome, single-cell genome, metagenome, metatranscriptome) and complex Analysis Projects (such as genome from metagenome, or combined assembly from multiple Sequencing Projects). GOLD provides a seamless interface with the Integrated Microbial Genomes (IMG) system and supports and promotes the Genomic Standards Consortium (GSC) Minimum Information standards. This paper describes the data updates and additional features added during the last two years. PMID:27794040
Overview: The Impact of Microbial Genomics on Food Safety
NASA Astrophysics Data System (ADS)
Milillo, Sara R.; Wiedmann, Martin; Hoelzer, Karin
The first use of the term "genome" is attributed to Hans Winkler in his 1920 publication Verbeitung und Ursache der Parthenogenesis im Pflanzen und Tierreiche (Winkler, 1920). However, it was not until 1986 that the study of genomic concepts coalesced with the creation of a new journal by the same name (McKusick, 1997). The study of genomics was initially defined as the use or the application of "informatic tools" to study features of a sequenced genome (Strauss and Falkow, 1997). Today the field of genomics is typically considered to encompass efforts to determine the nucleic acid DNA sequence of an organism as well as the expression of genetic information using high-throughput, genome-wide methods, including transcriptomic, proteomic, and metabolomic analyses.
Lisle, John T.; Stellick, Sarah H.
2011-01-01
Microbial community genomic DNA was extracted from sediment samples collected along the Gulf of Mexico and Atlantic coasts from Texas to Florida. Sample sites were identified as being ecologically sensitive and (or) as having high potential of being impacted by Macondo-1 (M-1) well oil from the Deepwater Horizon blowout. The diversity within the microbial communities associated with the collected sediments provides a baseline dataset to which microbial community-diversity data from impacted sites could be compared. To determine the microbial community diversity in the samples, genetic fingerprints were generated and compared. Specific sequences within the community genomic DNA were first amplified using the polymerase chain reaction (PCR) with a primer set that provides possible resolution to the species level. A second nested PCR was performed on the primary PCR products using a primer set on which a GC-clamp was attached to one of the primers. The nested PCR products were separated using denaturing-gradient gel electrophoresis (DGGE) that resolves the nested PCR products based on sequence dissimilarities (or similarities), forming a genomic fingerprint of the microbial diversity within the respective samples. Samples with similar fingerprints were grouped and compared to oil-fingerprint data from the same sites (Rosenbauer and others, 2011). The microbial community fingerprints were generally grouped into sites that had been shown to contain background concentrations of non-Deepwater Horizon oil. However, these groupings also included sites where no oil signature was detected. This report represents some of the first information on naturally occurring microbial communities in sediment from shorelines along the Gulf of Mexico and Atlantic coasts from Texas to Florida.
Reproducibility and quantitation of amplicon sequencing-based detection
Zhou, Jizhong; Wu, Liyou; Deng, Ye; Zhi, Xiaoyang; Jiang, Yi-Huei; Tu, Qichao; Xie, Jianping; Van Nostrand, Joy D; He, Zhili; Yang, Yunfeng
2011-01-01
To determine the reproducibility and quantitation of the amplicon sequencing-based detection approach for analyzing microbial community structure, a total of 24 microbial communities from a long-term global change experimental site were examined. Genomic DNA obtained from each community was used to amplify 16S rRNA genes with two or three barcode tags as technical replicates in the presence of a small quantity (0.1% wt/wt) of genomic DNA from Shewanella oneidensis MR-1 as the control. The technical reproducibility of the amplicon sequencing-based detection approach is quite low, with an average operational taxonomic unit (OTU) overlap of 17.2%±2.3% between two technical replicates, and 8.2%±2.3% among three technical replicates, which is most likely due to problems associated with random sampling processes. Such variations in technical replicates could have substantial effects on estimating β-diversity but less on α-diversity. A high variation was also observed in the control across different samples (for example, 66.7-fold for the forward primer), suggesting that the amplicon sequencing-based detection approach could not be quantitative. In addition, various strategies were examined to improve the comparability of amplicon sequencing data, such as increasing biological replicates, and removing singleton sequences and less-representative OTUs across biological replicates. Finally, as expected, various statistical analyses with preprocessed experimental data revealed clear differences in the composition and structure of microbial communities between warming and non-warming, or between clipping and non-clipping. Taken together, these results suggest that amplicon sequencing-based detection is useful in analyzing microbial community structure even though it is not reproducible and quantitative. However, great caution should be taken in experimental design and data interpretation when the amplicon sequencing-based detection approach is used for quantitative analysis of the β-diversity of microbial communities. PMID:21346791
Vallenet, David; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Lajus, Aurélie; Josso, Adrien; Mercier, Jonathan; Renaux, Alexandre; Rollin, Johan; Rouy, Zoe; Roche, David; Scarpelli, Claude; Médigue, Claudine
2017-01-04
The annotation of genomes from NGS platforms needs to be automated and fully integrated. However, maintaining consistency and accuracy in genome annotation is a challenging problem because millions of protein database entries are not assigned reliable functions. This shortcoming limits the knowledge that can be extracted from genomes and metabolic models. Launched in 2005, the MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Effective comparative analysis requires a consistent and complete view of biological data, and therefore, support for reviewing the quality of functional annotation is critical. MicroScope allows users to analyze microbial (meta)genomes together with post-genomic experiment results if any (i.e. transcriptomics, re-sequencing of evolved strains, mutant collections, phenotype data). It combines tools and graphical interfaces to analyze genomes and to perform the expert curation of gene functions in a comparative context. Starting with a short overview of the MicroScope system, this paper focuses on some major improvements of the Web interface, mainly for the submission of genomic data and on original tools and pipelines that have been developed and integrated in the platform: computation of pan-genomes and prediction of biosynthetic gene clusters. Today the resource contains data for more than 6000 microbial genomes, and among the 2700 personal accounts (65% of which are now from foreign countries), 14% of the users are performing expert annotations, on at least a weekly basis, contributing to improve the quality of microbial genome annotations. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Omics approaches in food safety: fulfilling the promise?
Bergholz, Teresa M.; Moreno Switt, Andrea I.; Wiedmann, Martin
2014-01-01
Genomics, transcriptomics, and proteomics are rapidly transforming our approaches to detection, prevention and treatment of foodborne pathogens. Microbial genome sequencing in particular has evolved from a research tool into an approach that can be used to characterize foodborne pathogen isolates as part of routine surveillance systems. Genome sequencing efforts will not only improve outbreak detection and source tracking, but will also create large amounts of foodborne pathogen genome sequence data, which will be available for data mining efforts that could facilitate better source attribution and provide new insights into foodborne pathogen biology and transmission. While practical uses and application of metagenomics, transcriptomics, and proteomics data and associated tools are less prominent, these tools are also starting to yield practical food safety solutions. PMID:24572764
IDENTIFICATION OF CHICKEN-SPECIFIC FECAL MICROBIAL SEQUENCES USING A METAGENOMIC APPROACH
In this study, we applied a genome fragment enrichment (GFE) method to select for genomic regions that differ between different fecal metagenomes. Competitive DNA hybridizations were performed between chicken fecal DNA and pig fecal DNA (C-P) and between chicken fecal DNA and an ...
Nanopore DNA Sequencing and Genome Assembly on the International Space Station.
Castro-Wallace, Sarah L; Chiu, Charles Y; John, Kristen K; Stahl, Sarah E; Rubins, Kathleen H; McIntyre, Alexa B R; Dworkin, Jason P; Lupisella, Mark L; Smith, David J; Botkin, Douglas J; Stephenson, Timothy A; Juul, Sissel; Turner, Daniel J; Izquierdo, Fernando; Federman, Scot; Stryke, Doug; Somasekar, Sneha; Alexander, Noah; Yu, Guixia; Mason, Christopher E; Burton, Aaron S
2017-12-21
We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.
O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S
2011-01-01
Next-generation genomic technology has both greatly accelerated the pace of genome research as well as increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards there is a still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
Goldberg, Brittany; Sichtig, Heike; Geyer, Chelsie; Ledeboer, Nathan
2015-01-01
ABSTRACT Next-generation DNA sequencing (NGS) has progressed enormously over the past decade, transforming genomic analysis and opening up many new opportunities for applications in clinical microbiology laboratories. The impact of NGS on microbiology has been revolutionary, with new microbial genomic sequences being generated daily, leading to the development of large databases of genomes and gene sequences. The ability to analyze microbial communities without culturing organisms has created the ever-growing field of metagenomics and microbiome analysis and has generated significant new insights into the relation between host and microbe. The medical literature contains many examples of how this new technology can be used for infectious disease diagnostics and pathogen analysis. The implementation of NGS in medical practice has been a slow process due to various challenges such as clinical trials, lack of applicable regulatory guidelines, and the adaptation of the technology to the clinical environment. In April 2015, the American Academy of Microbiology (AAM) convened a colloquium to begin to define these issues, and in this document, we present some of the concepts that were generated from these discussions. PMID:26646014
Grange, Zoë L; Gartrell, Brett D; Biggs, Patrick J; Nelson, Nicola J; Anderson, Marti; French, Nigel P
2016-05-01
Isolation of wildlife into fragmented populations as a consequence of anthropogenic-mediated environmental change may alter host-pathogen relationships. Our understanding of some of the epidemiological features of infectious disease in vulnerable populations can be enhanced by the use of commensal bacteria as a proxy for invasive pathogens in natural ecosystems. The distinctive population structure of a well-described meta-population of a New Zealand endangered flightless bird, the takahe (Porphyrio hochstetteri), provided a unique opportunity to investigate the influence of host isolation on enteric microbial diversity. The genomic epidemiology of a prevalent rail-associated endemic commensal bacterium was explored using core genome and ribosomal multilocus sequence typing (rMLST) of 70 Campylobacter sp. nova 1 isolated from one third of the takahe population resident in multiple locations. While there was evidence of recombination between lineages, bacterial divergence appears to have occurred and multivariate analysis of 52 rMLST genes revealed location-associated differentiation of C. sp. nova 1 sequence types. Our results indicate that fragmentation and anthropogenic manipulation of populations can influence host-microbial relationships, with potential implications for niche adaptation and the evolution of micro-organisms in remote environments. This study provides a novel framework in which to explore the complex genomic epidemiology of micro-organisms in wildlife populations.
Khanna, Namita; Ghosh, Ananta Kumar; Huntemann, Marcel; Deshpande, Shweta; Han, James; Chen, Amy; Kyrpides, Nikos; Mavrommatis, Kostas; Szeto, Ernest; Markowitz, Victor; Ivanova, Natalia; Pagani, Ioanna; Pati, Amrita; Pitluck, Sam; Nolan, Matt; Woyke, Tanja; Teshima, Hazuki; Chertkov, Olga; Daligault, Hajnalka; Davenport, Karen; Gu, Wei; Munk, Christine; Zhang, Xiaojing; Bruce, David; Detter, Chris; Xu, Yan; Quintana, Beverly; Reitenga, Krista; Kunde, Yulia; Green, Lance; Erkkila, Tracy; Han, Cliff; Brambilla, Evelyne-Marie; Lang, Elke; Klenk, Hans-Peter; Goodwin, Lynne; Chain, Patrick; Das, Debabrata
2013-12-20
Enterobacter sp. IIT-BT 08 belongs to Phylum: Proteobacteria, Class: Gammaproteobacteria, Order: Enterobacteriales, Family: Enterobacteriaceae. The organism was isolated from the leaves of a local plant near the Kharagpur railway station, Kharagpur, West Bengal, India. It has been extensively studied for fermentative hydrogen production because of its high hydrogen yield. For further enhancement of hydrogen production by strain development, complete genome sequence analysis was carried out. Sequence analysis revealed that the genome was linear, 4.67 Mbp long and had a GC content of 56.01%. The genome properties encode 4,393 protein-coding and 179 RNA genes. Additionally, a putative pathway of hydrogen production was suggested based on the presence of formate hydrogen lyase complex and other related genes identified in the genome. Thus, in the present study we describe the specific properties of the organism and the generation, annotation and analysis of its genome sequence as well as discuss the putative pathway of hydrogen production by this organism.
Large inserts for big data: artificial chromosomes in the genomic era.
Tocchetti, Arianna; Donadio, Stefano; Sosio, Margherita
2018-05-01
The exponential increase in available microbial genome sequences coupled with predictive bioinformatic tools is underscoring the genetic capacity of bacteria to produce an unexpected large number of specialized bioactive compounds. Since most of the biosynthetic gene clusters (BGCs) present in microbial genomes are cryptic, i.e. not expressed under laboratory conditions, a variety of cloning systems and vectors have been devised to harbor DNA fragments large enough to carry entire BGCs and to allow their transfer in suitable heterologous hosts. This minireview provides an overview of the vectors and approaches that have been developed for cloning large BGCs, and successful examples of heterologous expression.
A Metagenomic Approach to Cyanobacterial Genomics
Alvarenga, Danillo O.; Fiore, Marli F.; Varani, Alessandro M.
2017-01-01
Cyanobacteria, or oxyphotobacteria, are primary producers that establish ecological interactions with a wide variety of organisms. Although their associations with eukaryotes have received most attention, interactions with bacterial and archaeal symbionts have also been occurring for billions of years. Due to these associations, obtaining axenic cultures of cyanobacteria is usually difficult, and most isolation efforts result in unicyanobacterial cultures containing a number of associated microbes, hence composing a microbial consortium. With rising numbers of cyanobacterial blooms due to climate change, demand for genomic evaluations of these microorganisms is increasing. However, standard genomic techniques call for the sequencing of axenic cultures, an approach that not only adds months or even years for culture purification, but also appears to be impossible for some cyanobacteria, which is reflected in the relatively low number of publicly available genomic sequences of this phylum. Under the framework of metagenomics, on the other hand, cumbersome techniques for achieving axenic growth can be circumvented and individual genomes can be successfully obtained from microbial consortia. This review focuses on approaches for the genomic and metagenomic assessment of non-axenic cyanobacterial cultures that bypass requirements for axenity. These methods enable researchers to achieve faster and less costly genomic characterizations of cyanobacterial strains and raise additional information about their associated microorganisms. While non-axenic cultures may have been previously frowned upon in cyanobacteriology, latest advancements in metagenomics have provided new possibilities for in vitro studies of oxyphotobacteria, renewing the value of microbial consortia as a reliable and functional resource for the rapid assessment of bloom-forming cyanobacteria. PMID:28536564
Genomic platform for efficient identification of fungal secondary metabolism genes
USDA-ARS?s Scientific Manuscript database
Fungal secondary metabolites (SMs) are structurally diverse natural compounds, which are thought to have great potential not only for medical industry but also for chemical and environmental industries. Since expansion of sequencing microbial genomes in 1990’s, it has been known that SM genes are ex...
Yokoi, Takahide; Kaku, Yoshiko; Suzuki, Hiroyuki; Ohta, Masayuki; Ikuta, Hajime; Isaka, Kazuichi; Sumino, Tatsuo; Wagatsuma, Masako
2007-08-01
To investigate uncharacterized microbial communities, a custom DNA microarray named 'FloraArray' was developed for screening specific probes that would represent the characteristics of a microbial community. The array was prepared by spotting 2000 plasmid DNAs from a genomic shotgun library of a sludge sample on a DNA microarray. By comparative hybridization of the array with two different samples of genomic DNA, one from the activated sludge and the other from a nonactivated sludge sample of an anaerobic ammonium oxidation (anammox) bacterial community, specific spots were visualized as a definite fluctuating profile in an MA (differential intensity ratio vs. spot intensity) plot. About 300 spots of the array accounted for the candidate probes to represent anammox reaction of the activated sludge. After sequence analysis of the probes and examination of the results of blastn searches against the reported anammox reference sequence, complete matches were found for 161 probes (58.3%) and >90% matches were found for 242 probes (87.1%). These results demonstrate that 'FloraArray' could be a useful tool for screening specific DNA molecules of unknown microbial communities.
Cai, Lu; Zheng, Sheng-Wei; Shen, Yu-Jun; Zheng, Guo-Di; Liu, Hong-Tao; Wu, Zhi-Ying
2018-07-01
To enable the development of microbial agents and identify suitable candidate used for biodrying, the existence and function of Bacillus thermoamylovorans during sewage sludge biodrying merits investigation. This study isolated a strain of B. thermoamylovorans during sludge biodrying, submitted it for complete genome sequencing and analyzed its potential microbial functions. After biodrying, the moisture content of the biodrying material decreased from 66.33% to 50.18%, and B. thermoamylovorans was the ecologically dominant Bacillus, with the primary annotations associated with amino acid transport and metabolism (9.53%) and carbohydrate transport and metabolism (8.14%). It contains 96 carbohydrate-active- enzyme-encoding gene counts, mainly distributed in glycoside hydrolases (33.3%) and glycosyl transferases (27.1%). The virulence factors are mainly associated with biosynthesis of capsule and polysaccharide capsule. This work indicates that among the biodrying microorganisms, B. thermoamylovorans has good potential for degrading recalcitrant and readily degradable components, thus being a potential microbial agent used to improve biodrying. Copyright © 2018 Elsevier Ltd. All rights reserved.
Metatranscriptomics of N2-fixing cyanobacteria in the Amazon River plume
Hilton, Jason A; Satinsky, Brandon M; Doherty, Mary; Zielinski, Brian; Zehr, Jonathan P
2015-01-01
Biological N2 fixation is an important nitrogen source for surface ocean microbial communities. However, nearly all information on the diversity and gene expression of organisms responsible for oceanic N2 fixation in the environment has come from targeted approaches that assay only a small number of genes and organisms. Using genomes of diazotrophic cyanobacteria to extract reads from extensive meta-genomic and -transcriptomic libraries, we examined diazotroph diversity and gene expression from the Amazon River plume, an area characterized by salinity and nutrient gradients. Diazotroph genome and transcript sequences were most abundant in the transitional waters compared with lower salinity or oceanic water masses. We were able to distinguish two genetically divergent phylotypes within the Hemiaulus-associated Richelia sequences, which were the most abundant diazotroph sequences in the data set. Photosystem (PS)-II transcripts in Richelia populations were much less abundant than those in Trichodesmium, and transcripts from several Richelia PS-II genes were absent, indicating a prominent role for cyclic electron transport in Richelia. In addition, there were several abundant regulatory transcripts, including one that targets a gene involved in PS-I cyclic electron transport in Richelia. High sequence coverage of the Richelia transcripts, as well as those from Trichodesmium populations, allowed us to identify expressed regions of the genomes that had been overlooked by genome annotations. High-coverage genomic and transcription analysis enabled the characterization of distinct phylotypes within diazotrophic populations, revealed a distinction in a core process between dominant populations and provided evidence for a prominent role for noncoding RNAs in microbial communities. PMID:25514535
Meier-Kolthoff, Jan P; Hahnke, Richard L; Petersen, Jörn; Scheuner, Carmen; Michael, Victoria; Fiebig, Anne; Rohde, Christine; Rohde, Manfred; Fartmann, Berthold; Goodwin, Lynne A; Chertkov, Olga; Reddy, Tbk; Pati, Amrita; Ivanova, Natalia N; Markowitz, Victor; Kyrpides, Nikos C; Woyke, Tanja; Göker, Markus; Klenk, Hans-Peter
2014-01-01
Although Escherichia coli is the most widely studied bacterial model organism and often considered to be the model bacterium per se, its type strain was until now forgotten from microbial genomics. As a part of the G enomic E ncyclopedia of B acteria and A rchaea project, we here describe the features of E. coli DSM 30083(T) together with its genome sequence and annotation as well as novel aspects of its phenotype. The 5,038,133 bp containing genome sequence includes 4,762 protein-coding genes and 175 RNA genes as well as a single plasmid. Affiliation of a set of 250 genome-sequenced E. coli strains, Shigella and outgroup strains to the type strain of E. coli was investigated using digital DNA:DNA-hybridization (dDDH) similarities and differences in genomic G+C content. As in the majority of previous studies, results show Shigella spp. embedded within E. coli and in most cases forming a single subgroup of it. Phylogenomic trees also recover the proposed E. coli phylotypes as monophyla with minor exceptions and place DSM 30083(T) in phylotype B2 with E. coli S88 as its closest neighbor. The widely used lab strain K-12 is not only genomically but also physiologically strongly different from the type strain. The phylotypes do not express a uniform level of character divergence as measured using dDDH, however, thus an alternative arrangement is proposed and discussed in the context of bacterial subspecies. Analyses of the genome sequences of a large number of E. coli strains and of strains from > 100 other bacterial genera indicate a value of 79-80% dDDH as the most promising threshold for delineating subspecies, which in turn suggests the presence of five subspecies within E. coli.
Hassa, Julia; Maus, Irena; Off, Sandra; Pühler, Alfred; Scherer, Paul; Klocke, Michael; Schlüter, Andreas
2018-06-01
The production of biogas by anaerobic digestion (AD) of agricultural residues, organic wastes, animal excrements, municipal sludge, and energy crops has a firm place in sustainable energy production and bio-economy strategies. Focusing on the microbial community involved in biomass conversion offers the opportunity to control and engineer the biogas process with the objective to optimize its efficiency. Taxonomic profiling of biogas producing communities by means of high-throughput 16S rRNA gene amplicon sequencing provided high-resolution insights into bacterial and archaeal structures of AD assemblages and their linkages to fed substrates and process parameters. Commonly, the bacterial phyla Firmicutes and Bacteroidetes appeared to dominate biogas communities in varying abundances depending on the apparent process conditions. Regarding the community of methanogenic Archaea, their diversity was mainly affected by the nature and composition of the substrates, availability of nutrients and ammonium/ammonia contents, but not by the temperature. It also appeared that a high proportion of 16S rRNA sequences can only be classified on higher taxonomic ranks indicating that many community members and their participation in AD within functional networks are still unknown. Although cultivation-based approaches to isolate microorganisms from biogas fermentation samples yielded hundreds of novel species and strains, this approach intrinsically is limited to the cultivable fraction of the community. To obtain genome sequence information of non-cultivable biogas community members, metagenome sequencing including assembly and binning strategies was highly valuable. Corresponding research has led to the compilation of hundreds of metagenome-assembled genomes (MAGs) frequently representing novel taxa whose metabolism and lifestyle could be reconstructed based on nucleotide sequence information. In contrast to metagenome analyses revealing the genetic potential of microbial communities, metatranscriptome sequencing provided insights into the metabolically active community. Taking advantage of genome sequence information, transcriptional activities were evaluated considering the microorganism's genetic background. Metaproteome studies uncovered enzyme profiles expressed by biogas community members. Enzymes involved in cellulose and hemicellulose decomposition and utilization of other complex biopolymers were identified. Future studies on biogas functional microbial networks will increasingly involve integrated multi-omics analyses evaluating metagenome, transcriptome, proteome, and metabolome datasets.
Construction of a minimal genome as a chassis for synthetic biology.
Sung, Bong Hyun; Choe, Donghui; Kim, Sun Chang; Cho, Byung-Kwan
2016-11-30
Microbial diversity and complexity pose challenges in understanding the voluminous genetic information produced from whole-genome sequences, bioinformatics and high-throughput '-omics' research. These challenges can be overcome by a core blueprint of a genome drawn with a minimal gene set, which is essential for life. Systems biology and large-scale gene inactivation studies have estimated the number of essential genes to be ∼300-500 in many microbial genomes. On the basis of the essential gene set information, minimal-genome strains have been generated using sophisticated genome engineering techniques, such as genome reduction and chemical genome synthesis. Current size-reduced genomes are not perfect minimal genomes, but chemically synthesized genomes have just been constructed. Some minimal genomes provide various desirable functions for bioindustry, such as improved genome stability, increased transformation efficacy and improved production of biomaterials. The minimal genome as a chassis genome for synthetic biology can be used to construct custom-designed genomes for various practical and industrial applications. © 2016 The Author(s). published by Portland Press Limited on behalf of the Biochemical Society.
Assembling the Marine Metagenome, One Cell at a Time
Woyke, Tanja; Xie, Gary; Copeland, Alex; González, José M.; Han, Cliff; Kiss, Hajnalka; Saw, Jimmy H.; Senin, Pavel; Yang, Chi; Chatterji, Sourav; Cheng, Jan-Fang; Eisen, Jonathan A.; Sieracki, Michael E.; Stepanauskas, Ramunas
2009-01-01
The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured taxa from a complex microbial community of marine bacterioplankton. A combination of single cell genomics and metagenomics enabled us to analyze the genome content, metabolic adaptations, and biogeography of these taxa. PMID:19390573
Interpreting Microbial Biosynthesis in the Genomic Age: Biological and Practical Considerations
Miller, Ian J.; Chevrette, Marc G.; Kwan, Jason C.
2017-01-01
Genome mining has become an increasingly powerful, scalable, and economically accessible tool for the study of natural product biosynthesis and drug discovery. However, there remain important biological and practical problems that can complicate or obscure biosynthetic analysis in genomic and metagenomic sequencing projects. Here, we focus on limitations of available technology as well as computational and experimental strategies to overcome them. We review the unique challenges and approaches in the study of symbiotic and uncultured systems, as well as those associated with biosynthetic gene cluster (BGC) assembly and product prediction. Finally, to explore sequencing parameters that affect the recovery and contiguity of large and repetitive BGCs assembled de novo, we simulate Illumina and PacBio sequencing of the Salinispora tropica genome focusing on assembly of the salinilactam (slm) BGC. PMID:28587290
Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi
2013-11-20
With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition on one map. By modifying the conventional SOM, we developed batch-learning SOM (BLSOM), which allowed classification of sequence fragments (e.g., 1 kb) according to phylotypes, solely depending on oligonucleotide composition. Metagenomics studies of uncultivable microorganisms in clinical and environmental samples should allow extensive surveys of genes important in life sciences. BLSOM is most suitable for phylogenetic assignment of metagenomic sequences, because fragmental sequences can be clustered according to phylotypes, solely depending on oligonucleotide composition. We first constructed oligonucleotide BLSOMs for all available sequences from genomes of known species, and by mapping metagenomic sequences on these large-scale BLSOMs, we can predict phylotypes of individual metagenomic sequences, revealing a microbial community structure of uncultured microorganisms, including viruses. BLSOM has shown that influenza viruses isolated from humans and birds clearly differ in oligonucleotide composition. Based on this host-dependent oligonucleotide composition, we have proposed strategies for predicting directional changes of virus sequences and for surveilling potentially hazardous strains when introduced into humans from non-human sources.
Accurate read-based metagenome characterization using a hierarchical suite of unique signatures
Freitas, Tracey Allen K.; Li, Po-E; Scholz, Matthew B.; Chain, Patrick S. G.
2015-01-01
A major challenge in the field of shotgun metagenomics is the accurate identification of organisms present within a microbial community, based on classification of short sequence reads. Though existing microbial community profiling methods have attempted to rapidly classify the millions of reads output from modern sequencers, the combination of incomplete databases, similarity among otherwise divergent genomes, errors and biases in sequencing technologies, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR). Here, we present the application of a novel, gene-independent and signature-based metagenomic taxonomic profiling method with significantly and consistently smaller FDR than any other available method. Our algorithm circumvents false positives using a series of non-redundant signature databases and examines Genomic Origins Through Taxonomic CHAllenge (GOTTCHA). GOTTCHA was tested and validated on 20 synthetic and mock datasets ranging in community composition and complexity, was applied successfully to data generated from spiked environmental and clinical samples, and robustly demonstrates superior performance compared with other available tools. PMID:25765641
DOE Office of Scientific and Technical Information (OSTI.GOV)
McLoughlin, K.
2016-01-11
The overall aim of this project is to develop a software package, called MetaQuant, that can determine the constituents of a complex microbial sample and estimate their relative abundances by analysis of metagenomic sequencing data. The goal for Task 1 is to create a generative model describing the stochastic process underlying the creation of sequence read pairs in the data set. The stages in this generative process include the selection of a source genome sequence for each read pair, with probability dependent on its abundance in the sample. The other stages describe the evolution of the source genome from itsmore » nearest common ancestor with a reference genome, breakage of the source DNA into short fragments, and the errors in sequencing the ends of the fragments to produce read pairs.« less
Riedel, Thomas; Spring, Stefan; Fiebig, Anne; Petersen, Jörn; Kyrpides, Nikos C; Göker, Markus; Klenk, Hans-Peter
2014-06-15
Salipiger mucosus Martínez-Cànovas et al. 2004 is the type species of the genus Salipiger, a moderately halophilic and exopolysaccharide-producing representative of the Roseobacter lineage within the alphaproteobacterial family Rhodobacteraceae. Members of this family were shown to be the most abundant bacteria especially in coastal and polar waters, but were also found in microbial mats and sediments. Here we describe the features of the S. mucosus strain DSM 16094(T) together with its genome sequence and annotation. The 5,689,389-bp genome sequence consists of one chromosome and several extrachromosomal elements. It contains 5,650 protein-coding genes and 95 RNA genes. The genome of S. mucosus DSM 16094(T) was sequenced as part of the activities of the Transregional Collaborative Research Center 51 (TRR51) funded by the German Research Foundation (DFG).
Ogilvie, Lesley A; Nzakizwanayo, Jonathan; Guppy, Fergus M; Dedi, Cinzia; Diston, David; Taylor, Huw; Ebdon, James; Jones, Brian V
2018-04-01
Just as the expansion in genome sequencing has revealed and permitted the exploitation of phylogenetic signals embedded in bacterial genomes, the application of metagenomics has begun to provide similar insights at the ecosystem level for microbial communities. However, little is known regarding this aspect of bacteriophage associated with microbial ecosystems, and if phage encode discernible habitat-associated signals diagnostic of underlying microbiomes. Here we demonstrate that individual phage can encode clear habitat-related 'ecogenomic signatures', based on relative representation of phage-encoded gene homologues in metagenomic data sets. Furthermore, we show the ecogenomic signature encoded by the gut-associated ɸB124-14 can be used to segregate metagenomes according to environmental origin, and distinguish 'contaminated' environmental metagenomes (subject to simulated in silico human faecal pollution) from uncontaminated data sets. This indicates phage-encoded ecological signals likely possess sufficient discriminatory power for use in biotechnological applications, such as development of microbial source tracking tools for monitoring water quality.
Integrating Environmental Genomics and Biogeochemical Models: a Gene-centric Approach
NASA Astrophysics Data System (ADS)
Reed, D. C.; Algar, C. K.; Huber, J. A.; Dick, G.
2013-12-01
Rapid advances in molecular microbial ecology have yielded an unprecedented amount of data about the evolutionary relationships and functional traits of microbial communities that regulate global geochemical cycles. Biogeochemical models, however, are trailing in the wake of the environmental genomics revolution and such models rarely incorporate explicit representations of bacteria and archaea, nor are they compatible with nucleic acid or protein sequence data. Here, we present a functional gene-based framework for describing microbial communities in biogeochemical models that uses genomics data and provides predictions that are readily testable using cutting-edge molecular tools. To demonstrate the approach in practice, nitrogen cycling in the Arabian Sea oxygen minimum zone (OMZ) was modelled to examine key questions about cryptic sulphur cycling and dinitrogen production pathways in OMZs. By directly linking geochemical dynamics to the genetic composition of microbial communities, the method provides mechanistic insights into patterns and biogeochemical consequences of marine microbes. Such an approach is critical for informing our understanding of the key role microbes play in modulating Earth's biogeochemistry.
Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life.
Parks, Donovan H; Rinke, Christian; Chuvochina, Maria; Chaumeil, Pierre-Alain; Woodcroft, Ben J; Evans, Paul N; Hugenholtz, Philip; Tyson, Gene W
2017-11-01
Challenges in cultivating microorganisms have limited the phylogenetic diversity of currently available microbial genomes. This is being addressed by advances in sequencing throughput and computational techniques that allow for the cultivation-independent recovery of genomes from metagenomes. Here, we report the reconstruction of 7,903 bacterial and archaeal genomes from >1,500 public metagenomes. All genomes are estimated to be ≥50% complete and nearly half are ≥90% complete with ≤5% contamination. These genomes increase the phylogenetic diversity of bacterial and archaeal genome trees by >30% and provide the first representatives of 17 bacterial and three archaeal candidate phyla. We also recovered 245 genomes from the Patescibacteria superphylum (also known as the Candidate Phyla Radiation) and find that the relative diversity of this group varies substantially with different protein marker sets. The scale and quality of this data set demonstrate that recovering genomes from metagenomes provides an expedient path forward to exploring microbial dark matter.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei
Species of the genus Halomonas are halophilic and their flexible adaption to changes of salinity and temperature brings considerable potential biotechnology applications, such as degradation of organic pollutants and enzyme production. The type strain Halomonas lutea YIM 91125 T was isolated from a hypersaline lake in China. The genome of strain YIM 91125 T becomes the twelfth species sequenced in Halomonas, and the thirteenth species sequenced in Halomonadaceae. We described the features of H. lutea YIM 91125 T, together with the high quality draft genome sequence and annotation of its type strain. The 4,533,090 bp long genome of strain YIMmore » 91125 T with its 4,284 protein-coding and 84 RNA genes is a part of Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project. From the viewpoint of comparative genomics, H. lutea has a larger genome size and more specific genes, which indicated acquisition of function bringing better adaption to its environment. Finally, DDH analysis demonstrated that H. lutea is a distinctive species, and halophilic features and nitrogen metabolism related genes were discovered in its genome.« less
Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei; ...
2015-01-20
Species of the genus Halomonas are halophilic and their flexible adaption to changes of salinity and temperature brings considerable potential biotechnology applications, such as degradation of organic pollutants and enzyme production. The type strain Halomonas lutea YIM 91125 T was isolated from a hypersaline lake in China. The genome of strain YIM 91125 T becomes the twelfth species sequenced in Halomonas, and the thirteenth species sequenced in Halomonadaceae. We described the features of H. lutea YIM 91125 T, together with the high quality draft genome sequence and annotation of its type strain. The 4,533,090 bp long genome of strain YIMmore » 91125 T with its 4,284 protein-coding and 84 RNA genes is a part of Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project. From the viewpoint of comparative genomics, H. lutea has a larger genome size and more specific genes, which indicated acquisition of function bringing better adaption to its environment. Finally, DDH analysis demonstrated that H. lutea is a distinctive species, and halophilic features and nitrogen metabolism related genes were discovered in its genome.« less
An artificial intelligence approach fit for tRNA gene studies in the era of big sequence data.
Iwasaki, Yuki; Abe, Takashi; Wada, Kennosuke; Wada, Yoshiko; Ikemura, Toshimichi
2017-09-12
Unsupervised data mining capable of extracting a wide range of knowledge from big data without prior knowledge or particular models is a timely application in the era of big sequence data accumulation in genome research. By handling oligonucleotide compositions as high-dimensional data, we have previously modified the conventional self-organizing map (SOM) for genome informatics and established BLSOM, which can analyze more than ten million sequences simultaneously. Here, we develop BLSOM specialized for tRNA genes (tDNAs) that can cluster (self-organize) more than one million microbial tDNAs according to their cognate amino acid solely depending on tetra- and pentanucleotide compositions. This unsupervised clustering can reveal combinatorial oligonucleotide motifs that are responsible for the amino acid-dependent clustering, as well as other functionally and structurally important consensus motifs, which have been evolutionarily conserved. BLSOM is also useful for identifying tDNAs as phylogenetic markers for special phylotypes. When we constructed BLSOM with 'species-unknown' tDNAs from metagenomic sequences plus 'species-known' microbial tDNAs, a large portion of metagenomic tDNAs self-organized with species-known tDNAs, yielding information on microbial communities in environmental samples. BLSOM can also enhance accuracy in the tDNA database obtained from big sequence data. This unsupervised data mining should become important for studying numerous functionally unclear RNAs obtained from a wide range of organisms.
Finishing Genomes with Limited Resources: Lessons from an Ensemble of Microbial Genomes
2010-01-01
proteobacteria from the Pasteurellaceae family that is strongly implicated as a causative agent of infective endo- carditis [6]. It can also be found as an...Sequence of Aggregatibacter (Haemophilus) aphrophilus NJB700. J Bacrerio/2009, 191(14):4693-4694. 6. Khairat 0: Endocarditis due to a new species of
USDA-ARS?s Scientific Manuscript database
It is estimated that food-borne pathogens cause approximately 76 million cases of gastrointestinal illnesses, 325,000 hospitalizations, and 5,000 deaths in the United States annually. Genomic, proteomic, and metabolomic studies, particularly, genome sequencing projects are providing valuable inform...
SNP-VISTA: An Interactive SNPs Visualization Tool
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.
2005-07-05
Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms(SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations.In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmentalmore » samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.« less
Survey of (Meta)genomic Approaches for Understanding Microbial Community Dynamics.
Sharma, Anukriti; Lal, Rup
2017-03-01
Advancement in the next generation sequencing technologies has led to evolution of the field of genomics and metagenomics in a slim duration with nominal cost at precipitous higher rate. While metagenomics and genomics can be separately used to reveal the culture-independent and culture-based microbial evolution, respectively, (meta)genomics together can be used to demonstrate results at population level revealing in-depth complex community interactions for specific ecotypes. The field of metagenomics which started with answering "who is out there?" based on 16S rRNA gene has evolved immensely with the precise organismal reconstruction at species/strain level from the deeply covered metagenome data outweighing the need to isolate bacteria of which 99% are de facto non-cultivable. In this review we have underlined the appeal of metagenomic-derived genomes in providing insights into the evolutionary patterns, growth dynamics, genome/gene-specific sweeps, and durability of environmental pressures. We have demonstrated the use of culture-based genomics and environmental shotgun metagenome data together to elucidate environment specific genome modulations via metagenomic recruitments in terms of gene loss/gain, accessory and core-genome extent. We further illustrated the benefit of (meta)genomics in the understanding of infectious diseases by deducing the relationship between human microbiota and clinical microbiology. This review summarizes the technological advances in the (meta)genomic strategies using the genome and metagenome datasets together to increase the resolution of microbial population studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gardner, Shea; Slezak, Tom
With the flood of whole genome finished and draft microbial sequences, we need faster, more scalable bioinformatics tools for sequence comparison. An algorithm is described to find single nucleotide polymorphisms (SNPs) in whole genome data. It scales to hundreds of bacterial or viral genomes, and can be used for finished and/or draft genomes available as unassembled contigs. The method is fast to compute, finding SNPs and building a SNP phylogeny in seconds to hours. We use it to identify thousands of putative SNPs from all publicly available Filoviridae, Poxviridae, foot-and-mouth disease virus, Bacillus, and Escherichia coli genomes and plasmids. Themore » SNP-based trees that result are consistent with known taxonomy and trees determined in other studies. The approach we describe can handle as input hundreds of gigabases of sequence in a single run. The algorithm is based on k-mer analysis using a suffix array, so we call it saSNP.« less
Laviad, Sivan; Lapidus, Alla; Han, James; ...
2015-05-27
Brachymonas chironomi strain AIMA4T (Halpern et al., 2009) is a Gram-negative, non-motile, aerobic, chemoorganotroph bacterium. B. chironomi is a member of the Comamonadaceae, a family within the class Betaproteobacteria. This species was isolated from a chironomid (Diptera; Chironomidae) egg mass, sampled from a waste stabilization pond in northern Israel. Phylogenetic analysis based on the 16S rRNA gene sequences placed strain AIMA4T in the genus Brachymonas. Here we describe the features of this organism, together with the complete genome sequence and annotation. We find the DNA GC content is 63.5%. The chromosome length is 2,509,395 bp. It encodes 2,382 proteins andmore » 68 RNA genes. Brachymonas chironomi genome is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project.« less
CMG-Biotools, a Free Workbench for Basic Comparative Microbial Genomics
Vesth, Tammi; Lagesen, Karin; Acar, Öncel; Ussery, David
2013-01-01
Background Today, there are more than a hundred times as many sequenced prokaryotic genomes than were present in the year 2000. The economical sequencing of genomic DNA has facilitated a whole new approach to microbial genomics. The real power of genomics is manifested through comparative genomics that can reveal strain specific characteristics, diversity within species and many other aspects. However, comparative genomics is a field not easily entered into by scientists with few computational skills. The CMG-biotools package is designed for microbiologists with limited knowledge of computational analysis and can be used to perform a number of analyses and comparisons of genomic data. Results The CMG-biotools system presents a stand-alone interface for comparative microbial genomics. The package is a customized operating system, based on Xubuntu 10.10, available through the open source Ubuntu project. The system can be installed on a virtual computer, allowing the user to run the system alongside any other operating system. Source codes for all programs are provided under GNU license, which makes it possible to transfer the programs to other systems if so desired. We here demonstrate the package by comparing and analyzing the diversity within the class Negativicutes, represented by 31 genomes including 10 genera. The analyses include 16S rRNA phylogeny, basic DNA and codon statistics, proteome comparisons using BLAST and graphical analyses of DNA structures. Conclusion This paper shows the strength and diverse use of the CMG-biotools system. The system can be installed on a vide range of host operating systems and utilizes as much of the host computer as desired. It allows the user to compare multiple genomes, from various sources using standardized data formats and intuitive visualizations of results. The examples presented here clearly shows that users with limited computational experience can perform complicated analysis without much training. PMID:23577086
Kwon, Soon-Kyeong; Kim, Byung Kwon; Song, Ju Yeon; Kwak, Min-Jung; Lee, Choong Hoon; Yoon, Jung-Hoon; Oh, Tae Kwang; Kim, Jihyun F
2013-01-01
Rhodopsin-containing marine microbes such as those in the class Flavobacteriia play a pivotal role in the biogeochemical cycle of the euphotic zone (Fuhrman JA, Schwalbach MS, Stingl U. 2008. Proteorhodopsins: an array of physiological roles? Nat Rev Microbiol. 6:488-494). Deciphering the genome information of flavobacteria and accessing the diversity and ecological impact of microbial rhodopsins are important in understanding and preserving the global ecosystems. The genome sequence of the orange-pigmented marine flavobacterium Nonlabens dokdonensis (basonym: Donghaeana dokdonensis) DSW-6 was determined. As a marine photoheterotroph, DSW-6 has written in its genome physiological features that allow survival in the oligotrophic environments. The sequence analysis also uncovered a gene encoding an unexpected type of microbial rhodopsin containing a unique motif in addition to a proteorhodopsin gene and a number of photolyase or cryptochrome genes. Homologs of the novel rhodopsin gene were found in other flavobacteria, alphaproteobacteria, a species of Cytophagia, a deinococcus, and even a eukaryote diatom. They all contain the characteristic NQ motif and form a phylogenetically distinct group. Expression analysis of this rhodopsin gene in DSW-6 indicated that it is induced at high NaCl concentrations, as well as in the presence of light and the absence of nutrients. Genomic and metagenomic surveys demonstrate the diversity of the NQ rhodopsins in nature and the prevalent occurrence of the encoding genes among microbial communities inhabiting hypersaline niches, suggesting its involvement in sodium metabolism and the sodium-adapted lifestyle.
Treu, Laura; Kougias, Panagiotis G; Campanaro, Stefano; Bassani, Ilaria; Angelidaki, Irini
2016-09-01
This research aimed to better characterize the biogas microbiome by means of high throughput metagenomic sequencing and to elucidate the core microbial consortium existing in biogas reactors independently from the operational conditions. Assembly of shotgun reads followed by an established binning strategy resulted in the highest, up to now, extraction of microbial genomes involved in biogas producing systems. From the 236 extracted genome bins, it was remarkably found that the vast majority of them could only be characterized at high taxonomic levels. This result confirms that the biogas microbiome is comprised by a consortium of unknown species. A comparative analysis between the genome bins of the current study and those extracted from a previous metagenomic assembly demonstrated a similar phylogenetic distribution of the main taxa. Finally, this analysis led to the identification of a subset of common microbes that could be considered as the core essential group in biogas production. Copyright © 2016 Elsevier Ltd. All rights reserved.
Li, Xiao-Xiao; Liu, Jin-Feng; Zhou, Lei; Mbadinga, Serge M.; Yang, Shi-Zhong; Gu, Ji-Dong; Mu, Bo-Zhong
2017-01-01
Deep subsurface petroleum reservoir ecosystems harbor a high diversity of microorganisms, and microbial influenced corrosion is a major problem for the petroleum industry. Here, we used high-throughput sequencing to explore the microbial communities based on genomic 16S rDNA and metabolically active 16S rRNA analyses of production water samples with different extents of corrosion from a high-temperature oil reservoir. Results showed that Desulfotignum and Roseovarius were the most abundant genera in both genomic and active bacterial communities of all the samples. Both genomic and active archaeal communities were mainly composed of Archaeoglobus and Methanolobus. Within both bacteria and archaea, the active and genomic communities were compositionally distinct from one another across the different oil wells (bacteria p = 0.002; archaea p = 0.01). In addition, the sulfate-reducing microorganisms (SRMs) were specifically assessed by Sanger sequencing of functional genes aprA and dsrA encoding the enzymes adenosine-5′-phosphosulfate reductase and dissimilatory sulfite reductase, respectively. Functional gene analysis indicated that potentially active Archaeoglobus, Desulfotignum, Desulfovibrio, and Thermodesulforhabdus were frequently detected, with Archaeoglobus as the most abundant and active sulfate-reducing group. Canonical correspondence analysis revealed that the SRM communities in petroleum reservoir system were closely related to pH of the production water and sulfate concentration. This study highlights the importance of distinguishing the metabolically active microorganisms from the genomic community and extends our knowledge on the active SRM communities in corrosive petroleum reservoirs. PMID:28638372
Putman, Tim E; Burgstaller-Muehlbacher, Sebastian; Waagmeester, Andra; Wu, Chunlei; Su, Andrew I; Good, Benjamin M
2016-01-01
The last 20 years of advancement in sequencing technologies have led to sequencing thousands of microbial genomes, creating mountains of genetic data. While efficiency in generating the data improves almost daily, applying meaningful relationships between taxonomic and genetic entities on this scale requires a structured and integrative approach. Currently, knowledge is distributed across a fragmented landscape of resources from government-funded institutions such as National Center for Biotechnology Information (NCBI) and UniProt to topic-focused databases like the ODB3 database of prokaryotic operons, to the supplemental table of a primary publication. A major drawback to large scale, expert-curated databases is the expense of maintaining and extending them over time. No entity apart from a major institution with stable long-term funding can consider this, and their scope is limited considering the magnitude of microbial data being generated daily. Wikidata is an openly editable, semantic web compatible framework for knowledge representation. It is a project of the Wikimedia Foundation and offers knowledge integration capabilities ideally suited to the challenge of representing the exploding body of information about microbial genomics. We are developing a microbial specific data model, based on Wikidata's semantic web compatibility, which represents bacterial species, strains and the gene and gene products that define them. Currently, we have loaded 43,694 gene and 37,966 protein items for 21 species of bacteria, including the human pathogenic bacteriaChlamydia trachomatis.Using this pathogen as an example, we explore complex interactions between the pathogen, its host, associated genes, other microbes, disease and drugs using the Wikidata SPARQL endpoint. In our next phase of development, we will add another 99 bacterial genomes and their gene and gene products, totaling ∼900,000 additional entities. This aggregation of knowledge will be a platform for community-driven collaboration, allowing the networking of microbial genetic data through the sharing of knowledge by both the data and domain expert. © The Author(s) 2016. Published by Oxford University Press.
Microbial Composition and Adaptations in Oligotrophic Inland Seas
NASA Astrophysics Data System (ADS)
Coleman, M.; Paver, S.; Anderson, M. R.; Vargas, G.
2016-02-01
The Laurentian Great Lakes comprise an interconnected freshwater system with certain areas resembling the oligotrophic open ocean in terms of productivity and nutrient availability. This resemblance creates an opportunity for comparing marine and Great Lake microorganisms to identify signatures of adaptation to low nutrient environments and re-evaluate differences between marine and freshwater microorganisms. We present results from the first comprehensive microbial characterization of all five Great Lakes. We compared community structure, genetic functional potential, and genome properties across the Great Lakes and other aquatic systems. Taxonomic and functional comparisons across lakes yielded three consistent groups: trophically distinct Lake Erie, Lakes Michigan and Huron, and Lakes Superior and Ontario. Lake metagenomic signatures were repeatedly differentiated by the presence of phage sequences and phage-related functional genes. We observed sequence similarity and synteny between contigs assembled from Great Lake metagenomes and genomes of marine organisms, including Nitrosopumilus sp. NF5, Synechococcus sp. RCC307 and Synechococcus phage S-SKS1. Assembly of metagenomic sequences additionally yielded large contigs from poorly characterized taxa. These results begin to fill the gap in our understanding of how nutrients, salinity, and other environmental factors shape microbial structure and function.
pico-PLAZA, a genome database of microbial photosynthetic eukaryotes.
Vandepoele, Klaas; Van Bel, Michiel; Richard, Guilhem; Van Landeghem, Sofie; Verhelst, Bram; Moreau, Hervé; Van de Peer, Yves; Grimsley, Nigel; Piganeau, Gwenael
2013-08-01
With the advent of next generation genome sequencing, the number of sequenced algal genomes and transcriptomes is rapidly growing. Although a few genome portals exist to browse individual genome sequences, exploring complete genome information from multiple species for the analysis of user-defined sequences or gene lists remains a major challenge. pico-PLAZA is a web-based resource (http://bioinformatics.psb.ugent.be/pico-plaza/) for algal genomics that combines different data types with intuitive tools to explore genomic diversity, perform integrative evolutionary sequence analysis and study gene functions. Apart from homologous gene families, multiple sequence alignments, phylogenetic trees, Gene Ontology, InterPro and text-mining functional annotations, different interactive viewers are available to study genome organization using gene collinearity and synteny information. Different search functions, documentation pages, export functions and an extensive glossary are available to guide non-expert scientists. To illustrate the versatility of the platform, different case studies are presented demonstrating how pico-PLAZA can be used to functionally characterize large-scale EST/RNA-Seq data sets and to perform environmental genomics. Functional enrichments analysis of 16 Phaeodactylum tricornutum transcriptome libraries offers a molecular view on diatom adaptation to different environments of ecological relevance. Furthermore, we show how complementary genomic data sources can easily be combined to identify marker genes to study the diversity and distribution of algal species, for example in metagenomes, or to quantify intraspecific diversity from environmental strains. © 2013 John Wiley & Sons Ltd and Society for Applied Microbiology.
Microbial Lifestyle and Genome Signatures
Dutta, Chitra; Paul, Sandip
2012-01-01
Microbes are known for their unique ability to adapt to varying lifestyle and environment, even to the extreme or adverse ones. The genomic architecture of a microbe may bear the signatures not only of its phylogenetic position, but also of the kind of lifestyle to which it is adapted. The present review aims to provide an account of the specific genome signatures observed in microbes acclimatized to distinct lifestyles or ecological niches. Niche-specific signatures identified at different levels of microbial genome organization like base composition, GC-skew, purine-pyrimidine ratio, dinucleotide abundance, codon bias, oligonucleotide composition etc. have been discussed. Among the specific cases highlighted in the review are the phenomena of genome shrinkage in obligatory host-restricted microbes, genome expansion in strictly intra-amoebal pathogens, strand-specific codon usage in intracellular species, acquisition of genome islands in pathogenic or symbiotic organisms, discriminatory genomic traits of marine microbes with distinct trophic strategies, and conspicuous sequence features of certain extremophiles like those adapted to high temperature or high salinity. PMID:23024607
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses
Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna; ...
2016-10-30
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs aremore » grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparingwith external sequences, thus serving as an essential resource in the viral genomics community.« less
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paez-Espino, David; Chen, I. -Min A.; Palaniappan, Krishna
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from > 6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs aremore » grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparingwith external sequences, thus serving as an essential resource in the viral genomics community.« less
Médigue, Claudine; Calteau, Alexandra; Cruveiller, Stéphane; Gachet, Mathieu; Gautreau, Guillaume; Josso, Adrien; Lajus, Aurélie; Langlois, Jordan; Pereira, Hugo; Planel, Rémi; Roche, David; Rollin, Johan; Rouy, Zoe; Vallenet, David
2017-09-12
The overwhelming list of new bacterial genomes becoming available on a daily basis makes accurate genome annotation an essential step that ultimately determines the relevance of thousands of genomes stored in public databanks. The MicroScope platform (http://www.genoscope.cns.fr/agc/microscope) is an integrative resource that supports systematic and efficient revision of microbial genome annotation, data management and comparative analysis. Starting from the results of our syntactic, functional and relational annotation pipelines, MicroScope provides an integrated environment for the expert annotation and comparative analysis of prokaryotic genomes. It combines tools and graphical interfaces to analyze genomes and to perform the manual curation of gene function in a comparative genomics and metabolic context. In this article, we describe the free-of-charge MicroScope services for the annotation and analysis of microbial (meta)genomes, transcriptomic and re-sequencing data. Then, the functionalities of the platform are presented in a way providing practical guidance and help to the nonspecialists in bioinformatics. Newly integrated analysis tools (i.e. prediction of virulence and resistance genes in bacterial genomes) and original method recently developed (the pan-genome graph representation) are also described. Integrated environments such as MicroScope clearly contribute, through the user community, to help maintaining accurate resources. © The Author 2017. Published by Oxford University Press.
Getting a better picture of microbial evolution en route to a network of genomes.
Dagan, Tal; Martin, William
2009-08-12
Most current thinking about evolution is couched in the concept of trees. The notion of a tree with recursively bifurcating branches representing recurrent divergence events is a plausible metaphor to describe the evolution of multicellular organisms like vertebrates or land plants. But if we try to force the tree metaphor onto the whole of the evolutionary process, things go badly awry, because the more closely we inspect microbial genomes through the looking glass of gene and genome sequence comparisons, the smaller the amount of the data that fits the concept of a bifurcating tree becomes. That is mainly because among microbes, endosymbiosis and lateral gene transfer are important, two mechanisms of natural variation that differ from the kind of natural variation that Darwin had in mind. For such reasons, when it comes to discussing the relationships among all living things, that is, including the microbes and all of their genes rather than just one or a select few, many biologists are now beginning to talk about networks rather than trees in the context of evolutionary relationships among microbial chromosomes. But talk is not enough. If we were to actually construct networks instead of trees to describe the evolutionary process, what would they look like? Here we consider endosymbiosis and an example of a network of genomes involving 181 sequenced prokaryotes and how that squares off with some ideas about early cell evolution.
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.
Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into a mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs inmore » two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements. Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights the microbial mechanisms of PAH biodegradation that may shape the process in the environment.« less
Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4
Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.; ...
2015-08-15
Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into a mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs inmore » two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements. Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Lastly, determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights the microbial mechanisms of PAH biodegradation that may shape the process in the environment.« less
Using comparative genome analysis to identify problems in annotated microbial genomes.
Poptsova, Maria S; Gogarten, J Peter
2010-07-01
Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.
Bowers, Robert M.; Clum, Alicia; Tice, Hope; ...
2015-10-24
Background: The rapid development of sequencing technologies has provided access to environments that were either once thought inhospitable to life altogether or that contain too few cells to be analyzed using genomics approaches. While 16S rRNA gene microbial community sequencing has revolutionized our understanding of community composi tion and diversity over time and space, it only provides a crude estimate of microbial functional and metabolic potential. Alternatively, shotgun metagenomics allows comprehensive sampling of all genetic material in an environment, without any underlying primer biases. Until recently, one of the major bottlenecks of shotgun metagenomics has been the requirement for largemore » initial DNA template quantities during library preparation. Results: Here, we investigate the effects of varying template concentrations across three low biomass library preparation protocols on their ability to accurately reconstruct a mock microbial community of known composition. We analyze the effects of input DNA quantity and library preparation method on library insert size, GC content, community composition, assembly quality and metagenomic binning. We found that library preparation method and the amount of starting material had significant impacts on the mock community metagenomes. In particular, GC content shifted towards more GC rich sequences at the lower input quantities regardless of library prep method, the number of low quality reads that could not be mapped to the reference genomes increased with decreasing input quantities, and the different library preparation methods had an impact on overall metagenomic community composition. Conclusions: This benchmark study provides recommendations for library creation of representative and minimally biased metagenome shotgun sequencing, enabling insights into functional attributes of low biomass ecosystem microbial communities.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowers, Robert M.; Clum, Alicia; Tice, Hope
Background: The rapid development of sequencing technologies has provided access to environments that were either once thought inhospitable to life altogether or that contain too few cells to be analyzed using genomics approaches. While 16S rRNA gene microbial community sequencing has revolutionized our understanding of community composi tion and diversity over time and space, it only provides a crude estimate of microbial functional and metabolic potential. Alternatively, shotgun metagenomics allows comprehensive sampling of all genetic material in an environment, without any underlying primer biases. Until recently, one of the major bottlenecks of shotgun metagenomics has been the requirement for largemore » initial DNA template quantities during library preparation. Results: Here, we investigate the effects of varying template concentrations across three low biomass library preparation protocols on their ability to accurately reconstruct a mock microbial community of known composition. We analyze the effects of input DNA quantity and library preparation method on library insert size, GC content, community composition, assembly quality and metagenomic binning. We found that library preparation method and the amount of starting material had significant impacts on the mock community metagenomes. In particular, GC content shifted towards more GC rich sequences at the lower input quantities regardless of library prep method, the number of low quality reads that could not be mapped to the reference genomes increased with decreasing input quantities, and the different library preparation methods had an impact on overall metagenomic community composition. Conclusions: This benchmark study provides recommendations for library creation of representative and minimally biased metagenome shotgun sequencing, enabling insights into functional attributes of low biomass ecosystem microbial communities.« less
Genomic Analysis of Complex Microbial Communities in Wounds
2009-07-01
Actinobacteria — were the most commonly misclassified [25]. The 16S sequences used in the current study were all greater than or equal to 200 bases...with most (89.1%) of the sequences falling into Firmicutes, Proteobac- teria, and Actinobacteria phyla. High percentages of the Firmicutes and... Actinobacteria sequences were successfully assigned to the genus level, 88.0% and 82.3%, respectively; however, only 53.0% of the Proteobacteria sequences
Turaev, Dmitrij; Rattei, Thomas
2016-06-01
The systems biology of microbial communities, organismal communities inhabiting all ecological niches on earth, has in recent years been strongly facilitated by the rapid development of experimental, sequencing and data analysis methods. Novel experimental approaches and binning methods in metagenomics render the semi-automatic reconstructions of near-complete genomes of uncultivable bacteria possible, while advances in high-resolution amplicon analysis allow for efficient and less biased taxonomic community characterization. This will also facilitate predictive modeling approaches, hitherto limited by the low resolution of metagenomic data. In this review, we pinpoint the most promising current developments in metagenomics. They facilitate microbial systems biology towards a systemic understanding of mechanisms in microbial communities with scopes of application in many areas of our daily life. Copyright © 2016 Elsevier Ltd. All rights reserved.
Gene discovery by chemical mutagenesis and whole-genome sequencing in Dictyostelium.
Li, Cheng-Lin Frank; Santhanam, Balaji; Webb, Amanda Nicole; Zupan, Blaž; Shaulsky, Gad
2016-09-01
Whole-genome sequencing is a useful approach for identification of chemical-induced lesions, but previous applications involved tedious genetic mapping to pinpoint the causative mutations. We propose that saturation mutagenesis under low mutagenic loads, followed by whole-genome sequencing, should allow direct implication of genes by identifying multiple independent alleles of each relevant gene. We tested the hypothesis by performing three genetic screens with chemical mutagenesis in the social soil amoeba Dictyostelium discoideum Through genome sequencing, we successfully identified mutant genes with multiple alleles in near-saturation screens, including resistance to intense illumination and strong suppressors of defects in an allorecognition pathway. We tested the causality of the mutations by comparison to published data and by direct complementation tests, finding both dominant and recessive causative mutations. Therefore, our strategy provides a cost- and time-efficient approach to gene discovery by integrating chemical mutagenesis and whole-genome sequencing. The method should be applicable to many microbial systems, and it is expected to revolutionize the field of functional genomics in Dictyostelium by greatly expanding the mutation spectrum relative to other common mutagenesis methods. © 2016 Li et al.; Published by Cold Spring Harbor Laboratory Press.
Luo, Chengwei; Tsementzi, Despina; Kyrpides, Nikos; Read, Timothy; Konstantinidis, Konstantinos T
2012-01-01
Next-generation sequencing (NGS) is commonly used in metagenomic studies of complex microbial communities but whether or not different NGS platforms recover the same diversity from a sample and their assembled sequences are of comparable quality remain unclear. We compared the two most frequently used platforms, the Roche 454 FLX Titanium and the Illumina Genome Analyzer (GA) II, on the same DNA sample obtained from a complex freshwater planktonic community. Despite the substantial differences in read length and sequencing protocols, the platforms provided a comparable view of the community sampled. For instance, derived assemblies overlapped in ~90% of their total sequences and in situ abundances of genes and genotypes (estimated based on sequence coverage) correlated highly between the two platforms (R(2)>0.9). Evaluation of base-call error, frameshift frequency, and contig length suggested that Illumina offered equivalent, if not better, assemblies than Roche 454. The results from metagenomic samples were further validated against DNA samples of eighteen isolate genomes, which showed a range of genome sizes and G+C% content. We also provide quantitative estimates of the errors in gene and contig sequences assembled from datasets characterized by different levels of complexity and G+C% content. For instance, we noted that homopolymer-associated, single-base errors affected ~1% of the protein sequences recovered in Illumina contigs of 10× coverage and 50% G+C; this frequency increased to ~3% when non-homopolymer errors were also considered. Collectively, our results should serve as a useful practical guide for choosing proper sampling strategies and data possessing protocols for future metagenomic studies.
Swingley, Wesley D.; Meyer-Dombard, D’Arcy R.; Shock, Everett L.; Alsop, Eric B.; Falenski, Heinz D.; Havig, Jeff R.; Raymond, Jason
2012-01-01
We have constructed a conceptual model of biogeochemical cycles and metabolic and microbial community shifts within a hot spring ecosystem via coordinated analysis of the “Bison Pool” (BP) Environmental Genome and a complementary contextual geochemical dataset of ∼75 geochemical parameters. 2,321 16S rRNA clones and 470 megabases of environmental sequence data were produced from biofilms at five sites along the outflow of BP, an alkaline hot spring in Sentinel Meadow (Lower Geyser Basin) of Yellowstone National Park. This channel acts as a >22 m gradient of decreasing temperature, increasing dissolved oxygen, and changing availability of biologically important chemical species, such as those containing nitrogen and sulfur. Microbial life at BP transitions from a 92°C chemotrophic streamer biofilm community in the BP source pool to a 56°C phototrophic mat community. We improved automated annotation of the BP environmental genomes using BLAST-based Markov clustering. We have also assigned environmental genome sequences to individual microbial community members by complementing traditional homology-based assignment with nucleotide word-usage algorithms, allowing more than 70% of all reads to be assigned to source organisms. This assignment yields high genome coverage in dominant community members, facilitating reconstruction of nearly complete metabolic profiles and in-depth analysis of the relation between geochemical and metabolic changes along the outflow. We show that changes in environmental conditions and energy availability are associated with dramatic shifts in microbial communities and metabolic function. We have also identified an organism constituting a novel phylum in a metabolic “transition” community, located physically between the chemotroph- and phototroph-dominated sites. The complementary analysis of biogeochemical and environmental genomic data from BP has allowed us to build ecosystem-based conceptual models for this hot spring, reconstructing whole metabolic networks in order to illuminate community roles in shaping and responding to geochemical variability. PMID:22675512
Recapitulating phylogenies using k-mers: from trees to networks.
Bernard, Guillaume; Ragan, Mark A; Chan, Cheong Xin
2016-01-01
Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k -mers (subsequences at fixed length k ). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using k -mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
Reconstructing rare soil microbial genomes using in situ enrichments and metagenomics
Delmont, Tom O.; Eren, A. Murat; Maccario, Lorrie; Prestat, Emmanuel; Esen, Özcan C.; Pelletier, Eric; Le Paslier, Denis; Simonet, Pascal; Vogel, Timothy M.
2015-01-01
Despite extensive direct sequencing efforts and advanced analytical tools, reconstructing microbial genomes from soil using metagenomics have been challenging due to the tremendous diversity and relatively uniform distribution of genomes found in this system. Here we used enrichment techniques in an attempt to decrease the complexity of a soil microbiome prior to sequencing by submitting it to a range of physical and chemical stresses in 23 separate microcosms for 4 months. The metagenomic analysis of these microcosms at the end of the treatment yielded 540 Mb of assembly using standard de novo assembly techniques (a total of 559,555 genes and 29,176 functions), from which we could recover novel bacterial genomes, plasmids and phages. The recovered genomes belonged to Leifsonia (n = 2), Rhodanobacter (n = 5), Acidobacteria (n = 2), Sporolactobacillus (n = 2, novel nitrogen fixing taxon), Ktedonobacter (n = 1, second representative of the family Ktedonobacteraceae), Streptomyces (n = 3, novel polyketide synthase modules), and Burkholderia (n = 2, includes mega-plasmids conferring mercury resistance). Assembled genomes averaged to 5.9 Mb, with relative abundances ranging from rare (<0.0001%) to relatively abundant (>0.01%) in the original soil microbiome. Furthermore, we detected them in samples collected from geographically distant locations, particularly more in temperate soils compared to samples originating from high-latitude soils and deserts. To the best of our knowledge, this study is the first successful attempt to assemble multiple bacterial genomes directly from a soil sample. Our findings demonstrate that developing pertinent enrichment conditions can stimulate environmental genomic discoveries that would have been impossible to achieve with canonical approaches that focus solely upon post-sequencing data treatment. PMID:25983722
Complete genome sequence of Syntrophobacter fumaroxidans strain (MPOBT)
Plugge, Caroline M.; Henstra, Anne M.; Worm, Petra; Swarts, Daan C.; Paulitsch-Fuchs, Astrid H.; Scholten, Johannes C.M.; Lykidis, Athanasios; Lapidus, Alla L.; Goltsman, Eugene; Kim, Edwin; McDonald, Erin; Rohlin, Lars; Crable, Bryan R.; Gunsalus, Robert P.; Stams, Alfons J.M.; McInerney, Michael J.
2012-01-01
Syntrophobacter fumaroxidans strain MPOBT is the best-studied species of the genus Syntrophobacter. The species is of interest because of its anaerobic syntrophic lifestyle, its involvement in the conversion of propionate to acetate, H2 and CO2 during the overall degradation of organic matter, and its release of products that serve as substrates for other microorganisms. The strain is able to ferment fumarate in pure culture to CO2 and succinate, and is also able to grow as a sulfate reducer with propionate as an electron donor. This is the first complete genome sequence of a member of the genus Syntrophobacter and a member genus in the family Syntrophobacteraceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,990,251 bp long genome with its 4,098 protein-coding and 81 RNA genes is a part of the Microbial Genome Program (MGP) and the Genomes to Life (GTL) Program project. PMID:23450070
Wooley, John C.; Godzik, Adam; Friedberg, Iddo
2010-01-01
Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics. PMID:20195499
Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius
2016-01-01
The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418
Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J
2016-09-01
The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.
DNA Data Bank of Japan (DDBJ) for genome scale research in life science
Tateno, Y.; Imanishi, T.; Miyazaki, S.; Fukami-Kobayashi, K.; Saitou, N.; Sugawara, H.; Gojobori, T.
2002-01-01
The DNA Data Bank of Japan (DDBJ, http://www.ddbj.nig.ac.jp) has made an effort to collect as much data as possible mainly from Japanese researchers. The increase rates of the data we collected, annotated and released to the public in the past year are 43% for the number of entries and 52% for the number of bases. The increase rates are accelerated even after the human genome was sequenced, because sequencing technology has been remarkably advanced and simplified, and research in life science has been shifted from the gene scale to the genome scale. In addition, we have developed the Genome Information Broker (GIB, http://gib.genes.nig.ac.jp) that now includes more than 50 complete microbial genome and Arabidopsis genome data. We have also developed a database of the human genome, the Human Genomics Studio (HGS, http://studio.nig.ac.jp). HGS provides one with a set of sequences being as continuous as possible in any one of the 24 chromosomes. Both GIB and HGS have been updated incorporating newly available data and retrieval tools. PMID:11752245
Validation of high throughput sequencing and microbial forensics applications
2014-01-01
High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security. PMID:25101166
Validation of high throughput sequencing and microbial forensics applications.
Budowle, Bruce; Connell, Nancy D; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D; Minot, Samuel
2014-01-01
High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security.
Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.; Hugenholtz, Philip; Tyson, Gene W.
2015-01-01
Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. PMID:25977477
Parks, Donovan H; Imelfort, Michael; Skennerton, Connor T; Hugenholtz, Philip; Tyson, Gene W
2015-07-01
Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of "marker" genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. © 2015 Parks et al.; Published by Cold Spring Harbor Laboratory Press.
Challenges in Whole-Genome Annotation of Pyrosequenced Eukaryotic Genomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuo, Alan; Grigoriev, Igor
2009-04-17
Pyrosequencing technologies such as 454/Roche and Solexa/Illumina vastly lower the cost of nucleotide sequencing compared to the traditional Sanger method, and thus promise to greatly expand the number of sequenced eukaryotic genomes. However, the new technologies also bring new challenges such as shorter reads and new kinds and higher rates of sequencing errors, which complicate genome assembly and gene prediction. At JGI we are deploying 454 technology for the sequencing and assembly of ever-larger eukaryotic genomes. Here we describe our first whole-genome annotation of a purely 454-sequenced fungal genome that is larger than a yeast (>30 Mbp). The pezizomycotine (filamentousmore » ascomycote) Aspergillus carbonarius belongs to the Aspergillus section Nigri species complex, members of which are significant as platforms for bioenergy and bioindustrial technology, as members of soil microbial communities and players in the global carbon cycle, and as agricultural toxigens. Application of a modified version of the standard JGI Annotation Pipeline has so far predicted ~;;10k genes. ~;;12percent of these preliminary annotations suffer a potential frameshift error, which is somewhat higher than the ~;;9percent rate in the Sanger-sequenced and conventionally assembled and annotated genome of fellow Aspergillus section Nigri member A. niger. Also,>90percent of A. niger genes have potential homologs in the A. carbonarius preliminary annotation. Weconclude, and with further annotation and comparative analysis expect to confirm, that 454 sequencing strategies provide a promising substrate for annotation of modestly sized eukaryotic genomes. We will also present results of annotation of a number of other pyrosequenced fungal genomes of bioenergy interest.« less
Whole-Genome Sequencing in Outbreak Analysis
Turner, Stephen D.; Riley, Margaret F.; Petri, William A.; Hewlett, Erik L.
2015-01-01
SUMMARY In addition to the ever-present concern of medical professionals about epidemics of infectious diseases, the relative ease of access and low cost of obtaining, producing, and disseminating pathogenic organisms or biological toxins mean that bioterrorism activity should also be considered when facing a disease outbreak. Utilization of whole-genome sequencing (WGS) in outbreak analysis facilitates the rapid and accurate identification of virulence factors of the pathogen and can be used to identify the path of disease transmission within a population and provide information on the probable source. Molecular tools such as WGS are being refined and advanced at a rapid pace to provide robust and higher-resolution methods for identifying, comparing, and classifying pathogenic organisms. If these methods of pathogen characterization are properly applied, they will enable an improved public health response whether a disease outbreak was initiated by natural events or by accidental or deliberate human activity. The current application of next-generation sequencing (NGS) technology to microbial WGS and microbial forensics is reviewed. PMID:25876885
Rangannan, Vetriselvi; Bansal, Manju
2009-12-01
The rapid increase in genome sequence information has necessitated the annotation of their functional elements, particularly those occurring in the non-coding regions, in the genomic context. Promoter region is the key regulatory region, which enables the gene to be transcribed or repressed, but it is difficult to determine experimentally. Hence an in silico identification of promoters is crucial in order to guide experimental work and to pin point the key region that controls the transcription initiation of a gene. In this analysis, we demonstrate that while the promoter regions are in general less stable than the flanking regions, their average free energy varies depending on the GC composition of the flanking genomic sequence. We have therefore obtained a set of free energy threshold values, for genomic DNA with varying GC content and used them as generic criteria for predicting promoter regions in several microbial genomes, using an in-house developed tool PromPredict. On applying it to predict promoter regions corresponding to the 1144 and 612 experimentally validated TSSs in E. coli (50.8% GC) and B. subtilis (43.5% GC) sensitivity of 99% and 95% and precision values of 58% and 60%, respectively, were achieved. For the limited data set of 81 TSSs available for M. tuberculosis (65.6% GC) a sensitivity of 100% and precision of 49% was obtained.
Bhattacharyya, Anamitra; Stilwagen, Stephanie; Reznik, Gary; Feil, Helene; Feil, William S; Anderson, Iain; Bernal, Axel; D'Souza, Mark; Ivanova, Natalia; Kapatral, Vinayak; Larsen, Niels; Los, Tamara; Lykidis, Athanasios; Selkov, Eugene; Walunas, Theresa L; Purcell, Alexander; Edwards, Rob A; Hawkins, Trevor; Haselkorn, Robert; Overbeek, Ross; Kyrpides, Nikos C; Predki, Paul F
2002-10-01
Draft sequencing is a rapid and efficient method for determining the near-complete sequence of microbial genomes. Here we report a comparative analysis of one complete and two draft genome sequences of the phytopathogenic bacterium, Xylella fastidiosa, which causes serious disease in plants, including citrus, almond, and oleander. We present highlights of an in silico analysis based on a comparison of reconstructions of core biological subsystems. Cellular pathway reconstructions have been used to identify a small number of genes, which are likely to reside within the draft genomes but are not captured in the draft assembly. These represented only a small fraction of all genes and were predominantly large and small ribosomal subunit protein components. By using this approach, some of the inherent limitations of draft sequence can be significantly reduced. Despite the incomplete nature of the draft genomes, it is possible to identify several phage-related genes, which appear to be absent from the draft genomes and not the result of insufficient sequence sampling. This region may therefore identify potential host-specific functions. Based on this first functional reconstruction of a phytopathogenic microbe, we spotlight an unusual respiration machinery as a potential target for biological control. We also predicted and developed a new defined growth medium for Xylella.
Reverse Vaccinology: Developing Vaccines in the Era of Genomics
Sette, Alessandro; Rappuoli, Rino
2012-01-01
The sequence of microbial genomes made all potential antigens of each pathogen available for vaccine development. This increased by orders of magnitude potential vaccine targets in bacteria, parasites, and large viruses and revealed virtually all their CD4+ and CD8+ T cell epitopes. The genomic information was first used for the development of a vaccine against serogroup B meningococcus, and it is now being used for several other bacterial vaccines. In this review, we will first summarize the impact that genome sequencing has had on vaccine development, and then we will analyze how the genomic information can help further our understanding of immunity to infection or vaccination and lead to the design of better vaccines by diving into the world of T cell immunity. PMID:21029963
Metagenomic discovery of biomass-degrading genes and genomes from cow rumen.
Hess, Matthias; Sczyrba, Alexander; Egan, Rob; Kim, Tae-Wan; Chokhawala, Harshal; Schroth, Gary; Luo, Shujun; Clark, Douglas S; Chen, Feng; Zhang, Tao; Mackie, Roderick I; Pennacchio, Len A; Tringe, Susannah G; Visel, Axel; Woyke, Tanja; Wang, Zhong; Rubin, Edward M
2011-01-28
The paucity of enzymes that efficiently deconstruct plant polysaccharides represents a major bottleneck for industrial-scale conversion of cellulosic biomass into biofuels. Cow rumen microbes specialize in degradation of cellulosic plant material, but most members of this complex community resist cultivation. To characterize biomass-degrading genes and genomes, we sequenced and analyzed 268 gigabases of metagenomic DNA from microbes adherent to plant fiber incubated in cow rumen. From these data, we identified 27,755 putative carbohydrate-active genes and expressed 90 candidate proteins, of which 57% were enzymatically active against cellulosic substrates. We also assembled 15 uncultured microbial genomes, which were validated by complementary methods including single-cell genome sequencing. These data sets provide a substantially expanded catalog of genes and genomes participating in the deconstruction of cellulosic biomass.
Al-Saari, Nurhidayu; Gao, Feng; Rohul, Amin A K M; Sato, Kazumichi; Sato, Keisuke; Mino, Sayaka; Suda, Wataru; Oshima, Kenshiro; Hattori, Masahira; Ohkuma, Moriya; Meirelles, Pedro M; Thompson, Fabiano L; Thompson, Cristiane; Filho, Gilberto M A; Gomez-Gil, Bruno; Sawabe, Toko; Sawabe, Tomoo
2015-01-01
Advances in genomic microbial taxonomy have opened the way to create a more universal and transparent concept of species but is still in a transitional stage towards becoming a defining robust criteria for describing new microbial species with minimum features obtained using both genome and classical polyphasic taxonomies. Here we performed advanced microbial taxonomies combined with both genome-based and classical approaches for new agarolytic vibrio isolates to describe not only a novel Vibrio species but also a member of a new Vibrio clade. Two novel vibrio strains (Vibrio astriarenae sp. nov. C7T and C20) showing agarolytic, halophilic and fermentative metabolic activity were isolated from a seawater sample collected in a coral reef in Okinawa. Intraspecific similarities of the isolates were identical in both sequences on the 16S rRNA and pyrH genes, but the closest relatives on the molecular phylogenetic trees on the basis of 16S rRNA and pyrH gene sequences were V. hangzhouensis JCM 15146T (97.8% similarity) and V. agarivorans CECT 5085T (97.3% similarity), respectively. Further multilocus sequence analysis (MLSA) on the basis of 8 protein coding genes (ftsZ, gapA, gyrB, mreB, pyrH, recA, rpoA, and topA) obtained by the genome sequences clearly showed the V. astriarenae strain C7T and C20 formed a distinct new clade protruded next to V. agarivorans CECT 5085T. The singleton V. agarivorans has never been included in previous MLSA of Vibrionaceae due to the lack of some gene sequences. Now the gene sequences are completed and analysis of 100 taxa in total provided a clear picture describing the association of V. agarivorans into pre-existing concatenated network tree and concluded its relationship to our vibrio strains. Experimental DNA-DNA hybridization (DDH) data showed that the strains C7T and C20 were conspecific but were separated from all of the other Vibrio species related on the basis of both 16S rRNA and pyrH gene phylogenies (e.g., V. agarivorans CECT 5085T, V. hangzhouensis JCM 15146T V. maritimus LMG 25439T, and V. variabilis LMG 25438T). In silico DDH data also supported the genomic relationship. The strains C7T also had less than 95% average amino acid identity (AAI) and average nucleotide identity (ANI) towards V. maritimus C210, V. variabilis C206, and V. mediterranei AK1T, V. brasiliensis LMG 20546T, V. orientalis ATCC 33934T, and V. sinaloensis DSM 21326. The name Vibrio astriarenae sp. nov. is proposed with C7 as the type strains. Both V. agarivorans CECT 5058T and V. astriarenae C7T are members of the newest clade of Vibrionaceae named Agarivorans.
Ishii, Satoshi; Sadowsky, Michael J
2009-04-01
A large number of repetitive DNA sequences are found in multiple sites in the genomes of numerous bacteria, archaea and eukarya. While the functions of many of these repetitive sequence elements are unknown, they have proven to be useful as the basis of several powerful tools for use in molecular diagnostics, medical microbiology, epidemiological analyses and environmental microbiology. The repetitive sequence-based PCR or rep-PCR DNA fingerprint technique uses primers targeting several of these repetitive elements and PCR to generate unique DNA profiles or 'fingerprints' of individual microbial strains. Although this technique has been extensively used to examine diversity among variety of prokaryotic microorganisms, rep-PCR DNA fingerprinting can also be applied to microbial ecology and microbial evolution studies since it has the power to distinguish microbes at the strain or isolate level. Recent advancement in rep-PCR methodology has resulted in increased accuracy, reproducibility and throughput. In this minireview, we summarize recent improvements in rep-PCR DNA fingerprinting methodology, and discuss its applications to address fundamentally important questions in microbial ecology and evolution.
A Hybrid Approach for the Automated Finishing of Bacterial Genomes
Robins, William P.; Chin, Chen-Shan; Webster, Dale; Paxinos, Ellen; Hsu, David; Ashby, Meredith; Wang, Susana; Peluso, Paul; Sebra, Robert; Sorenson, Jon; Bullard, James; Yen, Jackie; Valdovino, Marie; Mollova, Emilia; Luong, Khai; Lin, Steven; LaMay, Brianna; Joshi, Amruta; Rowe, Lori; Frace, Michael; Tarr, Cheryl L.; Turnsek, Maryann; Davis, Brigid M; Kasarskis, Andrew; Mekalanos, John J.; Waldor, Matthew K.; Schadt, Eric E.
2013-01-01
Dramatic improvements in DNA sequencing technology have revolutionized our ability to characterize most genomic diversity. However, accurate resolution of large structural events has remained challenging due to the comparatively shorter read lengths of second-generation technologies. Emerging third-generation sequencing technologies, which yield markedly increased read length on rapid time scales and for low cost, have the potential to address assembly limitations. Here we combine sequencing data from second- and third-generation DNA sequencing technologies to assemble the two-chromosome genome of a recent Haitian cholera outbreak strain into two nearly finished contigs at > 99.9% accuracy. Complex regions with clinically significant structure were completely resolved. In separate control assemblies on experimental and simulated data for the canonical N16961 reference we obtain 14 and 8 scaffolds greater than 1kb, respectively, correcting several errors in the underlying source data. This work provides a blueprint for the next generation of rapid microbial identification and full-genome assembly. PMID:22750883
Ecosystems Biology Approaches To Determine Key Fitness Traits of Soil Microorganisms
NASA Astrophysics Data System (ADS)
Brodie, E.; Zhalnina, K.; Karaoz, U.; Cho, H.; Nuccio, E. E.; Shi, S.; Lipton, M. S.; Zhou, J.; Pett-Ridge, J.; Northen, T.; Firestone, M.
2014-12-01
The application of theoretical approaches such as trait-based modeling represent powerful tools to explain and perhaps predict complex patterns in microbial distribution and function across environmental gradients in space and time. These models are mostly deterministic and where available are built upon a detailed understanding of microbial physiology and response to environmental factors. However as most soil microorganisms have not been cultivated, for the majority our understanding is limited to insights from environmental 'omic information. Information gleaned from 'omic studies of complex systems should be regarded as providing hypotheses, and these hypotheses should be tested under controlled laboratory conditions if they are to be propagated into deterministic models. In a semi-arid Mediterranean grassland system we are attempting to dissect microbial communities into functional guilds with defined physiological traits and are using a range of 'omics approaches to characterize their metabolic potential and niche preference. Initially, two physiologically relevant time points (peak plant activity and prior to wet-up) were sampled and metagenomes sequenced deeply (600-900 Gbp). Following assembly, differential coverage and nucleotide frequency binning were carried out to yield draft genomes. In addition, using a range of cultivation media we have isolated a broad range of bacteria representing abundant bacterial genotypes and with genome sequences of almost 40 isolates are testing genomic predictions regarding growth rate, temperature and substrate utilization in vitro. This presentation will discuss the opportunities and challenges in parameterizing microbial functional guilds from environmental 'omic information for use in trait-based models.
Gole, Jeff; Gore, Athurva; Richards, Andrew; Chiu, Yu-Jui; Fung, Ho-Lim; Bushman, Diane; Chiang, Hsin-I; Chun, Jerold; Lo, Yu-Hwa; Zhang, Kun
2013-01-01
Genome sequencing of single cells has a variety of applications, including characterizing difficult-to-culture microorganisms and identifying somatic mutations in single cells from mammalian tissues. A major hurdle in this process is the bias in amplifying the genetic material from a single cell, a procedure known as polymerase cloning. Here we describe the microwell displacement amplification system (MIDAS), a massively parallel polymerase cloning method in which single cells are randomly distributed into hundreds to thousands of nanoliter wells and simultaneously amplified for shotgun sequencing. MIDAS reduces amplification bias because polymerase cloning occurs in physically separated nanoliter-scale reactors, facilitating the de novo assembly of near-complete microbial genomes from single E. coli cells. In addition, MIDAS allowed us to detect single-copy number changes in primary human adult neurons at 1–2 Mb resolution. MIDAS will further the characterization of genomic diversity in many heterogeneous cell populations. PMID:24213699
Luan, Guodong; Bao, Guanhui; Lin, Zhao; Li, Yang; Chen, Zugen; Li, Yin; Cai, Zhen
2015-12-25
Heat tolerance of microbes is of great importance for efficient biorefinery and bioconversion. However, engineering and understanding of microbial heat tolerance are difficult and insufficient because it is a complex physiological trait which probably correlates with all gene functions, genetic regulations, and cellular metabolisms and activities. In this work, a novel strain engineering approach named Genome Replication Engineering Assisted Continuous Evolution (GREACE) was employed to improve the heat tolerance of Escherichia coli. When the E. coli strain carrying a mutator was cultivated under gradually increasing temperature, genome-wide mutations were continuously generated during genome replication and the mutated strains with improved thermotolerance were autonomously selected. A thermotolerant strain HR50 capable of growing at 50°C on LB agar plate was obtained within two months, demonstrating the efficiency of GREACE in improving such a complex physiological trait. To understand the improved heat tolerance, genomes of HR50 and its wildtype strain DH5α were sequenced. Evenly distributed 361 mutations covering all mutation types were found in HR50. Closed material transportations, loose genome conformation, and possibly altered cell wall structure and transcription pattern were the main differences of HR50 compared with DH5α, which were speculated to be responsible for the improved heat tolerance. This work not only expanding our understanding of microbial heat tolerance, but also emphasizing that the in vivo continuous genome mutagenesis method, GREACE, is efficient in improving microbial complex physiological trait. Copyright © 2015 Elsevier B.V. All rights reserved.
Cerdeira, Louise Teixeira; Carneiro, Adriana Ribeiro; Ramos, Rommel Thiago Jucá; de Almeida, Sintia Silva; D'Afonseca, Vivian; Schneider, Maria Paula Cruz; Baumbach, Jan; Tauch, Andreas; McCulloch, John Anthony; Azevedo, Vasco Ariston Carvalho; Silva, Artur
2011-08-01
Due to the advent of the so-called Next-Generation Sequencing (NGS) technologies the amount of monetary and temporal resources for whole-genome sequencing has been reduced by several orders of magnitude. Sequence reads can be assembled either by anchoring them directly onto an available reference genome (classical reference assembly), or can be concatenated by overlap (de novo assembly). The latter strategy is preferable because it tends to maintain the architecture of the genome sequence the however, depending on the NGS platform used, the shortness of read lengths cause tremendous problems the in the subsequent genome assembly phase, impeding closing of the entire genome sequence. To address the problem, we developed a multi-pronged hybrid de novo strategy combining De Bruijn graph and Overlap-Layout-Consensus methods, which was used to assemble from short reads the entire genome of Corynebacterium pseudotuberculosis strain I19, a bacterium with immense importance in veterinary medicine that causes Caseous Lymphadenitis in ruminants, principally ovines and caprines. Briefly, contigs were assembled de novo from the short reads and were only oriented using a reference genome by anchoring. Remaining gaps were closed using iterative anchoring of short reads by craning to gap flanks. Finally, we compare the genome sequence assembled using our hybrid strategy to a classical reference assembly using the same data as input and show that with the availability of a reference genome, it pays off to use the hybrid de novo strategy, rather than a classical reference assembly, because more genome sequences are preserved using the former. Copyright © 2011 Elsevier B.V. All rights reserved.
PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data
NASA Astrophysics Data System (ADS)
Deneke, Carlus; Rentzsch, Robert; Renard, Bernhard Y.
2017-01-01
The reliable detection of novel bacterial pathogens from next-generation sequencing data is a key challenge for microbial diagnostics. Current computational tools usually rely on sequence similarity and often fail to detect novel species when closely related genomes are unavailable or missing from the reference database. Here we present the machine learning based approach PaPrBaG (Pathogenicity Prediction for Bacterial Genomes). PaPrBaG overcomes genetic divergence by training on a wide range of species with known pathogenicity phenotype. To that end we compiled a comprehensive list of pathogenic and non-pathogenic bacteria with human host, using various genome metadata in conjunction with a rule-based protocol. A detailed comparative study reveals that PaPrBaG has several advantages over sequence similarity approaches. Most importantly, it always provides a prediction whereas other approaches discard a large number of sequencing reads with low similarity to currently known reference genomes. Furthermore, PaPrBaG remains reliable even at very low genomic coverages. CombiningPaPrBaG with existing approaches further improves prediction results.
Lisle, John T.
2011-01-01
Microbial community genomic DNA was extracted from sediment samples collected from the northern Gulf of Mexico (NGOM) coast. These samples had a high probability of being impacted by Macondo-1 (M-1) well oil from the Deepwater Horizon (DWH) drilling site. The hypothesis for this project was that presence of M-1 oil in coastal sediments would significantly alter the diversity within the microbial communities associated with the impacted sediments. To determine if community-level changes did or did not occur following exposure to M-1 oil, microbial community-diversity fingerprints were generated and compared. Specific sequences within the community's genomic DNA were first amplified using the polymerase chain reaction (PCR) using a primer set that provides possible resolution to the species level. A second nested PCR that was performed on the primary PCR products using a primer set on which a GC-clamp was attached to one of the primers. These nested PCR products were separated using denaturing-gradient gel electrophoresis (DGGE) that resolves the nested PCR products based on sequence dissimilarities (or similarities), forming a genomic fingerprint of the microbial diversity within the respective samples. Sediment samples with similar fingerprints were grouped and compared to oil-fingerprint data from Rosenbauer and others (2010). The microbial community fingerprints grouped closely when identifying those sites that had been impacted by M-1 oil (N=12) and/or some mixture of M-1 and other oil (N=4), based upon the oil fingerprints. This report represents some of the first information on naturally occurring microbial communities in sediment from shorelines along the NGOM coast. These communities contain microbes capable of degrading oil and related hydrocarbons, making this information relevant to response and recovery of the NGOM from the DWH incident.
Petzsch, Patrick; Poehlein, Anja; Johnson, D. Barrie; Daniel, Rolf; Schlömann, Michael
2015-01-01
Microbial dissimilatory sulfate reduction is commonplace in many anaerobic environments, though few acidophilic bacteria are known to mediate this process. We report the 4.64-Mb draft genome of the type strain of the moderate acidophile Desulfosporosinus acididurans, which was isolated from acidic sediment in a river draining the Soufrière volcano, Montserrat. PMID:26251501
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lukjancenko, Oksana
2012-06-01
Julien Tremblay from DOE JGI presents "Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platorm" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
Lukjancenko, Oksana
2018-01-10
Julien Tremblay from DOE JGI presents "Evaluation of Multiplexed 16S rRNA Microbial Population Surveys Using Illumina MiSeq Platorm" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lapidus, Alla L.
From the date its role in heredity was discovered, DNA has been generating interest among scientists from different fields of knowledge: physicists have studied the three dimensional structure of the DNA molecule, biologists tried to decode the secrets of life hidden within these long molecules, and technologists invent and improve methods of DNA analysis. The analysis of the nucleotide sequence of DNA occupies a special place among the methods developed. Thanks to the variety of sequencing technologies available, the process of decoding the sequence of genomic DNA (or whole genome sequencing) has become robust and inexpensive. Meanwhile the assembly ofmore » whole genome sequences remains a challenging task. In addition to the need to assemble millions of DNA fragments of different length (from 35 bp (Solexa) to 800 bp (Sanger)), great interest in analysis of microbial communities (metagenomes) of different complexities raises new problems and pushes some new requirements for sequence assembly tools to the forefront. The genome assembly process can be divided into two steps: draft assembly and assembly improvement (finishing). Despite the fact that automatically performed assembly (or draft assembly) is capable of covering up to 98% of the genome, in most cases, it still contains incorrectly assembled reads. The error rate of the consensus sequence produced at this stage is about 1/2000 bp. A finished genome represents the genome assembly of much higher accuracy (with no gaps or incorrectly assembled areas) and quality ({approx}1 error/10,000 bp), validated through a number of computer and laboratory experiments.« less
UV Decontamination of MDA Reagents for Single Cell Genomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Janey; Tighe, Damon; Sczyrba, Alexander
2011-03-18
Single cell genomics, the amplification and sequencing of genomes from single cells, can provide a glimpse into the genetic make-up and thus life style of the vast majority of uncultured microbial cells, making it an immensely powerful and increasingly popular tool. This is accomplished by use of multiple displacement amplification (MDA), which can generate billions of copies of a single bacterial genome producing microgram-range DNA required for shotgun sequencing. Here, we address a key challenge inherent to this approach and propose a solution for the improved recovery of single cell genomes. While DNA-free reagents for the amplification of a singlemore » cell genome are a prerequisite for successful single cell sequencing and analysis, DNA contamination has been detected in various reagents, which poses a considerable challenge. Our study demonstrates the effect of UV irradiation in efficient elimination of exogenous contaminant DNA found in MDA reagents, while maintaining Phi29 activity. Consequently, we also find that increased UV exposure to Phi29 does not adversely affect genome coverage of MDA amplified single cells. While additional challenges in single cell genomics remain to be resolved, the proposed methodology is relatively quick and simple and we believe that its application will be of high value for future single cell sequencing projects.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
McLoughlin, Kevin
2016-01-11
This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive featuremore » of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.« less
The genome of the Lactobacillus sanfranciscensis temperate phage EV3
2013-01-01
Background Bacteriophages infection modulates microbial consortia and transduction is one of the most important mechanism involved in the bacterial evolution. However, phage contamination brings food fermentations to a halt causing economic setbacks. The number of phage genome sequences of lactic acid bacteria especially of lactobacilli is still limited. We analysed the genome of a temperate phage active on Lactobacillus sanfranciscensis, the predominant strain in type I sourdough fermentations. Results Sequencing of the DNA of EV3 phage revealed a genome of 34,834 bp and a G + C content of 36.45%. Of the 43 open reading frames (ORFs) identified, all but eight shared homology with other phages of lactobacilli. A similar genomic organization and mosaic pattern of identities align EV3 with the closely related Lactobacillus vaginalis ATCC 49540 prophage. Four unknown ORFs that had no homologies in the databases or predicted functions were identified. Notably, EV3 encodes a putative dextranase. Conclusions EV3 is the first L. sanfranciscensis phage that has been completely sequenced so far. PMID:24308641
Liu, Xia; Xu, Yongdong; Li, Zhi; Jiang, Shengwei; Yao, Shuo; Wu, Rina; An, Yingfeng
2018-04-21
A silica sands-based method has been developed to isolate high quality genomic DNAs from cells of animals, plants and microorganisms, such as Hemisalanx prognathus, Spinacia oleracea, Pichia pastoris, Bacillus licheniformis and Escherichia coli. To the best of our knowledge, no DNA isolation method has so wide application until now. In addition, this method and a commercially available kit were compared in analysis of microbial communities using high-throughput 16s rDNA sequencing. As a result, the silica sands-based method was found to be even more efficient in isolating genomic DNA from gram-positive bacteria than the kit, indicating that it would become a very valuable choice to faithfully reflect the composition of microbial communities.
IMG/VR: a database of cultured and uncultured DNA Viruses and retroviruses.
Paez-Espino, David; Chen, I-Min A; Palaniappan, Krishna; Ratner, Anna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Huang, Jinghua; Markowitz, Victor M; Nielsen, Torben; Huntemann, Marcel; K Reddy, T B; Pavlopoulos, Georgios A; Sullivan, Matthew B; Campbell, Barbara J; Chen, Feng; McMahon, Katherine; Hallam, Steve J; Denef, Vincent; Cavicchioli, Ricardo; Caffrey, Sean M; Streit, Wolfgang R; Webster, John; Handley, Kim M; Salekdeh, Ghasem H; Tsesmetzis, Nicolas; Setubal, Joao C; Pope, Phillip B; Liu, Wen-Tso; Rivers, Adam R; Ivanova, Natalia N; Kyrpides, Nikos C
2017-01-04
Viruses represent the most abundant life forms on the planet. Recent experimental and computational improvements have led to a dramatic increase in the number of viral genome sequences identified primarily from metagenomic samples. As a result of the expanding catalog of metagenomic viral sequences, there exists a need for a comprehensive computational platform integrating all these sequences with associated metadata and analytical tools. Here we present IMG/VR (https://img.jgi.doe.gov/vr/), the largest publicly available database of 3908 isolate reference DNA viruses with 264 413 computationally identified viral contigs from >6000 ecologically diverse metagenomic samples. Approximately half of the viral contigs are grouped into genetically distinct quasi-species clusters. Microbial hosts are predicted for 20 000 viral sequences, revealing nine microbial phyla previously unreported to be infected by viruses. Viral sequences can be queried using a variety of associated metadata, including habitat type and geographic location of the samples, or taxonomic classification according to hallmark viral genes. IMG/VR has a user-friendly interface that allows users to interrogate all integrated data and interact by comparing with external sequences, thus serving as an essential resource in the viral genomics community. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
High depth, whole-genome sequencing of cholera isolates from Haiti and the Dominican Republic.
Sealfon, Rachel; Gire, Stephen; Ellis, Crystal; Calderwood, Stephen; Qadri, Firdausi; Hensley, Lisa; Kellis, Manolis; Ryan, Edward T; LaRocque, Regina C; Harris, Jason B; Sabeti, Pardis C
2012-09-11
Whole-genome sequencing is an important tool for understanding microbial evolution and identifying the emergence of functionally important variants over the course of epidemics. In October 2010, a severe cholera epidemic began in Haiti, with additional cases identified in the neighboring Dominican Republic. We used whole-genome approaches to sequence four Vibrio cholerae isolates from Haiti and the Dominican Republic and three additional V. cholerae isolates to a high depth of coverage (>2000x); four of the seven isolates were previously sequenced. Using these sequence data, we examined the effect of depth of coverage and sequencing platform on genome assembly and identification of sequence variants. We found that 50x coverage is sufficient to construct a whole-genome assembly and to accurately call most variants from 100 base pair paired-end sequencing reads. Phylogenetic analysis between the newly sequenced and thirty-three previously sequenced V. cholerae isolates indicates that the Haitian and Dominican Republic isolates are closest to strains from South Asia. The Haitian and Dominican Republic isolates form a tight cluster, with only four variants unique to individual isolates. These variants are located in the CTX region, the SXT region, and the core genome. Of the 126 mutations identified that separate the Haiti-Dominican Republic cluster from the V. cholerae reference strain (N16961), 73 are non-synonymous changes, and a number of these changes cluster in specific genes and pathways. Sequence variant analyses of V. cholerae isolates, including multiple isolates from the Haitian outbreak, identify coverage-specific and technology-specific effects on variant detection, and provide insight into genomic change and functional evolution during an epidemic.
SMM-system: A mining tool to identify specific markers in Salmonella enterica.
Yu, Shuijing; Liu, Weibing; Shi, Chunlei; Wang, Dapeng; Dan, Xianlong; Li, Xiao; Shi, Xianming
2011-03-01
This report presents SMM-system, a software package that implements various personalized pre- and post-BLASTN tasks for mining specific markers of microbial pathogens. The main functionalities of SMM-system are summarized as follows: (i) converting multi-FASTA file, (ii) cutting interesting genomic sequence, (iii) automatic high-throughput BLASTN searches, and (iv) screening target sequences. The utility of SMM-system was demonstrated by using it to identify 214 Salmonella enterica-specific protein-coding sequences (CDSs). Eighteen primer pairs were designed based on eighteen S. enterica-specific CDSs, respectively. Seven of these primer pairs were validated with PCR assay, which showed 100% inclusivity for the 101 S. enterica genomes and 100% exclusivity of 30 non-S. enterica genomes. Three specific primer pairs were chosen to develop a multiplex PCR assay, which generated specific amplicons with a size of 180bp (SC1286), 238bp (SC1598) and 405bp (SC4361), respectively. This study demonstrates that SMM-system is a high-throughput specific marker generation tool that can be used to identify genus-, species-, serogroup- and even serovar-specific DNA sequences of microbial pathogens, which has a potential to be applied in food industries, diagnostics and taxonomic studies. SMM-system is freely available and can be downloaded from http://foodsafety.sjtu.edu.cn/SMM-system.html. Copyright © 2011 Elsevier B.V. All rights reserved.
Key roles for freshwater Actinobacteria revealed by deep metagenomic sequencing.
Ghai, Rohit; Mizuno, Carolina Megumi; Picazo, Antonio; Camacho, Antonio; Rodriguez-Valera, Francisco
2014-12-01
Freshwater ecosystems are critical but fragile environments directly affecting society and its welfare. However, our understanding of genuinely freshwater microbial communities, constrained by our capacity to manipulate its prokaryotic participants in axenic cultures, remains very rudimentary. Even the most abundant components, freshwater Actinobacteria, remain largely unknown. Here, applying deep metagenomic sequencing to the microbial community of a freshwater reservoir, we were able to circumvent this traditional bottleneck and reconstruct de novo seven distinct streamlined actinobacterial genomes. These genomes represent three new groups of photoheterotrophic, planktonic Actinobacteria. We describe for the first time genomes of two novel clades, acMicro (Micrococcineae, related to Luna2,) and acAMD (Actinomycetales, related to acTH1). Besides, an aggregate of contigs belonged to a new branch of the Acidimicrobiales. All are estimated to have small genomes (approximately 1.2 Mb), and their GC content varied from 40 to 61%. One of the Micrococcineae genomes encodes a proteorhodopsin, a rhodopsin type reported for the first time in Actinobacteria. The remarkable potential capacity of some of these genomes to transform recalcitrant plant detrital material, particularly lignin-derived compounds, suggests close linkages between the terrestrial and aquatic realms. Moreover, abundances of Actinobacteria correlate inversely to those of Cyanobacteria that are responsible for prolonged and frequently irretrievable damage to freshwater ecosystems. This suggests that they might serve as sentinels of impending ecological catastrophes. © 2014 John Wiley & Sons Ltd.
Ben Hania, Wajdi; Joseph, Manon; Schumann, Peter; Bunk, Boyke; Fiebig, Anne; Spröer, Cathrin; Klenk, Hans-Peter; Fardeau, Marie-Laure; Spring, Stefan
2015-01-01
During a study of the anaerobic microbial community of a lithifying hypersaline microbial mat of Lake 21 on the Kiritimati atoll (Kiribati Republic, Central Pacific) strain L21-RPul-D2(T) was isolated. The closest phylogenetic neighbor was Spirochaeta africana Z-7692(T) that shared a 16S rRNA gene sequence identity value of 90% with the novel strain and thus was only distantly related. A comprehensive polyphasic study including determination of the complete genome sequence was initiated to characterize the novel isolate. Cells of strain L21-RPul-D2(T) had a size of 0.2 - 0.25 × 8-9 μm, were helical, motile, stained Gram-negative and produced an orange carotenoid-like pigment. Optimal conditions for growth were 35°C, a salinity of 50 g/l NaCl and a pH around 7.0. Preferred substrates for growth were carbohydrates and a few carboxylic acids. The novel strain had an obligate fermentative metabolism and produced ethanol, acetate, lactate, hydrogen and carbon dioxide during growth on glucose. Strain L21-RPul-D2(T) was aerotolerant, but oxygen did not stimulate growth. Major cellular fatty acids were C14:0, iso-C15:0, C16:0 and C18:0. The major polar lipids were an unidentified aminolipid, phosphatidylglycerol, an unidentified phospholipid and two unidentified glycolipids. Whole-cell hydrolysates contained L-ornithine as diagnostic diamino acid of the cell wall peptidoglycan. The complete genome sequence was determined and annotated. The genome comprised one circular chromosome with a size of 3.78 Mbp that contained 3450 protein-coding genes and 50 RNA genes, including 2 operons of ribosomal RNA genes. The DNA G + C content was determined from the genome sequence as 51.9 mol%. There were no predicted genes encoding cytochromes or enzymes responsible for the biosynthesis of respiratory lipoquinones. Based on significant differences to the uncultured type species of the genus Spirochaeta, S. plicatilis, as well as to any other phylogenetically related cultured species it is suggested to place strain L21-RPul-D2(T) (=DSM 27196(T) = JCM 18663(T)) in a novel species and genus, for which the name Salinispira pacifica gen. nov., sp. nov. is proposed.
Laviad-Shitrit, Sivan; Göker, Markus; Huntemann, Marcel; ...
2017-05-08
Chryseobacterium bovis DSM 19482 T (Hantsis-Zacharov et al., Int J Syst Evol Microbiol 58:1024-1028, 2008) is a Gram-negative, rod shaped, non-motile, facultative anaerobe, chemoorganotroph bacterium. C. bovis is a member of the Flavobacteriaceae, a family within the phylum Bacteroidetes. It was isolated when psychrotolerant bacterial communities in raw milk and their proteolytic and lipolytic traits were studied. Here we describe the features of this organism, together with the draft genome sequence and annotation. The DNA G + C content is 38.19%. The chromosome length is 3,346,045 bp. It encodes 3236 proteins and 105 RNA genes. The C. bovis genome ismore » part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes study.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Laviad-Shitrit, Sivan; Göker, Markus; Huntemann, Marcel
Chryseobacterium bovis DSM 19482 T (Hantsis-Zacharov et al., Int J Syst Evol Microbiol 58:1024-1028, 2008) is a Gram-negative, rod shaped, non-motile, facultative anaerobe, chemoorganotroph bacterium. C. bovis is a member of the Flavobacteriaceae, a family within the phylum Bacteroidetes. It was isolated when psychrotolerant bacterial communities in raw milk and their proteolytic and lipolytic traits were studied. Here we describe the features of this organism, together with the draft genome sequence and annotation. The DNA G + C content is 38.19%. The chromosome length is 3,346,045 bp. It encodes 3236 proteins and 105 RNA genes. The C. bovis genome ismore » part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes study.« less
Legume Genome Initiative at the University of Oklahoma
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bruce A. Roe
2004-02-27
Consolidated Appropriations Resolution, 2003 Conference Report for the Department of Energy's Biological and Environmental Research (BER) program provided $481,000 for the Legume Genome Initiative at the University of Oklahoma. These funds were used to support our research that is aimed at determining the entire sequence of the gene rich regions of the genome of the legume, Medicago truncatula, by allowing us to obtain a greater degree of finished BAC sequences from the draft sequences we have already obtained through research funded by the Noble Foundation. During the funding period we increased the number of Medicago truncatula BACs with finished (Bermudamore » standard) sequences from 109 to 359, and the total number of BACs for which we collected sequence data from 584 to 842, 359 of which reached phase 2 (ordered and oriented contigs). We also sequenced a series of pooled BAC clones that cover additional euchromatic (gene rich) genomic regions. This work resulted in 6 refereed publications, see below. Genes whose sequence was determined during this study included multiple members of the plant disease resistance (R-gene) family as well as several genes involved in flavinoid biosynthesis, nitrogen fixation and plant-microbial symbosis. This work also served as a prelude to obtaining NSF funding for the international collaborative effort to complete the entire sequence of the Medicago truncatula genomic euchromatic regions using a BAC based approach.« less
Rodríguez-García, Antonio; Fernández-Alegre, Estela; Morales, Alejandro; Sola-Landa, Alberto; Lorraine, Jess; Macdonald, Sandy; Dovbnya, Dmitry; Smith, Margaret C M; Donova, Marina; Barreiro, Carlos
2016-04-20
Microbial bioconversion of sterols into high value steroid precursors, such as 4-androstene-3,17-dione (AD), is an industrial challenge. Genes and enzymes involved in sterol degradation have been proposed, although the complete pathway is not yet known. The genome sequencing of the AD producer strain 'Mycobacterium neoaurum' NRRL B-3805 (formerly Mycobacterium sp. NRRL B-3805) will serve to elucidate the critical steps for industrial processes and will provide the basis for further genetic engineering. The genome comprises a circular chromosome (5 421 338bp), is devoid of plasmids and contains 4844 protein-coding genes. Copyright © 2016 Elsevier B.V. All rights reserved.
Genome-Based Microbial Taxonomy Coming of Age.
Hugenholtz, Philip; Skarshewski, Adam; Parks, Donovan H
2016-06-01
Reconstructing the complete evolutionary history of extant life on our planet will be one of the most fundamental accomplishments of scientific endeavor, akin to the completion of the periodic table, which revolutionized chemistry. The road to this goal is via comparative genomics because genomes are our most comprehensive and objective evolutionary documents. The genomes of plant and animal species have been systematically targeted over the past decade to provide coverage of the tree of life. However, multicellular organisms only emerged in the last 550 million years of more than three billion years of biological evolution and thus comprise a small fraction of total biological diversity. The bulk of biodiversity, both past and present, is microbial. We have only scratched the surface in our understanding of the microbial world, as most microorganisms cannot be readily grown in the laboratory and remain unknown to science. Ground-breaking, culture-independent molecular techniques developed over the past 30 years have opened the door to this so-called microbial dark matter with an accelerating momentum driven by exponential increases in sequencing capacity. We are on the verge of obtaining representative genomes across all life for the first time. However, historical use of morphology, biochemical properties, behavioral traits, and single-marker genes to infer organismal relationships mean that the existing highly incomplete tree is riddled with taxonomic errors. Concerted efforts are now needed to synthesize and integrate the burgeoning genomic data resources into a coherent universal tree of life and genome-based taxonomy. Copyright © 2016 Cold Spring Harbor Laboratory Press; all rights reserved.
Genomic encyclopedia of bacteria and archaea: sequencing a myriad of type strains.
Kyrpides, Nikos C; Hugenholtz, Philip; Eisen, Jonathan A; Woyke, Tanja; Göker, Markus; Parker, Charles T; Amann, Rudolf; Beck, Brian J; Chain, Patrick S G; Chun, Jongsik; Colwell, Rita R; Danchin, Antoine; Dawyndt, Peter; Dedeurwaerdere, Tom; DeLong, Edward F; Detter, John C; De Vos, Paul; Donohue, Timothy J; Dong, Xiu-Zhu; Ehrlich, Dusko S; Fraser, Claire; Gibbs, Richard; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Jansson, Janet K; Keasling, Jay D; Knight, Rob; Labeda, David; Lapidus, Alla; Lee, Jung-Sook; Li, Wen-Jun; Ma, Juncai; Markowitz, Victor; Moore, Edward R B; Morrison, Mark; Meyer, Folker; Nelson, Karen E; Ohkuma, Moriya; Ouzounis, Christos A; Pace, Norman; Parkhill, Julian; Qin, Nan; Rossello-Mora, Ramon; Sikorski, Johannes; Smith, David; Sogin, Mitch; Stevens, Rick; Stingl, Uli; Suzuki, Ken-Ichiro; Taylor, Dorothea; Tiedje, Jim M; Tindall, Brian; Wagner, Michael; Weinstock, George; Weissenbach, Jean; White, Owen; Wang, Jun; Zhang, Lixin; Zhou, Yu-Guang; Field, Dawn; Whitman, William B; Garrity, George M; Klenk, Hans-Peter
2014-08-01
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently∼11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.
Metagenome sequencing and 98 microbial genomes from Juan de Fuca Ridge flank subsurface fluids
NASA Astrophysics Data System (ADS)
Jungbluth, Sean P.; Amend, Jan P.; Rappé, Michael S.
2017-03-01
The global deep subsurface biosphere is one of the largest reservoirs for microbial life on our planet. This study takes advantage of new sampling technologies and couples them with improvements to DNA sequencing and associated informatics tools to reconstruct the genomes of uncultivated Bacteria and Archaea from fluids collected deep within the Juan de Fuca Ridge subseafloor. Here, we generated two metagenomes from borehole observatories located 311 meters apart and, using binning tools, retrieved 98 genomes from metagenomes (GFMs). Of the GFMs, 31 were estimated to be >90% complete, while an additional 17 were >70% complete. Phylogenomic analysis revealed 53 bacterial and 45 archaeal GFMs, of which nearly all were distantly related to known cultivated isolates. In the GFMs, abundant Bacteria included Chloroflexi, Nitrospirae, Acetothermia (OP1), EM3, Aminicenantes (OP8), Gammaproteobacteria, and Deltaproteobacteria, while abundant Archaea included Archaeoglobi, Bathyarchaeota (MCG), and Marine Benthic Group E (MBG-E). These data are the first GFMs reconstructed from the deep basaltic subseafloor biosphere, and provide a dataset available for further interrogation.
Metagenome sequencing and 98 microbial genomes from Juan de Fuca Ridge flank subsurface fluids.
Jungbluth, Sean P; Amend, Jan P; Rappé, Michael S
2017-03-28
The global deep subsurface biosphere is one of the largest reservoirs for microbial life on our planet. This study takes advantage of new sampling technologies and couples them with improvements to DNA sequencing and associated informatics tools to reconstruct the genomes of uncultivated Bacteria and Archaea from fluids collected deep within the Juan de Fuca Ridge subseafloor. Here, we generated two metagenomes from borehole observatories located 311 meters apart and, using binning tools, retrieved 98 genomes from metagenomes (GFMs). Of the GFMs, 31 were estimated to be >90% complete, while an additional 17 were >70% complete. Phylogenomic analysis revealed 53 bacterial and 45 archaeal GFMs, of which nearly all were distantly related to known cultivated isolates. In the GFMs, abundant Bacteria included Chloroflexi, Nitrospirae, Acetothermia (OP1), EM3, Aminicenantes (OP8), Gammaproteobacteria, and Deltaproteobacteria, while abundant Archaea included Archaeoglobi, Bathyarchaeota (MCG), and Marine Benthic Group E (MBG-E). These data are the first GFMs reconstructed from the deep basaltic subseafloor biosphere, and provide a dataset available for further interrogation.
The Metagenome of Utricularia gibba's Traps: Into the Microbial Input to a Carnivorous Plant
Alcaraz, Luis David; Martínez-Sánchez, Shamayim; Torres, Ignacio; Ibarra-Laclette, Enrique; Herrera-Estrella, Luis
2016-01-01
The genome and transcriptome sequences of the aquatic, rootless, and carnivorous plant Utricularia gibba L. (Lentibulariaceae), were recently determined. Traps are necessary for U. gibba because they help the plant to survive in nutrient-deprived environments. The U. gibba's traps (Ugt) are specialized structures that have been proposed to selectively filter microbial inhabitants. To determine whether the traps indeed have a microbiome that differs, in composition or abundance, from the microbiome in the surrounding environment, we used whole-genome shotgun (WGS) metagenomics to describe both the taxonomic and functional diversity of the Ugt microbiome. We collected U. gibba plants from their natural habitat and directly sequenced the metagenome of the Ugt microbiome and its surrounding water. The total predicted number of species in the Ugt was more than 1,100. Using pan-genome fragment recruitment analysis, we were able to identify to the species level of some key Ugt players, such as Pseudomonas monteilii. Functional analysis of the Ugt metagenome suggests that the trap microbiome plays an important role in nutrient scavenging and assimilation while complementing the hydrolytic functions of the plant. PMID:26859489
Metagenome sequencing and 98 microbial genomes from Juan de Fuca Ridge flank subsurface fluids
Jungbluth, Sean P.; Amend, Jan P.; Rappé, Michael S.
2017-01-01
The global deep subsurface biosphere is one of the largest reservoirs for microbial life on our planet. This study takes advantage of new sampling technologies and couples them with improvements to DNA sequencing and associated informatics tools to reconstruct the genomes of uncultivated Bacteria and Archaea from fluids collected deep within the Juan de Fuca Ridge subseafloor. Here, we generated two metagenomes from borehole observatories located 311 meters apart and, using binning tools, retrieved 98 genomes from metagenomes (GFMs). Of the GFMs, 31 were estimated to be >90% complete, while an additional 17 were >70% complete. Phylogenomic analysis revealed 53 bacterial and 45 archaeal GFMs, of which nearly all were distantly related to known cultivated isolates. In the GFMs, abundant Bacteria included Chloroflexi, Nitrospirae, Acetothermia (OP1), EM3, Aminicenantes (OP8), Gammaproteobacteria, and Deltaproteobacteria, while abundant Archaea included Archaeoglobi, Bathyarchaeota (MCG), and Marine Benthic Group E (MBG-E). These data are the first GFMs reconstructed from the deep basaltic subseafloor biosphere, and provide a dataset available for further interrogation. PMID:28350381
Genomic Encyclopedia of Bacteria and Archaea: Sequencing a Myriad of Type Strains
Kyrpides, Nikos C.; Hugenholtz, Philip; Eisen, Jonathan A.; Woyke, Tanja; Göker, Markus; Parker, Charles T.; Amann, Rudolf; Beck, Brian J.; Chain, Patrick S. G.; Chun, Jongsik; Colwell, Rita R.; Danchin, Antoine; Dawyndt, Peter; Dedeurwaerdere, Tom; DeLong, Edward F.; Detter, John C.; De Vos, Paul; Donohue, Timothy J.; Dong, Xiu-Zhu; Ehrlich, Dusko S.; Fraser, Claire; Gibbs, Richard; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Jansson, Janet K.; Keasling, Jay D.; Knight, Rob; Labeda, David; Lapidus, Alla; Lee, Jung-Sook; Li, Wen-Jun; MA, Juncai; Markowitz, Victor; Moore, Edward R. B.; Morrison, Mark; Meyer, Folker; Nelson, Karen E.; Ohkuma, Moriya; Ouzounis, Christos A.; Pace, Norman; Parkhill, Julian; Qin, Nan; Rossello-Mora, Ramon; Sikorski, Johannes; Smith, David; Sogin, Mitch; Stevens, Rick; Stingl, Uli; Suzuki, Ken-ichiro; Taylor, Dorothea; Tiedje, Jim M.; Tindall, Brian; Wagner, Michael; Weinstock, George; Weissenbach, Jean; White, Owen; Wang, Jun; Zhang, Lixin; Zhou, Yu-Guang; Field, Dawn; Whitman, William B.; Garrity, George M.; Klenk, Hans-Peter
2014-01-01
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently∼11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment. PMID:25093819
Gerlt, John A
2017-08-22
The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.
IMG/M-HMP: a metagenome comparative analysis system for the Human Microbiome Project.
Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Szeto, Ernest; Palaniappan, Krishna; Jacob, Biju; Ratner, Anna; Liolios, Konstantinos; Pagani, Ioanna; Huntemann, Marcel; Mavromatis, Konstantinos; Ivanova, Natalia N; Kyrpides, Nikos C
2012-01-01
The Integrated Microbial Genomes and Metagenomes (IMG/M) resource is a data management system that supports the analysis of sequence data from microbial communities in the integrated context of all publicly available draft and complete genomes from the three domains of life as well as a large number of plasmids and viruses. IMG/M currently contains thousands of genomes and metagenome samples with billions of genes. IMG/M-HMP is an IMG/M data mart serving the US National Institutes of Health (NIH) Human Microbiome Project (HMP), focussed on HMP generated metagenome datasets, and is one of the central resources provided from the HMP Data Analysis and Coordination Center (DACC). IMG/M-HMP is available at http://www.hmpdacc-resources.org/imgm_hmp/.
Genomic dissection of host-microbe and microbe-microbe interactions for advanced plant breeding.
Kroll, Samuel; Agler, Matthew T; Kemen, Eric
2017-04-01
Agriculture faces many emerging challenges to sustainability, including limited nutrient resources, losses from diseases caused by current and emerging pathogens and environmental degradation. Microorganisms have great importance for plant growth and performance, including the potential to increase yields, nutrient uptake and pathogen resistance. An urgent need is therefore to understand and engineer plants and their associated microbial communities. Recent massive genomic sequencing of host plants and associated microbes offers resources to identify novel mechanisms of communal assembly mediated by the host. For example, host-microbe and microbe-microbe interactions are involved in niche formation, thereby contributing to colonization. By leveraging genomic resources, genetic traits underlying those mechanisms will become important resources to design plants selecting and hosting beneficial microbial communities. Copyright © 2017 Elsevier Ltd. All rights reserved.
2012 U.S. Department of Energy: Joint Genome Institute: Progress Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilbert, David
2013-01-01
The mission of the U.S. Department of Energy Joint Genome Institute (DOE JGI) is to serve the diverse scientific community as a user facility, enabling the application of large-scale genomics and analysis of plants, microbes, and communities of microbes to address the DOE mission goals in bioenergy and the environment. The DOE JGI's sequencing efforts fall under the Eukaryote Super Program, which includes the Plant and Fungal Genomics Programs; and the Prokaryote Super Program, which includes the Microbial Genomics and Metagenomics Programs. In 2012, several projects made news for their contributions to energy and environment research.
Hua, Zheng-Shuang; Han, Yu-Jiao; Chen, Lin-Xing; Liu, Jun; Hu, Min; Li, Sheng-Jin; Kuang, Jia-Liang; Chain, Patrick S G; Huang, Li-Nan; Shu, Wen-Sheng
2015-06-01
High-throughput sequencing is expanding our knowledge of microbial diversity in the environment. Still, understanding the metabolic potentials and ecological roles of rare and uncultured microbes in natural communities remains a major challenge. To this end, we applied a 'divide and conquer' strategy that partitioned a massive metagenomic data set (>100 Gbp) into subsets based on K-mer frequency in sequence assembly to a low-diversity acid mine drainage (AMD) microbial community and, by integrating with an additional metatranscriptomic assembly, successfully obtained 11 draft genomes most of which represent yet uncultured and/or rare taxa (relative abundance <1%). We report the first genome of a naturally occurring Ferrovum population (relative abundance >90%) and its metabolic potentials and gene expression profile, providing initial molecular insights into the ecological role of these lesser known, but potentially important, microorganisms in the AMD environment. Gene transcriptional analysis of the active taxa revealed major metabolic capabilities executed in situ, including carbon- and nitrogen-related metabolisms associated with syntrophic interactions, iron and sulfur oxidation, which are key in energy conservation and AMD generation, and the mechanisms of adaptation and response to the environmental stresses (heavy metals, low pH and oxidative stress). Remarkably, nitrogen fixation and sulfur oxidation were performed by the rare taxa, indicating their critical roles in the overall functioning and assembly of the AMD community. Our study demonstrates the potential of the 'divide and conquer' strategy in high-throughput sequencing data assembly for genome reconstruction and functional partitioning analysis of both dominant and rare species in natural microbial assemblages.
Pandin, Caroline; Le Coq, Dominique; Deschamps, Julien; Védie, Régis; Rousseau, Thierry; Aymerich, Stéphane; Briandet, Romain
2018-04-24
Bacillus subtilis QST713 is extensively used as a biological control agent in agricultural fields including in the button mushroom culture, Agaricus bisporus. This last use exploits its inhibitory activity against microbial pathogens such as Trichoderma aggressivum f. europaeum, the main button mushroom green mould competitor. Here, we report the complete genome sequence of this bacterium with a genome size of 4 233 757 bp, 4263 predicted genes and an average GC content of 45.9%. Based on phylogenomic analyses, strain QST713 is finally designated as Bacillus velezensis. Genomic analyses revealed two clusters encoding potential new antimicrobials with NRPS and TransATPKS synthetase. B. velezensis QST713 genome also harbours several genes previously described as being involved in surface colonization and biofilm formation. This strain shows a strong ability to form in vitro spatially organized biofilm and to antagonize T. aggressivum. The availability of this genome sequence could bring new elements to understand the interactions with micro or/and macroorganisms in crops. Copyright © 2018 Elsevier B.V. All rights reserved.
Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies
Schatz, Michael C.; Phillippy, Adam M.; Sommer, Daniel D.; Delcher, Arthur L.; Puiu, Daniela; Narzisi, Giuseppe; Salzberg, Steven L.; Pop, Mihai
2013-01-01
Since its launch in 2004, the open-source AMOS project has released several innovative DNA sequence analysis applications including: Hawkeye, a visual analytics tool for inspecting the structure of genome assemblies; the Assembly Forensics and FRCurve pipelines for systematically evaluating the quality of a genome assembly; and AMOScmp, the first comparative genome assembler. These applications have been used to assemble and analyze dozens of genomes ranging in complexity from simple microbial species through mammalian genomes. Recent efforts have been focused on enhancing support for new data characteristics brought on by second- and now third-generation sequencing. This review describes the major components of AMOS in light of these challenges, with an emphasis on methods for assessing assembly quality and the visual analytics capabilities of Hawkeye. These interactive graphical aspects are essential for navigating and understanding the complexities of a genome assembly, from the overall genome structure down to individual bases. Hawkeye and AMOS are available open source at http://amos.sourceforge.net. PMID:22199379
Ciric, Milica; Moon, Christina D; Leahy, Sinead C; Creevey, Christopher J; Altermann, Eric; Attwood, Graeme T; Rakonjac, Jasna; Gagic, Dragana
2014-05-12
In silico, secretome proteins can be predicted from completely sequenced genomes using various available algorithms that identify membrane-targeting sequences. For metasecretome (collection of surface, secreted and transmembrane proteins from environmental microbial communities) this approach is impractical, considering that the metasecretome open reading frames (ORFs) comprise only 10% to 30% of total metagenome, and are poorly represented in the dataset due to overall low coverage of metagenomic gene pool, even in large-scale projects. By combining secretome-selective phage display and next-generation sequencing, we focused the sequence analysis of complex rumen microbial community on the metasecretome component of the metagenome. This approach achieved high enrichment (29 fold) of secreted fibrolytic enzymes from the plant-adherent microbial community of the bovine rumen. In particular, we identified hundreds of heretofore rare modules belonging to cellulosomes, cell-surface complexes specialised for recognition and degradation of the plant fibre. As a method, metasecretome phage display combined with next-generation sequencing has a power to sample the diversity of low-abundance surface and secreted proteins that would otherwise require exceptionally large metagenomic sequencing projects. As a resource, metasecretome display library backed by the dataset obtained by next-generation sequencing is ready for i) affinity selection by standard phage display methodology and ii) easy purification of displayed proteins as part of the virion for individual functional analysis.
Klassen, Jonathan L.
2010-01-01
Background Carotenoids are multifunctional, taxonomically widespread and biotechnologically important pigments. Their biosynthesis serves as a model system for understanding the evolution of secondary metabolism. Microbial carotenoid diversity and evolution has hitherto been analyzed primarily from structural and biosynthetic perspectives, with the few phylogenetic analyses of microbial carotenoid biosynthetic proteins using either used limited datasets or lacking methodological rigor. Given the recent accumulation of microbial genome sequences, a reappraisal of microbial carotenoid biosynthetic diversity and evolution from the perspective of comparative genomics is warranted to validate and complement models of microbial carotenoid diversity and evolution based upon structural and biosynthetic data. Methodology/Principal Findings Comparative genomics were used to identify and analyze in silico microbial carotenoid biosynthetic pathways. Four major phylogenetic lineages of carotenoid biosynthesis are suggested composed of: (i) Proteobacteria; (ii) Firmicutes; (iii) Chlorobi, Cyanobacteria and photosynthetic eukaryotes; and (iv) Archaea, Bacteroidetes and two separate sub-lineages of Actinobacteria. Using this phylogenetic framework, specific evolutionary mechanisms are proposed for carotenoid desaturase CrtI-family enzymes and carotenoid cyclases. Several phylogenetic lineage-specific evolutionary mechanisms are also suggested, including: (i) horizontal gene transfer; (ii) gene acquisition followed by differential gene loss; (iii) co-evolution with other biochemical structures such as proteorhodopsins; and (iv) positive selection. Conclusions/Significance Comparative genomics analyses of microbial carotenoid biosynthetic proteins indicate a much greater taxonomic diversity then that identified based on structural and biosynthetic data, and divides microbial carotenoid biosynthesis into several, well-supported phylogenetic lineages not evident previously. This phylogenetic framework is applicable to understanding the evolution of specific carotenoid biosynthetic proteins or the unique characteristics of carotenoid biosynthetic evolution in a specific phylogenetic lineage. Together, these analyses suggest a “bramble” model for microbial carotenoid biosynthesis whereby later biosynthetic steps exhibit greater evolutionary plasticity and reticulation compared to those closer to the biosynthetic “root”. Structural diversification may be constrained (“trimmed”) where selection is strong, but less so where selection is weaker. These analyses also highlight likely productive avenues for future research and bioprospecting by identifying both gaps in current knowledge and taxa which may particularly facilitate carotenoid diversification. PMID:20582313
Petzsch, Patrick; Poehlein, Anja; Johnson, D Barrie; Daniel, Rolf; Schlömann, Michael; Mühling, Martin
2015-08-06
Microbial dissimilatory sulfate reduction is commonplace in many anaerobic environments, though few acidophilic bacteria are known to mediate this process. We report the 4.64-Mb draft genome of the type strain of the moderate acidophile Desulfosporosinus acididurans, which was isolated from acidic sediment in a river draining the Soufrière volcano, Montserrat. Copyright © 2015 Petzsch et al.
Programmable removal of bacterial strains by use of genome-targeting CRISPR-Cas systems.
Gomaa, Ahmed A; Klumpe, Heidi E; Luo, Michelle L; Selle, Kurt; Barrangou, Rodolphe; Beisel, Chase L
2014-01-28
CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) systems in bacteria and archaea employ CRISPR RNAs to specifically recognize the complementary DNA of foreign invaders, leading to sequence-specific cleavage or degradation of the target DNA. Recent work has shown that the accidental or intentional targeting of the bacterial genome is cytotoxic and can lead to cell death. Here, we have demonstrated that genome targeting with CRISPR-Cas systems can be employed for the sequence-specific and titratable removal of individual bacterial strains and species. Using the type I-E CRISPR-Cas system in Escherichia coli as a model, we found that this effect could be elicited using native or imported systems and was similarly potent regardless of the genomic location, strand, or transcriptional activity of the target sequence. Furthermore, the specificity of targeting with CRISPR RNAs could readily distinguish between even highly similar strains in pure or mixed cultures. Finally, varying the collection of delivered CRISPR RNAs could quantitatively control the relative number of individual strains within a mixed culture. Critically, the observed selectivity and programmability of bacterial removal would be virtually impossible with traditional antibiotics, bacteriophages, selectable markers, or tailored growth conditions. Once delivery challenges are addressed, we envision that this approach could offer a novel means to quantitatively control the composition of environmental and industrial microbial consortia and may open new avenues for the development of "smart" antibiotics that circumvent multidrug resistance and differentiate between pathogenic and beneficial microorganisms. Controlling the composition of microbial populations is a critical aspect in medicine, biotechnology, and environmental cycles. While different antimicrobial strategies, such as antibiotics, antimicrobial peptides, and lytic bacteriophages, offer partial solutions, what remains elusive is a generalized and programmable strategy that can distinguish between even closely related microorganisms and that allows for fine control over the composition of a microbial population. This study demonstrates that RNA-directed immune systems in bacteria and archaea called CRISPR-Cas systems can provide such a strategy. These systems can be employed to selectively and quantitatively remove individual bacterial strains based purely on sequence information, creating opportunities in the treatment of multidrug-resistant infections, the control of industrial fermentations, and the study of microbial consortia.
Microbial succession and the functional potential during the fermentation of Chinese soy sauce brine
Sulaiman, Joanita; Gan, Han Ming; Yin, Wai-Fong; Chan, Kok-Gan
2014-01-01
The quality of traditional Chinese soy sauce is determined by microbial communities and their inter-related metabolic roles in the fermentation tank. In this study, traditional Chinese soy sauce brine samples were obtained periodically to monitor the transitions of the microbial population and functional properties during the 6 months of fermentation process. Whole genome shotgun method revealed that the fermentation brine was dominated by the bacterial genus Weissella and later dominated by the fungal genus Candida. Metabolic reconstruction of the metagenome sequences demonstrated a characteristic profile of heterotrophic fermentation of proteins and carbohydrates. This was supported by the detection of ethanol with stable decrease of pH values. To the best of our knowledge, this is the first study that explores the temporal changes in microbial successions over a period of 6 months, through metagenome shotgun sequencing in traditional Chinese soy sauce fermentation and the biological processes therein. PMID:25400624
Sulaiman, Joanita; Gan, Han Ming; Yin, Wai-Fong; Chan, Kok-Gan
2014-01-01
The quality of traditional Chinese soy sauce is determined by microbial communities and their inter-related metabolic roles in the fermentation tank. In this study, traditional Chinese soy sauce brine samples were obtained periodically to monitor the transitions of the microbial population and functional properties during the 6 months of fermentation process. Whole genome shotgun method revealed that the fermentation brine was dominated by the bacterial genus Weissella and later dominated by the fungal genus Candida. Metabolic reconstruction of the metagenome sequences demonstrated a characteristic profile of heterotrophic fermentation of proteins and carbohydrates. This was supported by the detection of ethanol with stable decrease of pH values. To the best of our knowledge, this is the first study that explores the temporal changes in microbial successions over a period of 6 months, through metagenome shotgun sequencing in traditional Chinese soy sauce fermentation and the biological processes therein.
Strategies to improve reference databases for soil microbiomes
Choi, Jinlyung; Yang, Fan; Stepanauskas, Ramunas; ...
2016-12-09
A database of curated genomes is needed to better assess soil microbial communities and their processes associated with differing land management and environmental impacts. Interpreting soil metagenomic datasets with existing sequence databases is challenging because these datasets are biased towards medical and biotechnology research and can result in misleading annotations. We have curated a database of 928 genomes of soil-associated organisms (888 bacteria, 34 archaea, and 6 fungi). Using this database as a representation of the current state of knowledge of soil microbes that are well-characterized, we evaluated its composition and compared it to broader microbial databases, specifically NCBI’s RefSeq,more » as well as 3,035 publicly available soil amplicon datasets. These comparisons identified phyla and functions that are enriched in soils as well as those that may be underrepresented in RefSoil. For example, RefSoil was observed to have increased representation of Firmicutes despite its low abundance in soil environments and also lacked representation of Acidobacteria and Verrucomicrobia, which are abundant in soils. Our comparison of RefSoil to soil amplicon datasets allowed us to identify targets that if cultured or sequenced would significantly increase the biodiversity represented within RefSoil. To demonstrate the opportunities to access these underrepresented targets, we employed single cell genomics in a pilot experiment to recover 14 genomes from the "most wanted" list, which improved RefSoil's representation of EMP sequences by 7% by abundance. This effort demonstrates the value of RefSoil in the guidance of future research efforts and the capability of single cell genomics as a practical means to fill the existing genomic data gaps.« less
Strategies to improve reference databases for soil microbiomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Choi, Jinlyung; Yang, Fan; Stepanauskas, Ramunas
A database of curated genomes is needed to better assess soil microbial communities and their processes associated with differing land management and environmental impacts. Interpreting soil metagenomic datasets with existing sequence databases is challenging because these datasets are biased towards medical and biotechnology research and can result in misleading annotations. We have curated a database of 928 genomes of soil-associated organisms (888 bacteria, 34 archaea, and 6 fungi). Using this database as a representation of the current state of knowledge of soil microbes that are well-characterized, we evaluated its composition and compared it to broader microbial databases, specifically NCBI’s RefSeq,more » as well as 3,035 publicly available soil amplicon datasets. These comparisons identified phyla and functions that are enriched in soils as well as those that may be underrepresented in RefSoil. For example, RefSoil was observed to have increased representation of Firmicutes despite its low abundance in soil environments and also lacked representation of Acidobacteria and Verrucomicrobia, which are abundant in soils. Our comparison of RefSoil to soil amplicon datasets allowed us to identify targets that if cultured or sequenced would significantly increase the biodiversity represented within RefSoil. To demonstrate the opportunities to access these underrepresented targets, we employed single cell genomics in a pilot experiment to recover 14 genomes from the "most wanted" list, which improved RefSoil's representation of EMP sequences by 7% by abundance. This effort demonstrates the value of RefSoil in the guidance of future research efforts and the capability of single cell genomics as a practical means to fill the existing genomic data gaps.« less
Ying, Jianchao; Wang, Huifeng; Bao, Bokan; Zhang, Ying; Zhang, Jinfang; Zhang, Cheng; Li, Aifang; Lu, Junwan; Li, Peizhen; Ying, Jun; Liu, Qi; Xu, Teng; Yi, Huiguang; Li, Jinsong; Zhou, Li; Zhou, Tieli; Xu, Zuyuan; Ni, Liyan; Bao, Qiyu
2015-01-01
The homocysteine methyltransferase encoded by mmuM is widely distributed among microbial organisms. It is the key enzyme that catalyzes the last step in methionine biosynthesis and plays an important role in the metabolism process. It also enables the microbial organisms to tolerate high concentrations of selenium in the environment. In this research, 533 mmuM gene sequences covering 70 genera of the bacteria were selected from GenBank database. The distribution frequency of mmuM is different in the investigated genera of bacteria. The mapping results of 160 mmuM reference sequences showed that the mmuM genes were found in 7 species of pathogen genomes sequenced in this work. The polymerase chain reaction products of one mmuM genotype (NC_013951 as the reference) were sequenced and the sequencing results confirmed the mapping results. Furthermore, 144 representative sequences were chosen for phylogenetic analysis and some mmuM genes from totally different genera (such as the genes between Escherichia and Klebsiella and between Enterobacter and Kosakonia) shared closer phylogenetic relationship than those from the same genus. Comparative genomic analysis of the mmuM encoding regions on plasmids and bacterial chromosomes showed that pKF3-140 and pIP1206 plasmids shared a 21 kb homology region and a 4.9 kb fragment in this region was in fact originated from the Escherichia coli chromosome. These results further suggested that mmuM gene did go through the gene horizontal transfer among different species or genera of bacteria. High-throughput sequencing combined with comparative genomics analysis would explore distribution and dissemination of the mmuM gene among bacteria and its evolution at a molecular level.
Genomic analysis of the symbiotic marine crenarchaeon, Cenarchaeumsymbiosum
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hallam, Steven J.; Konstantinidis, Konstantinos T.; Brochier,Celine
2006-06-24
Crenarchaea are ubiquitous and abundant microbial constituents of soils, sediments, lakes and ocean waters, yet relatively little is known about their fundamental evolutionary, ecological, and physiological properties. To better describe the ubiquitous nonthermophilic Crenarchaea, we analyzed the genome sequence of one representative, the uncultivated sponge symbiont, Cenarchaeum symbiosum. C. symbiosum genotypes coinhabiting the same host partitioned into two dominant populations, corresponding to previously described a- and b-type ribosomal RNA variants. Although synthetic, overlapping a- and b-type ribotypes harbored significant genetic variability. A single tiling path comprising the dominant a-type genotype was assembled, and used to explore the biological properties ofmore » C. symbiosum and its planktonic relatives. Out of a total of 2,066 predicted open reading frames, 36% were more highly conserved with other Archaea. The remainder partitioned between bacteria (18%), eukaryotes (1.5%) and viruses (0.1%). A total of 525 open reading frames were more highly conserved with sequences derived from marine environmental genomic surveys, most probably representing orthologous genes found in free-living planktonic Crenarchaea. The remaining genes partitioned between functional RNAs (2.4%), and hypotheticals (42%) with limited homology to known functional genes. The latter category likely contains genes specifically involved in mediated archaeal-sponge symbiosis. Phylogenetic analyses placed C. symbiosum as a basal crenarchaeon, sharing specific genomic features in common with either Crenarchaea, Euryarchaea, or both. The genome sequence of C. symbiosum reflect a unique and unusual evolutionary, physiological, and ecological history, one remarkably distinct from that of any other previously known microbial lineage.« less
Stratification of co-evolving genomic groups using ranked phylogenetic profiles
Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A
2009-01-01
Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884
NASA Astrophysics Data System (ADS)
Tully, B. J.; Sylvan, J. B.; Heidelberg, J. F.; Huber, J. A.
2014-12-01
There are many limitations involved with sampling microbial diversity from deep-sea subsurface environments, ranging from physical sample collection, low microbial biomass, culturing at in situ conditions, and inefficient nucleic acid extractions. As such, we are continually modifying our methods to obtain better results and expanding what we know about microbes in these environments. Here we present analysis of metagenomes sequences from samples collected from 120 m within the Louisville Seamount and from the top 5-10cm of the sediment in the center of the south Pacific gyre (SPG). Both systems are low biomass with ~102 and ~104 cells per cm3 for Louisville Seamount samples analyzed and the SPG sediment, respectively. The Louisville Seamount represents the first in situ subseafloor basalt and the SPG sediments represent the first in situ low biomass sediment microbial metagenomes. Both of these environments, subseafloor basalt and sediments underlying oligotrophic ocean gyres, represent large provinces of the seafloor environment that remain understudied. Despite the low biomass and DNA generated from these samples, we have generated 16 near complete genomes (5 from Louisville and 11 from the SPG) from the two metagenomic datasets. These genomes are estimated to be between 51-100% complete and span a range of phylogenetic groups, including the Proteobacteria, Actinobacteria, Firmicutes, Chloroflexi, and unclassified bacterial groups. With these genomes, we have assessed potential functional capabilities of these organisms and performed a comparative analysis between the environmental genomes and previously sequenced relatives to determine possible adaptations that may elucidate survival mechanisms for these low energy environments. These methods illustrate a baseline analysis that can be applied to future metagenomic deep-sea subsurface datasets and will help to further our understanding of microbiology within these environments.
The BioCyc collection of microbial genomes and metabolic pathways.
Karp, Peter D; Billington, Richard; Caspi, Ron; Fulcher, Carol A; Latendresse, Mario; Kothari, Anamika; Keseler, Ingrid M; Krummenacker, Markus; Midford, Peter E; Ong, Quang; Ong, Wai Kit; Paley, Suzanne M; Subhraveti, Pallavi
2017-08-17
BioCyc.org is a microbial genome Web portal that combines thousands of genomes with additional information inferred by computer programs, imported from other databases and curated from the biomedical literature by biologist curators. BioCyc also provides an extensive range of query tools, visualization services and analysis software. Recent advances in BioCyc include an expansion in the content of BioCyc in terms of both the number of genomes and the types of information available for each genome; an expansion in the amount of curated content within BioCyc; and new developments in the BioCyc software tools including redesigned gene/protein pages and metabolite pages; new search tools; a new sequence-alignment tool; a new tool for visualizing groups of related metabolic pathways; and a facility called SmartTables, which enables biologists to perform analyses that previously would have required a programmer's assistance. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Microbes of deep marine sediments as viewed by metagenomics
NASA Astrophysics Data System (ADS)
Biddle, J.
2015-12-01
Ten years after the first deep marine sediment metagenome was produced, questions still exist about the nucleic acid sequences we have retrieved. Current data sets, including the Peru Margin, Costa Rica Margin and Iberian Margin show that consistently, data forms larger assemblies at depth due to the reduced complexity of the microbial community. But are these organisms active or preserved? At SMTZs, a change in the assembly statistics is noted, as well as an increase in cell counts, suggesting that cells are truly active. As depth increases, genome sizes are consistently large, suggesting that much like soil microbes, sedimentary microbes may maintain a larger reportorie of genomic potential. Functional changes are seen with depth, but at many sites are not correlated to specific geochemistries. Individual genomes show changes with depth, which raises interesting questions on how the subsurface is settled and maintained. The subsurface does have a distinct genomic signature, including unusual microbial groups, which we are now able to analyze for total genomic content.
Comparative genomics of xylose-fermenting fungi for enhanced biofuel production
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wohlbach, Dana J.; Kuo, Alan; Sato, Trey K.
Cellulosic biomass is an abundant and underused substrate for biofuel production. The inability of many microbes to metabolize the pentose sugars abundant within hemicellulose creates specific challenges for microbial biofuel production from cellulosic material. Although engineered strains of Saccharomyces cerevisiae can use the pentose xylose, the fermentative capacity pales in comparison with glucose, limiting the economic feasibility of industrial fermentations. To better understand xylose utilization for subsequent microbial engineering, we sequenced the genomes of two xylose-fermenting, beetle-associated fungi, Spathaspora passalidarum and Candida tenuis. To identify genes involved in xylose metabolism, we applied a comparative genomic approach across 14 Ascomycete genomes,more » mapping phenotypes and genotypes onto the fungal phylogeny, and measured genomic expression across five Hemiascomycete species with different xylose-consumption phenotypes. This approach implicated many genes and processes involved in xylose assimilation. Several of these genes significantly improved xylose utilization when engineered into S. cerevisiae, demonstrating the power of comparative methods in rapidly identifying genes for biomass conversion while reflecting on fungal ecology.« less
Dai, Zhimin; Guo, Xue; Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan
2014-01-01
Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community.
Yin, Huaqun; Liang, Yili; Cong, Jing; Liu, Xueduan
2014-01-01
Biological nitrogen fixation is an essential function of acid mine drainage (AMD) microbial communities. However, most acidophiles in AMD environments are uncultured microorganisms and little is known about the diversity of nitrogen-fixing genes and structure of nif gene cluster in AMD microbial communities. In this study, we used metagenomic sequencing to isolate nif genes in the AMD microbial community from Dexing Copper Mine, China. Meanwhile, a metagenome microarray containing 7,776 large-insertion fosmids was constructed to screen novel nif gene clusters. Metagenomic analyses revealed that 742 sequences were identified as nif genes including structural subunit genes nifH, nifD, nifK and various additional genes. The AMD community is massively dominated by the genus Acidithiobacillus. However, the phylogenetic diversity of nitrogen-fixing microorganisms is much higher than previously thought in the AMD community. Furthermore, a 32.5-kb genomic sequence harboring nif, fix and associated genes was screened by metagenome microarray. Comparative genome analysis indicated that most nif genes in this cluster are most similar to those of Herbaspirillum seropedicae, but the organization of the nif gene cluster had significant differences from H. seropedicae. Sequence analysis and reverse transcription PCR also suggested that distinct transcription units of nif genes exist in this gene cluster. nifQ gene falls into the same transcription unit with fixABCX genes, which have not been reported in other diazotrophs before. All of these results indicated that more novel diazotrophs survive in the AMD community. PMID:24498417
Roach, David J.; Burton, Joshua N.; Lee, Choli; Stackhouse, Bethany; Butler-Wu, Susan M.; Cookson, Brad T.
2015-01-01
Bacterial whole genome sequencing holds promise as a disruptive technology in clinical microbiology, but it has not yet been applied systematically or comprehensively within a clinical context. Here, over the course of one year, we performed prospective collection and whole genome sequencing of nearly all bacterial isolates obtained from a tertiary care hospital’s intensive care units (ICUs). This unbiased collection of 1,229 bacterial genomes from 391 patients enables detailed exploration of several features of clinical pathogens. A sizable fraction of isolates identified as clinically relevant corresponded to previously undescribed species: 12% of isolates assigned a species-level classification by conventional methods actually qualified as distinct, novel genomospecies on the basis of genomic similarity. Pan-genome analysis of the most frequently encountered pathogens in the collection revealed substantial variation in pan-genome size (1,420 to 20,432 genes) and the rate of gene discovery (1 to 152 genes per isolate sequenced). Surprisingly, although potential nosocomial transmission of actively surveilled pathogens was rare, 8.7% of isolates belonged to genomically related clonal lineages that were present among multiple patients, usually with overlapping hospital admissions, and were associated with clinically significant infection in 62% of patients from which they were recovered. Multi-patient clonal lineages were particularly evident in the neonatal care unit, where seven separate Staphylococcus epidermidis clonal lineages were identified, including one lineage associated with bacteremia in 5/9 neonates. Our study highlights key differences in the information made available by conventional microbiological practices versus whole genome sequencing, and motivates the further integration of microbial genome sequencing into routine clinical care. PMID:26230489
NASA Astrophysics Data System (ADS)
Daly, R. A.; Mouser, P. J.; Trexler, R.; Wrighton, K. C.
2014-12-01
Despite a growing appreciation for the ecological role of viruses in marine and gut systems, little is known about their role in the terrestrial deep (> 2000 m) subsurface. We used assembly-based metagenomics to examine the viral component in fluids from hydraulically fractured Marcellus shale gas wells. Here we reconstructed microbial and viral genomes from samples collected 7, 82, and 328 days post fracturing. Viruses accounted for 4.14%, 0.92% and 0.59% of the sample reads that mapped to the assembly. We identified 6 complete, circularized viral genomes and an additional 92 viral contigs > 5 kb with a maximum contig size of 73.6 kb. A BLAST comparison to NCBI viral genomes revealed that 85% of viral contigs had significant hits to the viral order Caudovirales, with 43% of sequences belonging to the family Siphoviridae, 38% to Myoviridae, and 12% to Podoviridae. Enrichment of Caudovirales viruses was supported by a large number of predicted proteins characteristic of tailed viruses including terminases (TerL), tape measure, tail formation, and baseplate related proteins. The viral contigs included evidence of lytic and temperate lifestyles, with the 7 day sample having the greatest number of detected lytic viruses. Notably in this sample, the most abundant virus was lytic and its inferred host, a member of the Vibrionaceae, was not detected at later time points. Analyses of CRISPR sequences (a viral and foreign DNA immune system in bacteria and archaea), linked 18 viral contigs to hosts. CRISPR linkages increased through time and all bacterial and archaeal genomes recovered in the final time point had genes for CRISPR-mediated viral defense. The majority of CRISPR sequences linked phage genomes to several Halanaerobium strains, which are the dominant and persisting members of the community inferred to be responsible for carbon and sulfur cycling in these shales. Network analysis revealed that several viruses were present in the 82 and 328 day samples; this viral persistence is consistent with concomitant temporal stability in geochemistry and microbial community composition. Our findings suggest that after a disturbance (hydraulic fracturing) viral predation and host immunity is an important controller of microbial community structure, metabolism, and thus biogeochemical cycling in the deep subsurface.
Genetic Comparison of B. Anthracis and its Close Relatives Using AFLP and PCR Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jackson, P.J.; Hill, K.K.; Laker, M.T.
1999-02-01
Amplified Fragment length Polymorphism (AFLP) analysis allows a rapid, relatively simple analysis of a large portion of a microbial genome, providing information about the species and its phylogenetic relationship to other microbes (Vos, et al., 1995). The method simply surveys the genome for length and sequence polymorphisms. The pattern identified can be used for comparison to the genomes of other species. Unlike other methods, it does not rely on analysis of a single genetic locus that may bias the interpretation of results and it does not require any prior knowledge of the targeted organism. Moreover, a standard set of reagentsmore » can be applied to any species without using species-specific information or molecular probes. The authors are using AFLP's to rapidly identify different bacterial species. A comparison of AFLP profiles generated from a large battery of B. anthracis strains shows very little variability among different isolates (Keim, et al., 1997). By contrast, there is a significant difference between AFLP profiles generated for any B. anthracis strain and even the most closely related Bacillus species. Sufficient variability is apparent among all known microbial species to allow phylogenetic analysis based on large numbers of genetically unlinked loci. These striking differences among AFLP profiles allow unambiguous identification of previously identified species and phylogenetic placement of newly characterized isolates relative to known species based on a large number of independent genetic loci. Data generated thus far show that the method provides phylogenetic analyses that are consistent with other widely accepted phylogenetic methods. However, AFLP analysis provides a more detailed analysis of the targets and samples a much larger portion of the genome. Consequently, it provides an inexpensive, rapid means of characterizing microbial isolates to further differentiate among strains and closely related microbial species. Such information cannot be rapidly generated by other means. AFLP sample analysis quickly generates a very large amount of molecular information about microbial genomes. However, this information cannot be analyzed rapidly using manual methods. The authors are developing a large archive of electronic AFLP signatures that is being used to identify isolates collected from medical, veterinary, forensic and environmental samples. They are also developing the computational packages necessary to rapidly and unambiguously analyze the AFLP profiles and conduct a phylogenetic comparison of these data relative to information already in the database. They will use this archive and the associated algorithms to determine the species identity of previously uncharacterized isolates and place them phylogenetically relative to other microbes based on their AFLP signatures. This study provides significant new information about microbes with environmental, veterinary and medical significance. This information can be used in further studies to understand the relationships among these species and the factors that distinguish them from one another. It should also allow identification of unique factors that contribute to important microbial traits including pathogenicity and virulence. They are also using AFLP data to identify, isolate and sequence DNA fragments that are unique to particular microbial species and strains. The fragment patterns and sequence information provide insights into the complexity and organization of bacterial genomes relative to one another. They also provide the information necessary for development of species-specific PCR primers that can be used to interrogate complex samples for the presence of B. anthracis, other microbial pathogens or their remnants.« less
Bernard, Guillaume; Pathmanathan, Jananan S; Lannes, Romain; Lopez, Philippe; Bapteste, Eric
2018-01-01
Abstract Microbes are the oldest and most widespread, phylogenetically and metabolically diverse life forms on Earth. However, they have been discovered only 334 years ago, and their diversity started to become seriously investigated even later. For these reasons, microbial studies that unveil novel microbial lineages and processes affecting or involving microbes deeply (and repeatedly) transform knowledge in biology. Considering the quantitative prevalence of taxonomically and functionally unassigned sequences in environmental genomics data sets, and that of uncultured microbes on the planet, we propose that unraveling the microbial dark matter should be identified as a central priority for biologists. Based on former empirical findings of microbial studies, we sketch a logic of discovery with the potential to further highlight the microbial unknowns. PMID:29420719
Christen, Matthias; Deutsch, Samuel; Christen, Beat
2015-08-21
Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher .
Predicting ecological roles in the rhizosphere using metabolome and transportome modeling
Larsen, Peter E.; Collart, Frank R.; Dai, Yang; ...
2015-09-02
The ability to obtain complete genome sequences from bacteria in environmental samples, such as soil samples from the rhizosphere, has highlighted the microbial diversity and complexity of environmental communities. New algorithms to analyze genome sequence information in the context of community structure are needed to enhance our understanding of the specific ecological roles of these organisms in soil environments. We present a machine learning approach using sequenced Pseudomonad genomes coupled with outputs of metabolic and transportomic computational models for identifying the most predictive molecular mechanisms indicative of a Pseudomonad’s ecological role in the rhizosphere: a biofilm, biocontrol agent, promoter ofmore » plant growth, or plant pathogen. Computational predictions of ecological niche were highly accurate overall with models trained on transportomic model output being the most accurate (Leave One Out Validation F-scores between 0.82 and 0.89). The strongest predictive molecular mechanism features for rhizosphere ecological niche overlap with many previously reported analyses of Pseudomonad interactions in the rhizosphere, suggesting that this approach successfully informs a system-scale level understanding of how Pseudomonads sense and interact with their environments. The observation that an organism’s transportome is highly predictive of its ecological niche is a novel discovery and may have implications in our understanding microbial ecology. The framework developed here can be generalized to the analysis of any bacteria across a wide range of environments and ecological niches making this approach a powerful tool for providing insights into functional predictions from bacterial genomic data.« less
Predicting Ecological Roles in the Rhizosphere Using Metabolome and Transportome Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Larsen, Peter E.; Collart, Frank R.; Dai, Yang
2015-09-02
The ability to obtain complete genome sequences from bacteria in environmental samples, such as soil samples from the rhizosphere, has highlighted the microbial diversity and complexity of environmental communities. However, new algorithms to analyze genome sequence information in the context of community structure are needed to enhance our understanding of the specific ecological roles of these organisms in soil environments. We present a machine learning approach using sequenced Pseudomonad genomes coupled with outputs of metabolic and transportomic computational models for identifying the most predictive molecular mechanisms indicative of a Pseudomonad's ecological role in the rhizosphere: a biofilm, biocontrol agent, promotermore » of plant growth, or plant pathogen. Computational predictions of ecological niche were highly accurate overall with models trained on transportomic model output being the most accurate (Leave One Out Validation F-scores between 0.82 and 0.89). The strongest predictive molecular mechanism features for rhizosphere ecological niche overlap with many previously reported analyses of Pseudomonad interactions in the rhizosphere, suggesting that this approach successfully informs a system-scale level understanding of how Pseudomonads sense and interact with their environments. The observation that an organism's transportome is highly predictive of its ecological niche is a novel discovery and may have implications in our understanding microbial ecology. The framework developed here can be generalized to the analysis of any bacteria across a wide range of environments and ecological niches making this approach a powerful tool for providing insights into functional predictions from bacterial genomic data.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aizenberg-Gershtein, Yana; Izhaki, Ido; Lapidus, Alla
We report that the Phaseolibacter flectens strain ATCC 12775 T (Halpern et al., Int J Syst Evol Microbiol 63:268–273, 2013) is a Gram-negative, rod shaped, motile, aerobic, chemoorganotroph bacterium. Ph. flectens is as a plant-pathogenic bacterium on pods of French bean and was first identified by Johnson (1956) as Pseudomonas flectens. After its phylogenetic position was reexamined, Pseudomonas flectens was transferred to the family Enterobacteriaceae as Phaseolibacter flectens gen. nov., comb. nov. Here we describe the features of this organism, together with the draft genome sequence and annotation. The DNA GC content is 44.34 mol%. The chromosome length is 2,748,442more » bp. It encodes 2,437 proteins and 89 RNA genes. Ph. flectens genome is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes study.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mukherjee, Supratim; Lapidus, Alla; Shapiro, Nicole
2015-01-01
Pontibacter roseus Suresh et al 2006 is a member of genus Pontibacter family Cytophagaceae, class Cytophagia. While the type species of the genus Pontibacter actiniarum was isolated in 2005 from a marine environment, subsequent species of the same genus have been found in different types of habitats ranging from seawater, sediment, desert soil, rhizosphere, contaminated sites, solar saltern and muddy water. Here we describe the features of Pontibacter roseus strain SRC-1T along with its complete genome sequence and annotation from a culture of DSM 17521T. The 4,581,480 bp long draft genome consists of 12 scaffolds with 4,003 protein-coding and 50more » RNA genes and is a part of Genomic encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG-I) project.« less
Aizenberg-Gershtein, Yana; Izhaki, Ido; Lapidus, Alla; ...
2016-01-13
We report that the Phaseolibacter flectens strain ATCC 12775 T (Halpern et al., Int J Syst Evol Microbiol 63:268–273, 2013) is a Gram-negative, rod shaped, motile, aerobic, chemoorganotroph bacterium. Ph. flectens is as a plant-pathogenic bacterium on pods of French bean and was first identified by Johnson (1956) as Pseudomonas flectens. After its phylogenetic position was reexamined, Pseudomonas flectens was transferred to the family Enterobacteriaceae as Phaseolibacter flectens gen. nov., comb. nov. Here we describe the features of this organism, together with the draft genome sequence and annotation. The DNA GC content is 44.34 mol%. The chromosome length is 2,748,442more » bp. It encodes 2,437 proteins and 89 RNA genes. Ph. flectens genome is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes study.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yassin, Atteyet F.; Lapidus, Alla; Han, James
We report that the Corynebacterium ulceribovis strain IMMIB L-1395T (= DSM 45146T) is an aerobic to facultative anaerobic, Gram-positive, non-spore-forming, non-motile rod-shaped bacterium that was isolated from the skin of the udder of a cow, in Schleswig Holstein, Germany. The cell wall of C. ulceribovis contains corynemycolic acids. The cellular fatty acids are those described for the genus Corynebacterium, but tuberculostearic acid is not present. Here we describe the features of C. ulceribovis strain IMMIB L-1395T, together with genome sequence information and its annotation. The 2,300,451 bp long genome containing 2,104 protein-coding genes and 54 RNA-encoding genes and is partmore » of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project.« less
Yassin, Atteyet F.; Lapidus, Alla; Han, James; ...
2015-08-05
We report that the Corynebacterium ulceribovis strain IMMIB L-1395T (= DSM 45146T) is an aerobic to facultative anaerobic, Gram-positive, non-spore-forming, non-motile rod-shaped bacterium that was isolated from the skin of the udder of a cow, in Schleswig Holstein, Germany. The cell wall of C. ulceribovis contains corynemycolic acids. The cellular fatty acids are those described for the genus Corynebacterium, but tuberculostearic acid is not present. Here we describe the features of C. ulceribovis strain IMMIB L-1395T, together with genome sequence information and its annotation. The 2,300,451 bp long genome containing 2,104 protein-coding genes and 54 RNA-encoding genes and is partmore » of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project.« less
Recovery of a Medieval Brucella melitensis Genome Using Shotgun Metagenomics
Kay, Gemma L.; Sergeant, Martin J.; Giuffra, Valentina; Bandiera, Pasquale; Milanese, Marco; Bramanti, Barbara
2014-01-01
ABSTRACT Shotgun metagenomics provides a powerful assumption-free approach to the recovery of pathogen genomes from contemporary and historical material. We sequenced the metagenome of a calcified nodule from the skeleton of a 14th-century middle-aged male excavated from the medieval Sardinian settlement of Geridu. We obtained 6.5-fold coverage of a Brucella melitensis genome. Sequence reads from this genome showed signatures typical of ancient or aged DNA. Despite the relatively low coverage, we were able to use information from single-nucleotide polymorphisms to place the medieval pathogen genome within a clade of B. melitensis strains that included the well-studied Ether strain and two other recent Italian isolates. We confirmed this placement using information from deletions and IS711 insertions. We conclude that metagenomics stands ready to document past and present infections, shedding light on the emergence, evolution, and spread of microbial pathogens. PMID:25028426
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weng, Li; Rubin, Edward M.; Bristow, James
Ecologists studying microbial life in the environment have recognized the enormous complexity of microbial diversity for more than a decade (Whitman et al. 1998). The development of a variety of culture-independent methods, many of them coupled with high-throughput DNA sequencing, has allowed this diversity to be explored in ever greater detail (Handelsman 2004; Harris et al. 2004; Hugenholtz et al. 1998; Moreira and Lopez-Garcia 2002; Rappe and Giovannoni 2003). Despite the widespread application of these new techniques to the characterization of uncultivated microbes and microbial communities in the environment, their application to human health and disease has lagged behind. Becausemore » these techniques now allow not only cataloging of microbial diversity, but also insight into microbial functions, it is time for clinical microbiologists to apply these tools to the microbial communities that abound on and within us, in what has been aptly called ''the second Human Genome Project'' (Relman and Falkow 2001). In this review we will discuss the sequence-based methods for microbial analysis that are currently available and their application to identify novel human pathogens, improve diagnosis and treatment of known infectious diseases, and finally to advance understanding of our relationship with microbial communities that normally reside in and on the human body.« less
Conceptualizing a Genomics Software Institute (GSI)
Gilbert, Jack A.; Catlett, Charlie; Desai, Narayan; Knight, Rob; White, Owen; Robbins, Robert; Sankaran, Rajesh; Sansone, Susanna-Assunta; Field, Dawn; Meyer, Folker
2012-01-01
Microbial ecology has been enhanced greatly by the ongoing ‘omics revolution, bringing half the world's biomass and most of its biodiversity into analytical view for the first time; indeed, it feels almost like the invention of the microscope and the discovery of the new world at the same time. With major microbial ecology research efforts accumulating prodigious quantities of sequence, protein, and metabolite data, we are now poised to address environmental microbial research at macro scales, and to begin to characterize and understand the dimensions of microbial biodiversity on the planet. What is currently impeding progress is the need for a framework within which the research community can develop, exchange and discuss predictive ecosystem models that describe the biodiversity and functional interactions. Such a framework must encompass data and metadata transparency and interoperation; data and results validation, curation, and search; application programming interfaces for modeling and analysis tools; and human and technical processes and services necessary to ensure broad adoption. Here we discuss the need for focused community interaction to augment and deepen established community efforts, beginning with the Genomic Standards Consortium (GSC), to create a science-driven strategic plan for a Genomic Software Institute (GSI). PMID:22675605
Termite hindguts and the ecology of microbial communities in the sequencing age.
Tai, Vera; Keeling, Patrick J
2013-01-01
Advances in high-throughput nucleic acid sequencing have improved our understanding of microbial communities in a number of ways. Deeper sequence coverage provides the means to assess diversity at the resolution necessary to recover ecological and biogeographic patterns, and at the same time single-cell genomics provides detailed information about the interactions between members of a microbial community. Given the vastness and complexity of microbial ecosystems, such analyses remain challenging for most environments, so greater insight can also be drawn from analysing less dynamic ecosystems. Here, we outline the advantages of one such environment, the wood-digesting hindgut communities of termites and cockroaches, and how it is a model to examine and compare both protist and bacterial communities. Beyond the analysis of diversity, our understanding of protist community ecology will depend on using statistically sound sampling regimes at biologically relevant scales, transitioning from discovery-based to experimental ecology, incorporating single-cell microbiology and other data sources, and continued development of analytical tools. © 2013 The Author(s) Journal of Eukaryotic Microbiology © 2013 International Society of Protistologists.
Architecture of a Species: Phylogenomics of Staphylococcus aureus.
Planet, Paul J; Narechania, Apurva; Chen, Liang; Mathema, Barun; Boundy, Sam; Archer, Gordon; Kreiswirth, Barry
2017-02-01
A deluge of whole-genome sequencing has begun to give insights into the patterns and processes of microbial evolution, but genome sequences have accrued in a haphazard manner, with biased sampling of natural variation that is driven largely by medical and epidemiological priorities. For instance, there is a strong bias for sequencing epidemic lineages of methicillin-resistant Staphylococcus aureus (MRSA) over sensitive isolates (methicillin-sensitive S. aureus: MSSA). As more diverse genomes are sequenced the emerging picture is of a highly subdivided species with a handful of relatively clonal groups (complexes) that, at any given moment, dominate in particular geographical regions. The establishment of hegemony of particular clones appears to be a dynamic process of successive waves of replacement of the previously dominant clone. Here we review the phylogenomic structure of a diverse range of S. aureus, including both MRSA and MSSA. We consider the utility of the concept of the 'core' genome and the impact of recombination and horizontal transfer. We argue that whole-genome surveillance of S. aureus populations could lead to better forecasting of antibiotic resistance and virulence of emerging clones, and a better understanding of the elusive biological factors that determine repeated strain replacement. Copyright © 2016. Published by Elsevier Ltd.
Legault, Boris A; Lopez-Lopez, Arantxa; Alba-Casado, Jose Carlos; Doolittle, W Ford; Bolhuis, Henk; Rodriguez-Valera, Francisco; Papke, R Thane
2006-01-01
Background Mature saturated brine (crystallizers) communities are largely dominated (>80% of cells) by the square halophilic archaeon "Haloquadratum walsbyi". The recent cultivation of the strain HBSQ001 and thesequencing of its genome allows comparison with the metagenome of this taxonomically simplified environment. Similar studies carried out in other extreme environments have revealed very little diversity in gene content among the cell lineages present. Results The metagenome of the microbial community of a crystallizer pond has been analyzed by end sequencing a 2000 clone fosmid library and comparing the sequences obtained with the genome sequence of "Haloquadratum walsbyi". The genome of the sequenced strain was retrieved nearly complete within this environmental DNA library. However, many ORF's that could be ascribed to the "Haloquadratum" metapopulation by common genome characteristics or scaffolding to the strain genome were not present in the specific sequenced isolate. Particularly, three regions of the sequenced genome were associated with multiple rearrangements and the presence of different genes from the metapopulation. Many transposition and phage related genes were found within this pool which, together with the associated atypical GC content in these areas, supports lateral gene transfer mediated by these elements as the most probable genetic cause of this variability. Additionally, these sequences were highly enriched in putative regulatory and signal transduction functions. Conclusion These results point to a large pan-genome (total gene repertoire of the genus/species) even in this highly specialized extremophile and at a single geographic location. The extensive gene repertoire is what might be expected of a population that exploits a diverse nutrient pool, resulting from the degradation of biomass produced at lower salinities. PMID:16820057
Microbial stratification and microbially catalyzed processes along a hypersaline chemocline
NASA Astrophysics Data System (ADS)
Hyde, A.; Joye, S. B.; Teske, A.
2017-12-01
Orca Basin is the largest deep hypersaline anoxic basin in the world, covering over 400 km2. Located at the bottom of the Gulf of Mexico, this body of water reaches depths of 200 meters and is 8 times denser (and more saline) than the overlying seawater. The sharp pycnocline prevents any significant vertical mixing and serves as a particle trap for sinking organic matter. These rapid changes in salinity, oxygen, organic matter, and other geochemical parameters present unique conditions for the microbial communities present. We collected samples in 10m intervals throughout the chemocline. After filtering the water, we used high-throughput bacterial and archaeal 16S rRNA gene sequencing to characterize the changing microbial community along the Orca Basin chemocline. The results reveal a dominance of microbial taxa whose biogeochemical function is entirely unknown. We then used metagenomic sequencing and reconstructed genomes for select samples, revealing the potential dominant metabolic processes in the Orca Basin chemocline. Understanding how these unique geochemical conditions shape microbial communities and metabolic capabilities will have implications for the ocean's biogeochemical cycles and the consequences of expanding oxygen minimum zones.
Simmons, Sheri L; Dibartolo, Genevieve; Denef, Vincent J; Goltsman, Daniela S Aliaga; Thelen, Michael P; Banfield, Jillian F
2008-07-22
Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth approximately 20x). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types ( approximately 94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination.
Denef, Vincent J; Goltsman, Daniela S. Aliaga; Thelen, Michael P; Banfield, Jillian F
2008-01-01
Deeply sampled community genomic (metagenomic) datasets enable comprehensive analysis of heterogeneity in natural microbial populations. In this study, we used sequence data obtained from the dominant member of a low-diversity natural chemoautotrophic microbial community to determine how coexisting closely related individuals differ from each other in terms of gene sequence and gene content, and to uncover evidence of evolutionary processes that occur over short timescales. DNA sequence obtained from an acid mine drainage biofilm was reconstructed, taking into account the effects of strain variation, to generate a nearly complete genome tiling path for a Leptospirillum group II species closely related to L. ferriphilum (sampling depth ∼20×). The population is dominated by one sequence type, yet we detected evidence for relatively abundant variants (>99.5% sequence identity to the dominant type) at multiple loci, and a few rare variants. Blocks of other Leptospirillum group II types (∼94% sequence identity) have recombined into one or more variants. Variant blocks of both types are more numerous near the origin of replication. Heterogeneity in genetic potential within the population arises from localized variation in gene content, typically focused in integrated plasmid/phage-like regions. Some laterally transferred gene blocks encode physiologically important genes, including quorum-sensing genes of the LuxIR system. Overall, results suggest inter- and intrapopulation genetic exchange involving distinct parental genome types and implicate gain and loss of phage and plasmid genes in recent evolution of this Leptospirillum group II population. Population genetic analyses of single nucleotide polymorphisms indicate variation between closely related strains is not maintained by positive selection, suggesting that these regions do not represent adaptive differences between strains. Thus, the most likely explanation for the observed patterns of polymorphism is divergence of ancestral strains due to geographic isolation, followed by mixing and subsequent recombination. PMID:18651792
Exploring Arabidopsis thaliana Root Endophytes via Single-Cell Genomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lundberg, Derek; Woyke, Tanja; Tringe, Susannah
2014-03-19
Land plants grow in association with microbial communities both on their surfaces and inside the plant (endophytes). The relationships between microbes and their host can vary from pathogenic to mutualistic. Colonization of the endophyte compartment occurs in the presence of a sophisticated plant immune system, implying finely tuned discrimination of pathogens from mutualists and commensals. Despite the importance of the microbiome to the plant, relatively little is known about the specific interactions between plants and microbes, especially in the case of endophytes. The vast majority of microbes have not been grown in the lab, and thus one of the fewmore » ways of studying them is by examining their DNA. Although metagenomics is a powerful tool for examining microbial communities, its application to endophyte samples is technically difficult due to the presence of large amounts of host plant DNA in the sample. One method to address these difficulties is single-cell genomics where a single microbial cell is isolated from a sample, lysed, and its genome amplified by multiple displacement amplification (MDA) to produce enough DNA for genome sequencing. This produces a single-cell amplified genome (SAG). We have applied this technology to study the endophytic microbes in Arabidopsis thaliana roots. Extensive 16S gene profiling of the microbial communities in the roots of multiple inbred A. thaliana strains has identified 164 OTUs as being significantly enriched in all the root endophyte samples compared to their presence in bulk soil.« less
Population genomics of intrapatient HIV-1 evolution
Zanini, Fabio; Brodin, Johanna; Thebo, Lina; Lanz, Christa; Bratt, Göran; Albert, Jan; Neher, Richard A
2015-01-01
Many microbial populations rapidly adapt to changing environments with multiple variants competing for survival. To quantify such complex evolutionary dynamics in vivo, time resolved and genome wide data including rare variants are essential. We performed whole-genome deep sequencing of HIV-1 populations in 9 untreated patients, with 6-12 longitudinal samples per patient spanning 5-8 years of infection. The data can be accessed and explored via an interactive web application. We show that patterns of minor diversity are reproducible between patients and mirror global HIV-1 diversity, suggesting a universal landscape of fitness costs that control diversity. Reversions towards the ancestral HIV-1 sequence are observed throughout infection and account for almost one third of all sequence changes. Reversion rates depend strongly on conservation. Frequent recombination limits linkage disequilibrium to about 100bp in most of the genome, but strong hitch-hiking due to short range linkage limits diversity. DOI: http://dx.doi.org/10.7554/eLife.11282.001 PMID:26652000
Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong
2012-07-24
Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.
Remus-Emsermann, Mitja N P; Schmid, Michael; Gekenidis, Maria-Theresia; Pelludat, Cosima; Frey, Jürg E; Ahrens, Christian H; Drissner, David
2016-01-01
Pseudomonas citronellolis is a Gram negative, motile gammaproteobacterium belonging to the order Pseudomonadales and the family Pseudomonadaceae . We isolated strain P3B5 from the phyllosphere of basil plants ( Ocimum basilicum L.). Here we describe the physiology of this microorganism, its full genome sequence, and detailed annotation. The 6.95 Mbp genome contains 6071 predicted protein coding sequences and 96 RNA coding sequences. P. citronellolis has been the subject of many studies including the investigation of long-chain aliphatic compounds and terpene degradation. Plant leaves are covered by long-chain aliphates making up a waxy layer that is associated with the leaf cuticle. In addition, basil leaves are known to contain high amounts of terpenoid substances, hinting to a potential nutrient niche that might be exploited by P. citronellolis . Furthermore, the isolated strain exhibited resistance to several antibiotics. To evaluate the potential of this strain as source of transferable antibiotic resistance genes on raw consumed herbs we therefore investigated if those resistances are encoded on mobile genetic elements. The availability of the genome will be helpful for comparative genomics of the phylogenetically broad pseudomonads, in particular with the sequence of the P. citronellolis type strain PRJDB205 not yet publicly available. The genome is discussed with respect to a phyllosphere related lifestyle, aliphate and terpenoid degradation, and antibiotic resistance.
Zhou, Jizhong; He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G; Alvarez-Cohen, Lisa
2015-01-27
Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied "open-format" and "closed-format" detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. Copyright © 2015 Zhou et al.
He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G.; Alvarez-Cohen, Lisa
2015-01-01
ABSTRACT Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. PMID:25626903
Zhou, Jizhong; He, Zhili; Yang, Yunfeng; ...
2015-01-27
Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications andmore » focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions.« less
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.
Sim, Mikang; Kim, Jaebum
2015-02-01
The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
Michael Dunne, W; Pouseele, Hannes; Monecke, Stefan; Ehricht, Ralf; van Belkum, Alex
2017-09-21
The magnitude of interest in the epidemiology of transmissible human diseases is reflected in the vast number of tools and methods developed recently with the expressed purpose to characterize and track evolutionary changes that occur in agents of these diseases over time. Within the past decade a new suite of such tools has become available with the emergence of the so-called "omics" technologies. Among these, two are exponents of the ongoing genomic revolution. Firstly, high-density nucleic acid probe arrays have been proposed and developed using various chemical and physical approaches. Via hybridization-mediated detection of entire genes or genetic polymorphisms in such genes and intergenic regions these so called "DNA chips" have been successfully applied for distinguishing very closely related microbial species and strains. Second and even more phenomenal, next generation sequencing (NGS) has facilitated the assessment of the complete nucleotide sequence of entire microbial genomes. This technology currently provides the most detailed level of bacterial genotyping and hence allows for the resolution of microbial spread and short-term evolution in minute detail. We will here review the very recent history of these two technologies, sketch their usefulness in the elucidation of the spread and epidemiology of mostly hospital-acquired infections and discuss future developments. Copyright © 2017 Elsevier B.V. All rights reserved.
Intelligibility in microbial complex systems: Wittgenstein and the score of life.
Baquero, Fernando; Moya, Andrés
2012-01-01
Knowledge in microbiology is reaching an extreme level of diversification and complexity, which paradoxically results in a strong reduction in the intelligibility of microbial life. In our days, the "score of life" metaphor is more accurate to express the complexity of living systems than the classic "book of life." Music and life can be represented at lower hierarchical levels by music scores and genomic sequences, and such representations have a generational influence in the reproduction of music and life. If music can be considered as a representation of life, such representation remains as unthinkable as life itself. The analysis of scores and genomic sequences might provide mechanistic, phylogenetic, and evolutionary insights into music and life, but not about their real dynamics and nature, which is still maintained unthinkable, as was proposed by Wittgenstein. As complex systems, life or music is composed by thinkable and only showable parts, and a strategy of half-thinking, half-seeing is needed to expand knowledge. Complex models for complex systems, based on experiences on trans-hierarchical integrations, should be developed in order to provide a mixture of legibility and imageability of biological processes, which should lead to higher levels of intelligibility of microbial life.
Intelligibility in microbial complex systems: Wittgenstein and the score of life
Baquero, Fernando; Moya, Andrés
2012-01-01
Knowledge in microbiology is reaching an extreme level of diversification and complexity, which paradoxically results in a strong reduction in the intelligibility of microbial life. In our days, the “score of life” metaphor is more accurate to express the complexity of living systems than the classic “book of life.” Music and life can be represented at lower hierarchical levels by music scores and genomic sequences, and such representations have a generational influence in the reproduction of music and life. If music can be considered as a representation of life, such representation remains as unthinkable as life itself. The analysis of scores and genomic sequences might provide mechanistic, phylogenetic, and evolutionary insights into music and life, but not about their real dynamics and nature, which is still maintained unthinkable, as was proposed by Wittgenstein. As complex systems, life or music is composed by thinkable and only showable parts, and a strategy of half-thinking, half-seeing is needed to expand knowledge. Complex models for complex systems, based on experiences on trans-hierarchical integrations, should be developed in order to provide a mixture of legibility and imageability of biological processes, which should lead to higher levels of intelligibility of microbial life. PMID:22919679
Complete genome sequence of Rhodospirillum rubrum type strain (S1).
Munk, A Christine; Copeland, Alex; Lucas, Susan; Lapidus, Alla; Del Rio, Tijana Glavina; Barry, Kerrie; Detter, John C; Hammon, Nancy; Israni, Sanjay; Pitluck, Sam; Brettin, Thomas; Bruce, David; Han, Cliff; Tapia, Roxanne; Gilna, Paul; Schmutz, Jeremy; Larimer, Frank; Land, Miriam; Kyrpides, Nikos C; Mavromatis, Konstantinos; Richardson, Paul; Rohde, Manfred; Göker, Markus; Klenk, Hans-Peter; Zhang, Yaoping; Roberts, Gary P; Reslewic, Susan; Schwartz, David C
2011-07-01
Rhodospirillum rubrum (Esmarch 1887) Molisch 1907 is the type species of the genus Rhodospirillum, which is the type genus of the family Rhodospirillaceae in the class Alphaproteobacteria. The species is of special interest because it is an anoxygenic phototroph that produces extracellular elemental sulfur (instead of oxygen) while harvesting light. It contains one of the most simple photosynthetic systems currently known, lacking light harvesting complex 2. Strain S1(T) can grow on carbon monoxide as sole energy source. With currently over 1,750 PubMed entries, R. rubrum is one of the most intensively studied microbial species, in particular for physiological and genetic studies. Next to R. centenum strain SW, the genome sequence of strain S1(T) is only the second genome of a member of the genus Rhodospirillum to be published, but the first type strain genome from the genus. The 4,352,825 bp long chromosome and 53,732 bp plasmid with a total of 3,850 protein-coding and 83 RNA genes were sequenced as part of the DOE Joint Genome Institute Program DOEM 2002.
Laviad, Sivan; Lapidus, Alla; Copeland, Alex; ...
2015-05-08
Leucobacter chironomi strain MM2LBT (Halpern et al., Int J Syst Evol Microbiol 59:665-70 2009) is a Gram-positive, rod shaped, non-motile, aerobic, chemoorganotroph bacterium. L. chironomi belongs to the family Microbacteriaceae, a family within the class Actinobacteria. Strain MM2LBT was isolated from a chironomid (Diptera; Chironomidae) egg mass that was sampled from a waste stabilization pond in northern Israel. In a phylogenetic tree based on 16S rRNA gene sequences, strain MM2LBT formed a distinct branch within the radiation encompassing the genus Leucobacter. Here we describe the features of this organism, together with the complete genome sequence and annotation. We find thatmore » the DNA GC content is 69.90%. The chromosome length is 2,964,712 bp. It encodes 2,690 proteins and 61 RNA genes. L. chironomi genome is part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project.« less
Complex Microbial Communities: Weâre not in Kansas Anymore
Fraser-Liggett, Claire M.
2018-05-08
Claire Fraser-Liggett, Director of the Institute for Genome Sciences and professor at the University of Maryland School of Medicine, gives the June 2, 2010 keynote at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
López-Pérez, Mario; Kimes, Nikole E; Haro-Moreno, Jose M; Rodriguez-Valera, Francisco
2016-01-01
We have used two metagenomic approaches, direct sequencing of natural samples and sequencing after enrichment, to characterize communities of prokaryotes associated to particles. In the first approximation, different size filters (0.22 and 5 μm) were used to identify prokaryotic microbes of free-living and particle-attached bacterial communities in the Mediterranean water column. A subtractive metagenomic approach was used to characterize the dominant microbial groups in the large size fraction that were not present in the free-living one. They belonged mainly to Actinobacteria, Planctomycetes, Flavobacteria and Proteobacteria. In addition, marine microbial communities enriched by incubation with different kinds of particulate material have been studied by metagenomic assembly. Different particle kinds (diatomaceous earth, sand, chitin and cellulose) were colonized by very different communities of bacteria belonging to Roseobacter, Vibrio, Bacteriovorax, and Lacinutrix that were distant relatives of genomes already described from marine habitats. Besides, using assembly from deep metagenomic sequencing from the particle-specific enrichments we were able to determine a total of 20 groups of contigs (eight of them with >50% completeness) and reconstruct de novo five new genomes of novel species within marine clades (>79% completeness and <1.8% contamination). We also describe for the first time the genome of a marine Rhizobiales phage that seems to infect a broad range of Alphaproteobacteria and live in habitats as diverse as soil, marine sediment and water column. The metagenomic recruitment of the communities found by direct sequencing of the large size filter and by enrichment had nearly no overlap. These results indicate that these reconstructed genomes are part of the rare biosphere which exists at nominal levels under natural conditions.
Nielsen, Tue Kjærgaard; Rasmussen, Morten; Demanèche, Sandrine; Cecillon, Sébastien; Vogel, Timothy M.
2017-01-01
Abstract Bacterial degraders of chlorophenoxy herbicides have been isolated from various ecosystems, including pristine environments. Among these degraders, the sphingomonads constitute a prominent group that displays versatile xenobiotic-degradation capabilities. Four separate sequencing strategies were required to provide the complete sequence of the complex and plastic genome of the canonical chlorophenoxy herbicide-degrading Sphingobium herbicidovorans MH. The genome has an intricate organization of the chlorophenoxy-herbicide catabolic genes sdpA, rdpA, and cadABCD that encode the (R)- and (S)-enantiomer-specific 2,4-dichlorophenoxypropionate dioxygenases and four subunits of a Rieske non-heme iron oxygenase involved in 2-methyl-chlorophenoxyacetic acid degradation, respectively. Several major genomic rearrangements are proposed to help understand the evolution and mobility of these important genes and their genetic context. Single-strain mobilomic sequence analysis uncovered plasmids and insertion sequence-associated circular intermediates in this environmentally important bacterium and enabled the description of evolutionary models for pesticide degradation in strain MH and related organisms. The mobilome presented a complex mosaic of mobile genetic elements including four plasmids and several circular intermediate DNA molecules of insertion-sequence elements and transposons that are central to the evolution of xenobiotics degradation. Furthermore, two individual chromosomally integrated prophages were shown to excise and form free circular DNA molecules. This approach holds great potential for improving the understanding of genome plasticity, evolution, and microbial ecology. PMID:28961970
Draft genome of bagasse-degrading bacteria Bacillus aryabhattai GZ03 from deep sea water.
Wen, Jian; Ren, Chong; Huang, Nan; Liu, Yang; Zeng, Runying
2015-02-01
Bacillus aryabhattai GZ03 was isolated from deep sea water of the South China Sea, which can produce glucose and fructose by degrading bagasse at 25 °C. Here we report the draft genome sequence of Bacillus aryabhattai GZ03. The data obtained revealed 37 contigs with genome size of 5,105,129 bp and G+C content of 38.09%. The draft genome of B. aryabhattai GZ03 may provide insights into the mechanism of microbial carbohydrate and lignocellulosic material degradation. Copyright © 2014 Elsevier B.V. All rights reserved.
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, K; Ivanova, N; Barry, Kerrie
2007-01-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based ( blast hit distribution) and twomore » sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.« less
Use of simulated data sets to evaluate the fidelity of Metagenomicprocessing methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri
2006-12-01
Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity--based (blast hit distribution) and twomore » sequence composition--based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.« less
Zhu, Chengsheng; Miller, Maximilian
2018-01-01
Abstract Microbial functional diversification is driven by environmental factors, i.e. microorganisms inhabiting the same environmental niche tend to be more functionally similar than those from different environments. In some cases, even closely phylogenetically related microbes differ more across environments than across taxa. While microbial similarities are often reported in terms of taxonomic relationships, no existing databases directly link microbial functions to the environment. We previously developed a method for comparing microbial functional similarities on the basis of proteins translated from their sequenced genomes. Here, we describe fusionDB, a novel database that uses our functional data to represent 1374 taxonomically distinct bacteria annotated with available metadata: habitat/niche, preferred temperature, and oxygen use. Each microbe is encoded as a set of functions represented by its proteome and individual microbes are connected via common functions. Users can search fusionDB via combinations of organism names and metadata. Moreover, the web interface allows mapping new microbial genomes to the functional spectrum of reference bacteria, rendering interactive similarity networks that highlight shared functionality. fusionDB provides a fast means of comparing microbes, identifying potential horizontal gene transfer events, and highlighting key environment-specific functionality. PMID:29112720
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hua, Zheng-Shuang; Han, Yu-Jiao; Chen, Lin-Xing
Here we report that high-throughput sequencing is expanding our knowledge of microbial diversity in the environment. Still, understanding the metabolic potentials and ecological roles of rare and uncultured microbes in natural communities remains a major challenge. To this end, we applied a ‘divide and conquer’ strategy that partitioned a massive metagenomic data set (>100 Gbp) into subsets based on K-mer frequency in sequence assembly to a low-diversity acid mine drainage (AMD) microbial community and, by integrating with an additional metatranscriptomic assembly, successfully obtained 11 draft genomes most of which represent yet uncultured and/or rare taxa (relative abundance <1%). We reportmore » the first genome of a naturally occurring Ferrovum population (relative abundance >90%) and its metabolic potentials and gene expression profile, providing initial molecular insights into the ecological role of these lesser known, but potentially important, microorganisms in the AMD environment. Gene transcriptional analysis of the active taxa revealed major metabolic capabilities executed in situ, including carbon- and nitrogen-related metabolisms associated with syntrophic interactions, iron and sulfur oxidation, which are key in energy conservation and AMD generation, and the mechanisms of adaptation and response to the environmental stresses (heavy metals, low pH and oxidative stress). Remarkably, nitrogen fixation and sulfur oxidation were performed by the rare taxa, indicating their critical roles in the overall functioning and assembly of the AMD community. Finally, our study demonstrates the potential of the ‘divide and conquer’ strategy in high-throughput sequencing data assembly for genome reconstruction and functional partitioning analysis of both dominant and rare species in natural microbial assemblages.« less
Hua, Zheng-Shuang; Han, Yu-Jiao; Chen, Lin-Xing; ...
2014-11-07
Here we report that high-throughput sequencing is expanding our knowledge of microbial diversity in the environment. Still, understanding the metabolic potentials and ecological roles of rare and uncultured microbes in natural communities remains a major challenge. To this end, we applied a ‘divide and conquer’ strategy that partitioned a massive metagenomic data set (>100 Gbp) into subsets based on K-mer frequency in sequence assembly to a low-diversity acid mine drainage (AMD) microbial community and, by integrating with an additional metatranscriptomic assembly, successfully obtained 11 draft genomes most of which represent yet uncultured and/or rare taxa (relative abundance <1%). We reportmore » the first genome of a naturally occurring Ferrovum population (relative abundance >90%) and its metabolic potentials and gene expression profile, providing initial molecular insights into the ecological role of these lesser known, but potentially important, microorganisms in the AMD environment. Gene transcriptional analysis of the active taxa revealed major metabolic capabilities executed in situ, including carbon- and nitrogen-related metabolisms associated with syntrophic interactions, iron and sulfur oxidation, which are key in energy conservation and AMD generation, and the mechanisms of adaptation and response to the environmental stresses (heavy metals, low pH and oxidative stress). Remarkably, nitrogen fixation and sulfur oxidation were performed by the rare taxa, indicating their critical roles in the overall functioning and assembly of the AMD community. Finally, our study demonstrates the potential of the ‘divide and conquer’ strategy in high-throughput sequencing data assembly for genome reconstruction and functional partitioning analysis of both dominant and rare species in natural microbial assemblages.« less
One Bacterial Cell, One Complete Genome
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woyke, Tanja; Tighe, Damon; Mavrommatis, Konstantinos
2010-04-26
While the bulk of the finished microbial genomes sequenced to date are derived from cultured bacterial and archaeal representatives, the vast majority of microorganisms elude current culturing attempts, severely limiting the ability to recover complete or even partial genomes from these environmental species. Single cell genomics is a novel culture-independent approach, which enables access to the genetic material of an individual cell. No single cell genome has to our knowledge been closed and finished to date. Here we report the completed genome from an uncultured single cell of Candidatus Sulcia muelleri DMIN. Digital PCR on single symbiont cells isolated frommore » the bacteriome of the green sharpshooter Draeculacephala minerva bacteriome allowed us to assess that this bacteria is polyploid with genome copies ranging from approximately 200?900 per cell, making it a most suitable target for single cell finishing efforts. For single cell shotgun sequencing, an individual Sulcia cell was isolated and whole genome amplified by multiple displacement amplification (MDA). Sanger-based finishing methods allowed us to close the genome. To verify the correctness of our single cell genome and exclude MDA-derived artifacts, we independently shotgun sequenced and assembled the Sulcia genome from pooled bacteriomes using a metagenomic approach, yielding a nearly identical genome. Four variations we detected appear to be genuine biological differences between the two samples. Comparison of the single cell genome with bacteriome metagenomic sequence data detected two single nucleotide polymorphisms (SNPs), indicating extremely low genetic diversity within a Sulcia population. This study demonstrates the power of single cell genomics to generate a complete, high quality, non-composite reference genome within an environmental sample, which can be used for population genetic analyzes.« less
McAlpine, James B
2009-03-27
Over the past decade major changes have occurred in the access to genome sequences that encode the enzymes responsible for the biosynthesis of secondary metabolites, knowledge of how those sequences translate into the final structure of the metabolite, and the ability to alter the sequence to obtain predicted products via both homologous and heterologous expression. Novel genera have been discovered leading to new chemotypes, but more surprisingly several instances have been uncovered where the apparently general rules of modular translation have not applied. Several new biosynthetic pathways have been unearthed, and our general knowledge grows rapidly. This review aims to highlight some of the more striking discoveries and advances of the decade.
NASA Technical Reports Server (NTRS)
Everroad, R. Craig; Stuart, Rhona K.; Bebout, Brad M.; Detweiler, Angela M.; Lee, Jackson Zan; Woebken, Dagmar; Bebout, Leslie E.; Pett-Ridge, Jennifer
2016-01-01
The nonheterocystous filamentous cyanobacterium, strain ESFC-1, is a recently described member of the order Oscillatoriales within the Cyanobacteria. ESFC-1 has been shown to be a major diazotroph in the intertidal microbial mat system at Elkhorn Slough, CA, USA. Based on phylogenetic analyses of the 16S RNA gene, ESFC-1 appears to belong to a unique, genus-level divergence; the draft genome sequence of this strain has now been determined. Here we report features of this genome as they relate to the ecological functions and capabilities of strain ESFC-1. The 5,632,035 bp genome sequence encodes 4914 protein-coding genes and 92 RNA genes. One striking feature of this cyanobacterium is the apparent lack of either uptake or bi-directional hydrogenases typically expected within a diazotroph. Additionally, a large genomic island is found that contains numerous low GC-content genes and genes related to extracellular polysaccharide production and cell wall synthesis and maintenance.
Everroad, R. Craig; Stuart, Rhona K.; Bebout, Brad M.; ...
2016-08-24
The nonheterocystous filamentous cyanobacterium, strain ESFC-1, is a recently described member of the order Oscillatoriales within the Cyanobacteria. ESFC-1 has been shown to be a major diazotroph in the intertidal microbial mat system at Elkhorn Slough, CA, USA. Based on phylogenetic analyses of the 16S RNA gene, ESFC-1 appears to belong to a unique, genus-level divergence; the draft genome sequence of this strain has now been determined. Here we report features of this genome as they relate to the ecological functions and capabilities of strain ESFC-1. The 5,632,035 bp genome sequence encodes 4914 protein-coding genes and 92 RNA genes. Onemore » striking feature of this cyanobacterium is the apparent lack of either uptake or bi-directional hydrogenases typically expected within a diazotroph. In addition, a large genomic island is found that contains numerous low GC-content genes and genes related to extracellular polysaccharide production and cell wall synthesis and maintenance.« less
Ntougias, Spyridon; Lapidus, Alla; Copeland, Alex; ...
2015-08-13
Members of the genus Halotalea (family Halomonadaceae) are of high significance since they can tolerate the greatest glucose and maltose concentrations ever reported for known bacteria and are involved in the degradation of industrial effluents. Here, the characteristics and the permanent-draft genome sequence and annotation of Halotalea alkalilenta AW-7T are described. The microorganism was sequenced as a part of the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes (KMG) project at the DOE Joint Genome Institute, and it is the only strain within the genus Halotalea having its genome sequenced. The genome is 4,467,826 bp longmore » and consists of 40 scaffolds with 64.62 % average GC content. A total of 4,104 genes were predicted, comprising of 4,028 protein-coding and 76 RNA genes. Most protein-coding genes (87.79 %) were assigned to a putative function. Halotalea alkalilenta AW-7T encodes the catechol and protocatechuate degradation to β-ketoadipate via the β-ketoadipate and protocatechuate ortho-cleavage degradation pathway, and it possesses the genetic ability to detoxify fluoroacetate, cyanate and acrylonitrile. Lastly, an emended description of the genus Halotalea Ntougias et al. 2007 is also provided in order to describe the delayed fermentation ability of the type strain.« less
Hovde, Blake T.; Deodato, Chloe R.; Hunsperger, Heather M.; Ryken, Scott A.; Yost, Will; Jha, Ramesh K.; Patterson, Johnathan; Monnat, Raymond J.; Barlow, Steven B.; Starkenburg, Shawn R.; Cattolico, Rose Ann
2015-01-01
Haptophytes are recognized as seminal players in aquatic ecosystem function. These algae are important in global carbon sequestration, form destructive harmful blooms, and given their rich fatty acid content, serve as a highly nutritive food source to a broad range of eco-cohorts. Haptophyte dominance in both fresh and marine waters is supported by the mixotrophic nature of many taxa. Despite their importance the nuclear genome sequence of only one haptophyte, Emiliania huxleyi (Isochrysidales), is available. Here we report the draft genome sequence of Chrysochromulina tobin (Prymnesiales), and transcriptome data collected at seven time points over a 24-hour light/dark cycle. The nuclear genome of C. tobin is small (59 Mb), compact (∼40% of the genome is protein coding) and encodes approximately 16,777 genes. Genes important to fatty acid synthesis, modification, and catabolism show distinct patterns of expression when monitored over the circadian photoperiod. The C. tobin genome harbors the first hybrid polyketide synthase/non-ribosomal peptide synthase gene complex reported for an algal species, and encodes potential anti-microbial peptides and proteins involved in multidrug and toxic compound extrusion. A new haptophyte xanthorhodopsin was also identified, together with two “red” RuBisCO activases that are shared across many algal lineages. The Chrysochromulina tobin genome sequence provides new information on the evolutionary history, ecology and economic importance of haptophytes. PMID:26397803
Enabling large-scale next-generation sequence assembly with Blacklight
Couger, M. Brian; Pipes, Lenore; Squina, Fabio; Prade, Rolf; Siepel, Adam; Palermo, Robert; Katze, Michael G.; Mason, Christopher E.; Blood, Philip D.
2014-01-01
Summary A variety of extremely challenging biological sequence analyses were conducted on the XSEDE large shared memory resource Blacklight, using current bioinformatics tools and encompassing a wide range of scientific applications. These include genomic sequence assembly, very large metagenomic sequence assembly, transcriptome assembly, and sequencing error correction. The data sets used in these analyses included uncategorized fungal species, reference microbial data, very large soil and human gut microbiome sequence data, and primate transcriptomes, composed of both short-read and long-read sequence data. A new parallel command execution program was developed on the Blacklight resource to handle some of these analyses. These results, initially reported previously at XSEDE13 and expanded here, represent significant advances for their respective scientific communities. The breadth and depth of the results achieved demonstrate the ease of use, versatility, and unique capabilities of the Blacklight XSEDE resource for scientific analysis of genomic and transcriptomic sequence data, and the power of these resources, together with XSEDE support, in meeting the most challenging scientific problems. PMID:25294974
Venkateswaran, Kasthuri; Singh, Nitin K; Checinska Sielaff, Aleksandra; Pope, Robert K; Bergman, Nicholas H; van Tongeren, Sandra P; Patel, Nisha B; Lawson, Paul A; Satomi, Masataka; Williamson, Charles H D; Sahl, Jason W; Keim, Paul; Pierson, Duane; Perry, Jay
2017-01-01
In an ongoing Microbial Observatory investigation of the International Space Station (ISS), 11 Bacillus strains (2 from the Kibo Japanese experimental module, 4 from the U.S. segment, and 5 from the Russian module) were isolated and their whole genomes were sequenced. A comparative analysis of the 16S rRNA gene sequences of these isolates showed the highest similarity (>99%) to the Bacillus anthracis - B. cereus - B. thuringiensis group. The fatty acid composition, polar lipid profile, peptidoglycan type, and matrix-assisted laser desorption ionization-time of flight profiles were consistent with the B. cereus sensu lato group. The phenotypic traits such as motile rods, enterotoxin production, lack of capsule, and resistance to gamma phage/penicillin observed in ISS isolates were not characteristics of B. anthracis . Whole-genome sequence characterizations showed that ISS strains had the plcR non- B. anthracis ancestral "C" allele and lacked anthrax toxin-encoding plasmids pXO1 and pXO2, excluding their identification as B. anthracis . The genetic identities of all 11 ISS isolates characterized via gyrB analyses arbitrarily identified them as members of the B. cereus group, but traditional DNA-DNA hybridization (DDH) showed that the ISS isolates are similar to B. anthracis (88% to 90%) but distant from the B. cereus (42%) and B. thuringiensis (48%) type strains. The DDH results were supported by average nucleotide identity (>98.5%) and digital DDH (>86%) analyses. However, the collective phenotypic traits and genomic evidence were the reasons to exclude the ISS isolates from B. anthracis . Nevertheless, multilocus sequence typing and whole-genome single nucleotide polymorphism analyses placed these isolates in a clade that is distinct from previously described members of the B. cereus sensu lato group but closely related to B. anthracis . IMPORTANCE The International Space Station Microbial Observatory (Microbial Tracking-1) study is generating a microbial census of the space station's surfaces and atmosphere by using advanced molecular microbial community analysis techniques supported by traditional culture-based methods and modern bioinformatic computational modeling. This approach will lead to long-term, multigenerational studies of microbial population dynamics in a closed environment and address key questions, including whether microgravity influences the evolution and genetic modification of microorganisms. The spore-forming Bacillus cereus sensu lato group consists of pathogenic ( B. anthracis ), food poisoning ( B. cereus ), and biotechnologically useful ( B. thuringiensis ) microorganisms; their presence in a closed system such as the ISS might be a concern for the health of crew members. A detailed characterization of these potential pathogens would lead to the development of suitable countermeasures that are needed for long-term future missions and a better understanding of microorganisms associated with space missions.
Singh, Nitin K.; Checinska Sielaff, Aleksandra; Pope, Robert K.; Bergman, Nicholas H.; van Tongeren, Sandra P.; Patel, Nisha B.; Lawson, Paul A.; Satomi, Masataka; Williamson, Charles H. D.; Sahl, Jason W.; Pierson, Duane; Perry, Jay
2017-01-01
ABSTRACT In an ongoing Microbial Observatory investigation of the International Space Station (ISS), 11 Bacillus strains (2 from the Kibo Japanese experimental module, 4 from the U.S. segment, and 5 from the Russian module) were isolated and their whole genomes were sequenced. A comparative analysis of the 16S rRNA gene sequences of these isolates showed the highest similarity (>99%) to the Bacillus anthracis-B. cereus-B. thuringiensis group. The fatty acid composition, polar lipid profile, peptidoglycan type, and matrix-assisted laser desorption ionization–time of flight profiles were consistent with the B. cereus sensu lato group. The phenotypic traits such as motile rods, enterotoxin production, lack of capsule, and resistance to gamma phage/penicillin observed in ISS isolates were not characteristics of B. anthracis. Whole-genome sequence characterizations showed that ISS strains had the plcR non-B. anthracis ancestral “C” allele and lacked anthrax toxin-encoding plasmids pXO1 and pXO2, excluding their identification as B. anthracis. The genetic identities of all 11 ISS isolates characterized via gyrB analyses arbitrarily identified them as members of the B. cereus group, but traditional DNA-DNA hybridization (DDH) showed that the ISS isolates are similar to B. anthracis (88% to 90%) but distant from the B. cereus (42%) and B. thuringiensis (48%) type strains. The DDH results were supported by average nucleotide identity (>98.5%) and digital DDH (>86%) analyses. However, the collective phenotypic traits and genomic evidence were the reasons to exclude the ISS isolates from B. anthracis. Nevertheless, multilocus sequence typing and whole-genome single nucleotide polymorphism analyses placed these isolates in a clade that is distinct from previously described members of the B. cereus sensu lato group but closely related to B. anthracis. IMPORTANCE The International Space Station Microbial Observatory (Microbial Tracking-1) study is generating a microbial census of the space station’s surfaces and atmosphere by using advanced molecular microbial community analysis techniques supported by traditional culture-based methods and modern bioinformatic computational modeling. This approach will lead to long-term, multigenerational studies of microbial population dynamics in a closed environment and address key questions, including whether microgravity influences the evolution and genetic modification of microorganisms. The spore-forming Bacillus cereus sensu lato group consists of pathogenic (B. anthracis), food poisoning (B. cereus), and biotechnologically useful (B. thuringiensis) microorganisms; their presence in a closed system such as the ISS might be a concern for the health of crew members. A detailed characterization of these potential pathogens would lead to the development of suitable countermeasures that are needed for long-term future missions and a better understanding of microorganisms associated with space missions. PMID:28680972
Lee, Patrick K H; Men, Yujie; Wang, Shanquan; He, Jianzhong; Alvarez-Cohen, Lisa
2015-02-03
Dehalococcoides mccartyi are functionally important bacteria that catalyze the reductive dechlorination of chlorinated ethenes. However, these anaerobic bacteria are fastidious to isolate, making downstream genomic characterization challenging. In order to facilitate genomic analysis, a fluorescence-activated cell sorting (FACS) method was developed in this study to separate D. mccartyi cells from a microbial community, and the DNA of the isolated cells was processed by whole genome amplification (WGA) and hybridized onto a D. mccartyi microarray for comparative genomics against four sequenced strains. First, FACS was successfully applied to a D. mccartyi isolate as positive control, and then microarray results verified that WGA from 10(6) cells or ∼1 ng of genomic DNA yielded high-quality coverage detecting nearly all genes across the genome. As expected, some inter- and intrasample variability in WGA was observed, but these biases were minimized by performing multiple parallel amplifications. Subsequent application of the FACS and WGA protocols to two enrichment cultures containing ∼10% and ∼1% D. mccartyi cells successfully enabled genomic analysis. As proof of concept, this study demonstrates that coupling FACS with WGA and microarrays is a promising tool to expedite genomic characterization of target strains in environmental communities where the relative concentrations are low.
Zarins-Tutt, Joseph Scott; Barberi, Tania Triscari; Gao, Hong; Mearns-Spragg, Andrew; Zhang, Lixin; Newman, David J; Goss, Rebecca Jane Miriam
2016-01-01
Covering: up to 2015. Over the centuries, microbial secondary metabolites have played a central role in the treatment of human diseases and have revolutionised the pharmaceutical industry. With the increasing number of sequenced microbial genomes revealing a plethora of novel biosynthetic genes, natural product drug discovery is entering an exciting second golden age. Here, we provide a concise overview as an introductory guide to the main methods employed to unlock or up-regulate these so called 'cryptic', 'silent' and 'orphan' gene clusters, and increase the production of the encoded natural product. With a predominant focus on bacterial natural products we will discuss the importance of the bioinformatics approach for genome mining, the use of first different and simple culturing techniques and then the application of genetic engineering to unlock the microbial treasure trove.
CLAST: CUDA implemented large-scale alignment search tool.
Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken
2014-12-11
Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.
Hiraoka, Satoshi; Machiyama, Asako; Ijichi, Minoru; Inoue, Kentaro; Oshima, Kenshiro; Hattori, Masahira; Yoshizawa, Susumu; Kogure, Kazuhiro; Iwasaki, Wataru
2016-01-14
The Great East Japan Earthquake of 2011 triggered large tsunami waves, which flooded broad areas of land along the Pacific coast of eastern Japan and changed the soil environment drastically. However, the microbial characteristics of tsunami-affected soil at the genomic level remain largely unknown. In this study, we isolated microbes from a soil sample using general low-nutrient and seawater-based media to investigate microbial characteristics in tsunami-affected soil. As expected, a greater proportion of strains isolated from the tsunami-affected soil than the unaffected soil grew in the seawater-based medium. Cultivable strains in both the general low-nutrient and seawater-based media were distributed in the genus Arthrobacter. Most importantly, whole-genome sequencing of four of the isolated Arthrobacter strains revealed independent losses of siderophore-synthesis genes from their genomes. Siderophores are low-molecular-weight, iron-chelating compounds that are secreted for iron uptake; thus, the loss of siderophore-synthesis genes indicates that these strains have adapted to environments with high-iron concentrations. Indeed, chemical analysis confirmed the investigated soil samples to be rich in iron, and culture experiments confirmed weak cultivability of some of these strains in iron-limited media. Furthermore, metagenomic analyses demonstrated over-representation of denitrification-related genes in the tsunami-affected soil sample, as well as the presence of pathogenic and marine-living genera and genes related to salt-tolerance. Collectively, the present results would provide an example of microbial characteristics of soil disturbed by the tsunami, which may give an insight into microbial adaptation to drastic environmental changes. Further analyses on microbial ecology after a tsunami are envisioned to develop a deeper understanding of the recovery processes of terrestrial microbial ecosystems.
Xie, Wei; Wang, Fengping; Guo, Lei; Chen, Zeling; Sievert, Stefan M; Meng, Jun; Huang, Guangrui; Li, Yuxin; Yan, Qingyu; Wu, Shan; Wang, Xin; Chen, Shangwu; He, Guangyuan; Xiao, Xiang; Xu, Anlong
2011-01-01
Deep-sea hydrothermal vent chimneys harbor a high diversity of largely unknown microorganisms. Although the phylogenetic diversity of these microorganisms has been described previously, the adaptation and metabolic potential of the microbial communities is only beginning to be revealed. A pyrosequencing approach was used to directly obtain sequences from a fosmid library constructed from a black smoker chimney 4143-1 in the Mothra hydrothermal vent field at the Juan de Fuca Ridge. A total of 308 034 reads with an average sequence length of 227 bp were generated. Comparative genomic analyses of metagenomes from a variety of environments by two-way clustering of samples and functional gene categories demonstrated that the 4143-1 metagenome clustered most closely with that from a carbonate chimney from Lost City. Both are highly enriched in genes for mismatch repair and homologous recombination, suggesting that the microbial communities have evolved extensive DNA repair systems to cope with the extreme conditions that have potential deleterious effects on the genomes. As previously reported for the Lost City microbiome, the metagenome of chimney 4143-1 exhibited a high proportion of transposases, implying that horizontal gene transfer may be a common occurrence in the deep-sea vent chimney biosphere. In addition, genes for chemotaxis and flagellar assembly were highly enriched in the chimney metagenomes, reflecting the adaptation of the organisms to the highly dynamic conditions present within the chimney walls. Reconstruction of the metabolic pathways revealed that the microbial community in the wall of chimney 4143-1 was mainly fueled by sulfur oxidation, putatively coupled to nitrate reduction to perform inorganic carbon fixation through the Calvin–Benson–Bassham cycle. On the basis of the genomic organization of the key genes of the carbon fixation and sulfur oxidation pathways contained in the large genomic fragments, both obligate and facultative autotrophs appear to be present and contribute to biomass production. PMID:20927138
A world of opportunities with nanopore sequencing.
Leggett, Richard M; Clark, Matthew D
2017-11-28
Oxford Nanopore Technologies' MinION sequencer was launched in pre-release form in 2014 and represents an exciting new sequencing paradigm. The device offers multi-kilobase reads and a streamed mode of operation that allows processing of reads as they are generated. Crucially, it is an extremely compact device that is powered from the USB port of a laptop computer, enabling it to be taken out of the lab and facilitating previously impossible in-field sequencing experiments to be undertaken. Many of the initial publications concerning the platform focused on provision of tools to access and analyse the new sequence formats and then demonstrating the assembly of microbial genomes. More recently, as throughput and accuracy have increased, it has been possible to begin work involving more complex genomes and metagenomes. With the release of the high-throughput GridION X5 and PromethION platforms, the sequencing of large genomes will become more cost efficient, and enable the leveraging of extremely long (>100 kb) reads for resolution of complex genomic structures. This review provides a brief overview of nanopore sequencing technology, describes the growing range of nanopore bioinformatics tools, and highlights some of the most influential publications that have emerged over the last 2 years. Finally, we look to the future and the potential the platform has to disrupt work in human, microbiome, and plant genomics. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Potential Mechanisms for Microbial Energy Acquisition in Oxic Deep-Sea Sediments
Heidelberg, John F.
2016-01-01
ABSTRACT The South Pacific Gyre (SPG) possesses the lowest rates of sedimentation, surface chlorophyll concentration, and primary productivity in the global oceans. As a direct result, deep-sea sediments are thin and contain small amounts of labile organic carbon. It was recently shown that the entire SPG sediment column is oxygenated and may be representative of up to a third of the global marine environment. To understand the microbial processes that contribute to the removal of the labile organic matter at the water-sediment interface, a sediment sample was collected and subjected to metagenomic sequencing and analyses. Analysis of nine partially reconstructed environmental genomes, which represent approximately one-third of the microbial community, revealed that the members of the SPG surface sediment microbial community are phylogenetically distinct from surface/upper-ocean organisms. These genomes represent a wide distribution of novel organisms, including deep-branching Alphaproteobacteria, two novel organisms within the Proteobacteria, and new members of the Nitrospirae, Nitrospinae, and candidate phylum NC10. These genomes contain evidence for microbially mediated metal (iron/manganese) oxidation and carbon fixation linked to nitrification. Additionally, despite hypothesized energy limitation, members of the SPG microbial community had motility and chemotaxis genes and possessed mechanisms for the degradation of high-molecular-weight organic matter. This study contributes to our understanding of the metabolic potential of microorganisms in deep-sea oligotrophic sediments and their impact on local carbon geochemistry. IMPORTANCE This research provides insight into the microbial metabolic potential of organisms inhabiting oxygenated deep-sea marine sediments. Current estimates suggest that these environments account for up to a third of the global marine sediment habitat. Nine novel deep-sea microbial genomes were reconstructed from a metagenomic data set and expand the limited number of environmental genomes from deep-sea sediment environments. This research provides phylogeny-linked insight into critical metabolisms, including carbon fixation associated with nitrification, which is assignable to members of the marine group 1 Thaumarchaeota, Nitrospinae, and Nitrospirae and neutrophilic metal (iron/manganese) oxidation assignable to a novel proteobacterium. PMID:27208118
Potential Mechanisms for Microbial Energy Acquisition in Oxic Deep-Sea Sediments.
Tully, Benjamin J; Heidelberg, John F
2016-07-15
The South Pacific Gyre (SPG) possesses the lowest rates of sedimentation, surface chlorophyll concentration, and primary productivity in the global oceans. As a direct result, deep-sea sediments are thin and contain small amounts of labile organic carbon. It was recently shown that the entire SPG sediment column is oxygenated and may be representative of up to a third of the global marine environment. To understand the microbial processes that contribute to the removal of the labile organic matter at the water-sediment interface, a sediment sample was collected and subjected to metagenomic sequencing and analyses. Analysis of nine partially reconstructed environmental genomes, which represent approximately one-third of the microbial community, revealed that the members of the SPG surface sediment microbial community are phylogenetically distinct from surface/upper-ocean organisms. These genomes represent a wide distribution of novel organisms, including deep-branching Alphaproteobacteria, two novel organisms within the Proteobacteria, and new members of the Nitrospirae, Nitrospinae, and candidate phylum NC10. These genomes contain evidence for microbially mediated metal (iron/manganese) oxidation and carbon fixation linked to nitrification. Additionally, despite hypothesized energy limitation, members of the SPG microbial community had motility and chemotaxis genes and possessed mechanisms for the degradation of high-molecular-weight organic matter. This study contributes to our understanding of the metabolic potential of microorganisms in deep-sea oligotrophic sediments and their impact on local carbon geochemistry. This research provides insight into the microbial metabolic potential of organisms inhabiting oxygenated deep-sea marine sediments. Current estimates suggest that these environments account for up to a third of the global marine sediment habitat. Nine novel deep-sea microbial genomes were reconstructed from a metagenomic data set and expand the limited number of environmental genomes from deep-sea sediment environments. This research provides phylogeny-linked insight into critical metabolisms, including carbon fixation associated with nitrification, which is assignable to members of the marine group 1 Thaumarchaeota, Nitrospinae, and Nitrospirae and neutrophilic metal (iron/manganese) oxidation assignable to a novel proteobacterium. Copyright © 2016 Tully and Heidelberg.
Genomic Encyclopedia of Bacteria and Archaea: Sequencing a Myriad of Type Strains
Kyrpides, Nikos C.; Hugenholtz, Philip; Eisen, Jonathan A.; ...
2014-08-05
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overallmore » known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.« less
Extremophiles in Household Water Heaters
NASA Astrophysics Data System (ADS)
Wilpiszeski, R.; House, C. H.
2016-12-01
A significant fraction of Earth's microbial diversity comes from species living in extreme environments, but natural extreme environments can be difficult to access. Manmade systems like household water heaters serve as an effective proxy for thermophilic environments that are otherwise difficult to sample directly. As such, we are investigating the biogeography, taxonomic distribution, and evolution of thermophiles growing in domestic water heaters. Citizen scientists collected hot tap water culture- and filter- samples from 101 homes across the United States. We recovered a single species of thermophilic heterotroph from culture samples inoculated from water heaters across the United States, Thermus scotoductus. Whole-genome sequencing was conducted to better understand the distribution and evolution of this single species. We have also sequenced hyper-variable regions of the 16S rRNA gene from whole-community filter samples to identify the broad diversity and distribution of microbial cells captured from each water heater. These results shed light on the processes that shape thermophilic populations and genomes at a spatial resolution that is difficult to access in naturally occurring extreme ecosystems.
Sequence Data for Clostridium autoethanogenum using Three Generations of Sequencing Technologies
Utturkar, Sagar M.; Klingeman, Dawn Marie; Bruno-Barcena, José M.; ...
2015-04-14
During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms which are characterized by low cost, high throughput and shorter read lengths for example, Illumina. The emergence and development of so called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequencemore » datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.« less
Murat, Claude; Zampieri, Elisa; Vallino, Marta; Daghino, Stefania; Perotto, Silvia; Bonfante, Paola
2011-05-01
Characterization of genomic variation among different microbial species, or different strains of the same species, is a field of significant interest with a wide range of potential applications. We have investigated the genomic variation in mycorrhizal fungal genomes through genomic suppressive subtractive hybridization. The comparison was between phylogenetically distant and close truffle species (Tuber spp.), and between isolates of the ericoid mycorrhizal fungus Oidiodendron maius featuring different degrees of metal tolerance. In the interspecies experiment, almost all the sequences that were identified in the Tuber melanosporum genome and absent in Tuber borchii and Tuber indicum corresponded to transposable elements. In the intraspecies comparison, some specific sequences corresponded to regions coding for enzymes, among them a glutathione synthetase known to be involved in metal tolerance. This approach is a quick and rather inexpensive tool to develop molecular markers for mycorrhizal fungi tracking and barcoding, to identify functional genes and to investigate the genome plasticity, adaptation and evolution. © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd. All rights reserved.
Proteomics in medical microbiology.
Cash, P
2000-04-01
The techniques of proteomics (high resolution two-dimensional electrophoresis and protein characterisation) are widely used for microbiological research to analyse global protein synthesis as an indicator of gene expression. The rapid progress in microbial proteomics has been achieved through the wide availability of whole genome sequences for a number of bacterial groups. Beyond providing a basic understanding of microbial gene expression, proteomics has also played a role in medical areas of microbiology. Progress has been made in the use of the techniques for investigating the epidemiology and taxonomy of human microbial pathogens, the identification of novel pathogenic mechanisms and the analysis of drug resistance. In each of these areas, proteomics has provided new insights that complement genomic-based investigations. This review describes the current progress in these research fields and highlights some of the technical challenges existing for the application of proteomics in medical microbiology. The latter concern the analysis of genetically heterogeneous bacterial populations and the integration of the proteomic and genomic data for these bacteria. The characterisation of the proteomes of bacterial pathogens growing in their natural hosts remains a future challenge.
Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing
Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas
2016-01-01
ABSTRACT Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018
Christen, Matthias; Del Medico, Luca; Christen, Heinz; Christen, Beat
2017-01-01
Recent advances in lower-cost DNA synthesis techniques have enabled new innovations in the field of synthetic biology. Still, efficient design and higher-order assembly of genome-scale DNA constructs remains a labor-intensive process. Given the complexity, computer assisted design tools that fragment large DNA sequences into fabricable DNA blocks are needed to pave the way towards streamlined assembly of biological systems. Here, we present the Genome Partitioner software implemented as a web-based interface that permits multi-level partitioning of genome-scale DNA designs. Without the need for specialized computing skills, biologists can submit their DNA designs to a fully automated pipeline that generates the optimal retrosynthetic route for higher-order DNA assembly. To test the algorithm, we partitioned a 783 kb Caulobacter crescentus genome design. We validated the partitioning strategy by assembling a 20 kb test segment encompassing a difficult to synthesize DNA sequence. Successful assembly from 1 kb subblocks into the 20 kb segment highlights the effectiveness of the Genome Partitioner for reducing synthesis costs and timelines for higher-order DNA assembly. The Genome Partitioner is broadly applicable to translate DNA designs into ready to order sequences that can be assembled with standardized protocols, thus offering new opportunities to harness the diversity of microbial genomes for synthetic biology applications. The Genome Partitioner web tool can be accessed at https://christenlab.ethz.ch/GenomePartitioner.
Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei
2013-01-01
No attention has been paid on comparing a set of genome sequences crossing genetic components and biological categories with far divergence over large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.
Metagenomic Analysis of Ammonia-Oxidizing Archaea Affiliated with the Soil Group
Bartossek, Rita; Spang, Anja; Weidler, Gerhard; Lanzen, Anders; Schleper, Christa
2012-01-01
Ammonia-oxidizing archaea (AOA) have recently been recognized as a significant component of many microbial communities and represent one of the most abundant prokaryotic groups in the biosphere. However, only few AOA have been successfully cultivated so far and information on the physiology and genomic content remains scarce. We have performed a metagenomic analysis to extend the knowledge of the AOA affiliated with group I.1b that is widespread in terrestrial habitats and of which no genome sequences has been described yet. A fosmid library was generated from samples of a radioactive thermal cave (46°C) in the Austrian Central Alps in which AOA had been found as a major part of the microbial community. Out of 16 fosmids that possessed either an amoA or 16S rRNA gene affiliating with AOA, 5 were fully sequenced, 4 of which grouped with the soil/I.1b (Nitrososphaera-) lineage, and 1 with marine/I.1a (Nitrosopumilus-) lineage. Phylogenetic analyses of amoBC and an associated conserved gene were congruent with earlier analyses based on amoA and 16S rRNA genes and supported the separation of the soil and marine group. Several putative genes that did not have homologs in currently available marine Thaumarchaeota genomes indicated that AOA of the soil group contain specific genes that are distinct from their marine relatives. Potential cis-regulatory elements around conserved promoter motifs found upstream of the amo genes in sequenced (meta-) genomes differed in marine and soil group AOA. On one fosmid, a group of genes including amoA and amoB were flanked by identical transposable insertion sequences, indicating that amoAB could potentially be co-mobilized in the form of a composite transposon. This might be one of the mechanisms that caused the greater variation in gene order compared to genomes in the marine counterparts. Our findings highlight the genetic diversity within the two major and widespread lineages of Thaumarchaeota. PMID:22723795
Mason, Olivia U; Hazen, Terry C; Borglin, Sharon; Chain, Patrick S G; Dubinsky, Eric A; Fortney, Julian L; Han, James; Holman, Hoi-Ying N; Hultman, Jenni; Lamendella, Regina; Mackelprang, Rachel; Malfatti, Stephanie; Tom, Lauren M; Tringe, Susannah G; Woyke, Tanja; Zhou, Jizhong; Rubin, Edward M; Jansson, Janet K
2012-09-01
The Deepwater Horizon oil spill in the Gulf of Mexico resulted in a deep-sea hydrocarbon plume that caused a shift in the indigenous microbial community composition with unknown ecological consequences. Early in the spill history, a bloom of uncultured, thus uncharacterized, members of the Oceanospirillales was previously detected, but their role in oil disposition was unknown. Here our aim was to determine the functional role of the Oceanospirillales and other active members of the indigenous microbial community using deep sequencing of community DNA and RNA, as well as single-cell genomics. Shotgun metagenomic and metatranscriptomic sequencing revealed that genes for motility, chemotaxis and aliphatic hydrocarbon degradation were significantly enriched and expressed in the hydrocarbon plume samples compared with uncontaminated seawater collected from plume depth. In contrast, although genes coding for degradation of more recalcitrant compounds, such as benzene, toluene, ethylbenzene, total xylenes and polycyclic aromatic hydrocarbons, were identified in the metagenomes, they were expressed at low levels, or not at all based on analysis of the metatranscriptomes. Isolation and sequencing of two Oceanospirillales single cells revealed that both cells possessed genes coding for n-alkane and cycloalkane degradation. Specifically, the near-complete pathway for cyclohexane oxidation in the Oceanospirillales single cells was elucidated and supported by both metagenome and metatranscriptome data. The draft genome also included genes for chemotaxis, motility and nutrient acquisition strategies that were also identified in the metagenomes and metatranscriptomes. These data point towards a rapid response of members of the Oceanospirillales to aliphatic hydrocarbons in the deep sea.
Genomic analysis of three Bifidobacterium species isolated from the calf gastrointestinal tract
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kelly, William J.; Cookson, Adrian L.; Altermann, Eric
Ruminant animals contribute significantly to the global value of agriculture and rely on a complex microbial community for efficient digestion. However, little is known of how this microbial-host relationship develops and is maintained. To begin to address this, we have determined the ability of three Bifidobacterium species isolated from the faeces of newborn calves to grow on carbohydrates typical of a newborn ruminant diet. Genome sequences have been determined for these bacteria with analysis of the genomes providing insights into the host association and identification of several genes that may mediate interactions with the ruminant gastrointestinal tract. The present studymore » provides a starting point from which we can define the role of potential beneficial microbes in the nutrition of young ruminants and begin to influence the interactions between the microbiota and the host. The differences observed in genomic content hint at niche partitioning among the bifidobacterial species analysed and the different strategies they employ to successfully adapt to this habitat.« less
Genomic analysis of three Bifidobacterium species isolated from the calf gastrointestinal tract
Kelly, William J.; Cookson, Adrian L.; Altermann, Eric; ...
2016-07-29
Ruminant animals contribute significantly to the global value of agriculture and rely on a complex microbial community for efficient digestion. However, little is known of how this microbial-host relationship develops and is maintained. To begin to address this, we have determined the ability of three Bifidobacterium species isolated from the faeces of newborn calves to grow on carbohydrates typical of a newborn ruminant diet. Genome sequences have been determined for these bacteria with analysis of the genomes providing insights into the host association and identification of several genes that may mediate interactions with the ruminant gastrointestinal tract. The present studymore » provides a starting point from which we can define the role of potential beneficial microbes in the nutrition of young ruminants and begin to influence the interactions between the microbiota and the host. The differences observed in genomic content hint at niche partitioning among the bifidobacterial species analysed and the different strategies they employ to successfully adapt to this habitat.« less
NASA Astrophysics Data System (ADS)
Bomberg, Malin; Lamminmäki, Tiina; Itävaara, Merja
2016-11-01
The microbial diversity in oligotrophic isolated crystalline Fennoscandian Shield bedrock fracture groundwaters is high, but the core community has not been identified. Here we characterized the bacterial and archaeal communities in 12 water conductive fractures situated at depths between 296 and 798 m by high throughput amplicon sequencing using the Illumina HiSeq platform. Between 1.7 × 104 and 1.2 × 106 bacterial or archaeal sequence reads per sample were obtained. These sequences revealed that up to 95 and 99 % of the bacterial and archaeal sequences obtained from the 12 samples, respectively, belonged to only a few common species, i.e. the core microbiome. However, the remaining rare microbiome contained over 3- and 6-fold more bacterial and archaeal taxa. The metabolic properties of the microbial communities were predicted using PICRUSt. The approximate estimation showed that the metabolic pathways commonly included fermentation, fatty acid oxidation, glycolysis/gluconeogenesis, oxidative phosphorylation, and methanogenesis/anaerobic methane oxidation, but carbon fixation through the Calvin cycle, reductive TCA cycle, and the Wood-Ljungdahl pathway was also predicted. The rare microbiome is an unlimited source of genomic functionality in all ecosystems. It may consist of remnants of microbial communities prevailing in earlier environmental conditions, but could also be induced again if changes in their living conditions occur.
Prey Range and Genome Evolution of Halobacteriovorax marinus Predatory Bacteria from an Estuary
Enos, Brett G.; Anthony, Molly K.; DeGiorgis, Joseph A.
2018-01-01
ABSTRACT Halobacteriovorax strains are saltwater-adapted predatory bacteria that attack Gram-negative bacteria and may play an important role in shaping microbial communities. To understand how Halobacteriovorax strains impact ecosystems and develop them as biocontrol agents, it is important to characterize variation in predation phenotypes and investigate Halobacteriovorax genome evolution. We isolated Halobacteriovorax marinus BE01 from an estuary in Rhode Island using Vibrio from the same site as prey. Small, fast-moving, attack-phase BE01 cells attach to and invade prey cells, consistent with the intraperiplasmic predation strategy of the H. marinus type strain, SJ. BE01 is a prey generalist, forming plaques on Vibrio strains from the estuary, Pseudomonas from soil, and Escherichia coli. Genome analysis revealed extremely high conservation of gene order and amino acid sequences between BE01 and SJ, suggesting strong selective pressure to maintain the genome in this H. marinus lineage. Despite this, we identified two regions of gene content difference that likely resulted from horizontal gene transfer. Analysis of modal codon usage frequencies supports the hypothesis that these regions were acquired from bacteria with different codon usage biases than H. marinus. In one of these regions, BE01 and SJ carry different genes associated with mobile genetic elements. Acquired functions in BE01 include the dnd operon, which encodes a pathway for DNA modification, and a suite of genes involved in membrane synthesis and regulation of gene expression that was likely acquired from another Halobacteriovorax lineage. This analysis provides further evidence that horizontal gene transfer plays an important role in genome evolution in predatory bacteria. IMPORTANCE Predatory bacteria attack and digest other bacteria and therefore may play a role in shaping microbial communities. To investigate phenotypic and genotypic variation in saltwater-adapted predatory bacteria, we isolated Halobacteriovorax marinus BE01 from an estuary in Rhode Island, assayed whether it could attack different prey bacteria, and sequenced and analyzed its genome. We found that BE01 is a prey generalist, attacking bacteria from different phylogenetic groups and environments. Gene order and amino acid sequences are highly conserved between BE01 and the H. marinus type strain, SJ. By comparative genomics, we detected two regions of gene content difference that likely occurred via horizontal gene transfer events. Acquired genes encode functions such as modification of DNA, membrane synthesis and regulation of gene expression. Understanding genome evolution and variation in predation phenotypes among predatory bacteria will inform their development as biocontrol agents and clarify how they impact microbial communities. PMID:29359184
Nanthini, Jayaram; Ong, Su Yean; Sudesh, Kumar
2017-09-10
Rubber materials have greatly contributed to human civilization. However, being a polymeric material does not decompose easily, it has caused huge environmental problems. On the other hand, only few bacteria are known to degrade rubber, with studies pertaining them being intensively focusing on the mechanism involved in microbial rubber degradation. The Streptomyces sp. strain CFMR 7, which was previously confirmed to possess rubber-degrading ability, was subjected to whole genome sequencing using the single molecule sequencing technology of the PacBio® RS II system. The genome was further analyzed and compared with previously reported rubber-degrading bacteria in order to identify the potential genes involved in rubber degradation. This led to the interesting discovery of three homologues of latex-clearing protein (Lcp) on the chromosome of this strain, which are probably responsible for rubber degrading activities. Genes encoding oxidoreductase α-subunit (oxiA) and oxidoreductase β-subunit (oxiB) were also found downstream of two lcp genes which are located adjacent to each other. In silico analysis reveals genes that have been identified to be involved in the microbial degradation of rubber in the Streptomyces sp. strain CFMR 7. This is the first whole genome sequence of a clear-zone-forming natural rubber- degrading Streptomyces sp., which harbours three Lcp homologous genes with the presence of oxiA and oxiB genes compared to the previously reported Gordonia polyisoprenivorans strain VH2 (with two Lcp homologous genes) and Nocardia nova SH22a (with only one Lcp gene). Copyright © 2017 Elsevier B.V. All rights reserved.
Sela, D. A.; Chapman, J.; Adeuya, A.; Kim, J. H.; Chen, F.; Whitehead, T. R.; Lapidus, A.; Rokhsar, D. S.; Lebrilla, C. B.; German, J. B.; Price, N. P.; Richardson, P. M.; Mills, D. A.
2008-01-01
Following birth, the breast-fed infant gastrointestinal tract is rapidly colonized by a microbial consortium often dominated by bifidobacteria. Accordingly, the complete genome sequence of Bifidobacterium longum subsp. infantis ATCC15697 reflects a competitive nutrient-utilization strategy targeting milk-borne molecules which lack a nutritive value to the neonate. Several chromosomal loci reflect potential adaptation to the infant host including a 43 kbp cluster encoding catabolic genes, extracellular solute binding proteins and permeases predicted to be active on milk oligosaccharides. An examination of in vivo metabolism has detected the hallmarks of milk oligosaccharide utilization via the central fermentative pathway using metabolomic and proteomic approaches. Finally, conservation of gene clusters in multiple isolates corroborates the genomic mechanism underlying milk utilization for this infant-associated phylotype. PMID:19033196
Phylogenetically conserved resource partitioning in the coastal microbial loop
Bryson, Samuel; Li, Zhou; Chavez, Francisco; ...
2017-08-11
Resource availability influences marine microbial community structure, suggesting that population-specific resource partitioning defines discrete niches. Identifying how resources are partitioned among populations, thereby characterizing functional guilds within the communities, remains a challenge for microbial ecologists. We used proteomic stable isotope probing (SIP) and NanoSIMS analysis of phylogenetic microarrays (Chip-SIP) along with 16S rRNA gene amplicon and metagenomic sequencing to characterize the assimilation of six 13C-labeled common metabolic substrates and changes in the microbial community structure within surface water collected from Monterey Bay, CA. Both sequencing approaches indicated distinct substrate-specific community shifts. However, observed changes in relative abundance for individual populationsmore » did not correlate well with directly measured substrate assimilation. The complementary SIP techniques identified assimilation of all six substrates by diverse taxa, but also revealed differential assimilation of substrates into protein and ribonucleotide biomass between taxa. Substrate assimilation trends indicated significantly conserved resource partitioning among populations within the Flavobacteriia, Alphaproteobacteria and Gammaproteobacteria classes, suggesting that functional guilds within marine microbial communities are phylogenetically cohesive. However, populations within these classes exhibited heterogeneity in biosynthetic activity, which distinguished high-activity copiotrophs from low-activity oligotrophs. These results indicate distinct growth responses between populations that is not apparent by genome sequencing alone.« less
Phylogenetically conserved resource partitioning in the coastal microbial loop
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bryson, Samuel; Li, Zhou; Chavez, Francisco
Resource availability influences marine microbial community structure, suggesting that population-specific resource partitioning defines discrete niches. Identifying how resources are partitioned among populations, thereby characterizing functional guilds within the communities, remains a challenge for microbial ecologists. We used proteomic stable isotope probing (SIP) and NanoSIMS analysis of phylogenetic microarrays (Chip-SIP) along with 16S rRNA gene amplicon and metagenomic sequencing to characterize the assimilation of six 13C-labeled common metabolic substrates and changes in the microbial community structure within surface water collected from Monterey Bay, CA. Both sequencing approaches indicated distinct substrate-specific community shifts. However, observed changes in relative abundance for individual populationsmore » did not correlate well with directly measured substrate assimilation. The complementary SIP techniques identified assimilation of all six substrates by diverse taxa, but also revealed differential assimilation of substrates into protein and ribonucleotide biomass between taxa. Substrate assimilation trends indicated significantly conserved resource partitioning among populations within the Flavobacteriia, Alphaproteobacteria and Gammaproteobacteria classes, suggesting that functional guilds within marine microbial communities are phylogenetically cohesive. However, populations within these classes exhibited heterogeneity in biosynthetic activity, which distinguished high-activity copiotrophs from low-activity oligotrophs. These results indicate distinct growth responses between populations that is not apparent by genome sequencing alone.« less
Phylogenetically conserved resource partitioning in the coastal microbial loop
Bryson, Samuel; Li, Zhou; Chavez, Francisco; Weber, Peter K; Pett-Ridge, Jennifer; Hettich, Robert L; Pan, Chongle; Mayali, Xavier; Mueller, Ryan S
2017-01-01
Resource availability influences marine microbial community structure, suggesting that population-specific resource partitioning defines discrete niches. Identifying how resources are partitioned among populations, thereby characterizing functional guilds within the communities, remains a challenge for microbial ecologists. We used proteomic stable isotope probing (SIP) and NanoSIMS analysis of phylogenetic microarrays (Chip-SIP) along with 16S rRNA gene amplicon and metagenomic sequencing to characterize the assimilation of six 13C-labeled common metabolic substrates and changes in the microbial community structure within surface water collected from Monterey Bay, CA. Both sequencing approaches indicated distinct substrate-specific community shifts. However, observed changes in relative abundance for individual populations did not correlate well with directly measured substrate assimilation. The complementary SIP techniques identified assimilation of all six substrates by diverse taxa, but also revealed differential assimilation of substrates into protein and ribonucleotide biomass between taxa. Substrate assimilation trends indicated significantly conserved resource partitioning among populations within the Flavobacteriia, Alphaproteobacteria and Gammaproteobacteria classes, suggesting that functional guilds within marine microbial communities are phylogenetically cohesive. However, populations within these classes exhibited heterogeneity in biosynthetic activity, which distinguished high-activity copiotrophs from low-activity oligotrophs. These results indicate distinct growth responses between populations that is not apparent by genome sequencing alone. PMID:28800138
Proteins of Unknown Biochemical Function: A Persistent Problem and a Roadmap to Help Overcome It.
Niehaus, Thomas D; Thamm, Antje M K; de Crécy-Lagard, Valérie; Hanson, Andrew D
2015-11-01
The number of sequenced genomes is rapidly increasing, but functional annotation of the genes in these genomes lags far behind. Even in Arabidopsis (Arabidopsis thaliana), only approximately 40% of enzyme- and transporter-encoding genes have credible functional annotations, and this number is even lower in nonmodel plants. Functional characterization of unknown genes is a challenge, but various databases (e.g. for protein localization and coexpression) can be mined to provide clues. If homologous microbial genes exist-and about one-half the genes encoding unknown enzymes and transporters in Arabidopsis have microbial homologs-cross-kingdom comparative genomics can powerfully complement plant-based data. Multiple lines of evidence can strengthen predictions and warrant experimental characterization. In some cases, relatively quick tests in genetically tractable microbes can determine whether a prediction merits biochemical validation, which is costly and demands specialized skills. © 2015 American Society of Plant Biologists. All Rights Reserved.
Evaluation of nine popular de novo assemblers in microbial genome assembly.
Forouzan, Esmaeil; Maleki, Masoumeh Sadat Mousavi; Karkhane, Ali Asghar; Yakhchali, Bagher
2017-12-01
Next generation sequencing (NGS) technologies are revolutionizing biology, with Illumina being the most popular NGS platform. Short read assembly is a critical part of most genome studies using NGS. Hence, in this study, the performance of nine well-known assemblers was evaluated in the assembly of seven different microbial genomes. Effect of different read coverage and k-mer parameters on the quality of the assembly were also evaluated on both simulated and actual read datasets. Our results show that the performance of assemblers on real and simulated datasets could be significantly different, mainly because of coverage bias. According to outputs on actual read datasets, for all studied read coverages (of 7×, 25× and 100×), SPAdes and IDBA-UD clearly outperformed other assemblers based on NGA50 and accuracy metrics. Velvet is the most conservative assembler with the lowest NGA50 and error rate. Copyright © 2017. Published by Elsevier B.V.
Detection of Bacillus anthracis DNA in Complex Soil and Air Samples Using Next-Generation Sequencing
Be, Nicholas A.; Thissen, James B.; Gardner, Shea N.; McLoughlin, Kevin S.; Fofanov, Viacheslav Y.; Koshinsky, Heather; Ellingson, Sally R.; Brettin, Thomas S.; Jackson, Paul J.; Jaing, Crystal J.
2013-01-01
Bacillus anthracis is the potentially lethal etiologic agent of anthrax disease, and is a significant concern in the realm of biodefense. One of the cornerstones of an effective biodefense strategy is the ability to detect infectious agents with a high degree of sensitivity and specificity in the context of a complex sample background. The nature of the B. anthracis genome, however, renders specific detection difficult, due to close homology with B. cereus and B. thuringiensis. We therefore elected to determine the efficacy of next-generation sequencing analysis and microarrays for detection of B. anthracis in an environmental background. We applied next-generation sequencing to titrated genome copy numbers of B. anthracis in the presence of background nucleic acid extracted from aerosol and soil samples. We found next-generation sequencing to be capable of detecting as few as 10 genomic equivalents of B. anthracis DNA per nanogram of background nucleic acid. Detection was accomplished by mapping reads to either a defined subset of reference genomes or to the full GenBank database. Moreover, sequence data obtained from B. anthracis could be reliably distinguished from sequence data mapping to either B. cereus or B. thuringiensis. We also demonstrated the efficacy of a microbial census microarray in detecting B. anthracis in the same samples, representing a cost-effective and high-throughput approach, complementary to next-generation sequencing. Our results, in combination with the capacity of sequencing for providing insights into the genomic characteristics of complex and novel organisms, suggest that these platforms should be considered important components of a biosurveillance strategy. PMID:24039948
IMG/M: integrated genome and metagenome comparative data analysis system
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; ...
2016-10-13
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support formore » examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review(ER) companion system (IMG/M ER: https://img.jgi.doe.gov/ mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.« less
IMG/M: integrated genome and metagenome comparative data analysis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support formore » examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review(ER) companion system (IMG/M ER: https://img.jgi.doe.gov/ mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system.« less
IMG/M: integrated genome and metagenome comparative data analysis system
Chen, I-Min A.; Markowitz, Victor M.; Chu, Ken; Palaniappan, Krishna; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Andersen, Evan; Huntemann, Marcel; Varghese, Neha; Hadjithomas, Michalis; Tennessen, Kristin; Nielsen, Torben; Ivanova, Natalia N.; Kyrpides, Nikos C.
2017-01-01
The Integrated Microbial Genomes with Microbiome Samples (IMG/M: https://img.jgi.doe.gov/m/) system contains annotated DNA and RNA sequence data of (i) archaeal, bacterial, eukaryotic and viral genomes from cultured organisms, (ii) single cell genomes (SCG) and genomes from metagenomes (GFM) from uncultured archaea, bacteria and viruses and (iii) metagenomes from environmental, host associated and engineered microbiome samples. Sequence data are generated by DOE's Joint Genome Institute (JGI), submitted by individual scientists, or collected from public sequence data archives. Structural and functional annotation is carried out by JGI's genome and metagenome annotation pipelines. A variety of analytical and visualization tools provide support for examining and comparing IMG/M's datasets. IMG/M allows open access interactive analysis of publicly available datasets, while manual curation, submission and access to private datasets and computationally intensive workspace-based analysis require login/password access to its expert review (ER) companion system (IMG/M ER: https://img.jgi.doe.gov/mer/). Since the last report published in the 2014 NAR Database Issue, IMG/M's dataset content has tripled in terms of number of datasets and overall protein coding genes, while its analysis tools have been extended to cope with the rapid growth in the number and size of datasets handled by the system. PMID:27738135
Lipus, Daniel; Vikram, Amit; Ross, Daniel; ...
2017-02-03
Here, microbial activity in the produced water from hydraulically fractured oil and gas wells may potentially interfere with hydrocarbon production and cause damage to the well and surface infrastructure via corrosion, sulfide release, and fouling. In this study, we surveyed the microbial abundance and community structure of produced water sampled from 42 Marcellus Shale wells in southwestern Pennsylvania (well age ranged from 150 to 1,846 days) to better understand the microbial diversity of produced water. We sequenced the V4 region of the 16S rRNA gene to assess taxonomy and utilized quantitative PCR (qPCR) to evaluate the microbial abundance across allmore » 42 produced water samples. Bacteria of the order Halanaerobiales were found to be the most abundant organisms in the majority of the produced water samples, emphasizing their previously suggested role in hydraulic fracturing-related microbial activity. Statistical analyses identified correlations between well age and biocide formulation and the microbial community, in particular, the relative abundance of Halanaerobiales. We further investigated the role of members of the order Halanaerobiales in produced water by reconstructing and annotating a Halanaerobium draft genome (named MDAL1), using shotgun metagenomic sequencing and metagenomic binning. The recovered draft genome was found to be closely related to the species H. congolense, an oil field isolate, and Halanaerobium sp. strain T82-1, also recovered from hydraulic fracturing produced water. Reconstruction of metabolic pathways revealed Halanaerobium sp. strain MDAL1 to have the potential for acid production, thiosulfate reduction, and biofilm formation, suggesting it to have the ability to contribute to corrosion, souring, and biofouling events in the hydraulic fracturing infrastructure.« less
NASA Astrophysics Data System (ADS)
Gong, J.; Edwardson, C.; Mackey, T. J.; Dzaugis, M.; Ibarra, Y.; Course 2012, G.; Frantz, C. M.; Osburn, M. R.; Hirst, M.; Williamson, C.; Hanselmann, K.; Caporaso, J.; Sessions, A. L.; Spear, J. R.
2012-12-01
The microbial diversity of Stinking Springs, a sulfidic, saline, warm spring northeast of the Great Salt Lake was investigated. The measured pH, temperature, salinity, and sulfide concentration along the flow path ranged from 6.64-7.77, 40-28° C, 2.9-2.2%, and 250 μM to negligible, respectively. Five sites were selected along the flow path and within each site microbial mats were dissected into depth profiles based on the color and texture of the mat layers. Genomic DNA was extracted from each layer, and the 16S rRNA gene was amplified and sequenced on the Roche 454 Titanium platform. Fatty acids were also extracted from the mat layers and analyzed by liquid chromatography and mass spectrometry. The mats at Stinking Springs were classified into roughly two morphologies with respect to their spatial distribution: loose, sometimes floating mats proximal to the spring source; and thicker, well-laminated mats distal to the spring source. Loosely-laminated mats were found in turbulent stream flow environments, whereas well-laminated mats were common in less turbulent sheet flows. Phototrophs, sulfur oxidizers, sulfate reducers, methanogens, other bacteria and archaea were identified by 16S rRNA gene sequences. Diatoms, identified by microscopy and lipid analysis were found to increase in abundance with distance from the source. Methanogens were generally more abundant in deeper mat laminae. Photoheterotrophs were found in all mat layers. Microbial diversity increased significantly with depth at most sites. In addition, two distinct microbial streamers were identified and characterized at the two fast flowing sites. These two streamer varieties were dominated by either cyanobacteria or flavobacteria. Overall, our genomic and lipid analysis suggest that the physical and chemical environment is more predictive of the community composition than mat morphology. Site Map
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lipus, Daniel; Vikram, Amit; Ross, Daniel
Here, microbial activity in the produced water from hydraulically fractured oil and gas wells may potentially interfere with hydrocarbon production and cause damage to the well and surface infrastructure via corrosion, sulfide release, and fouling. In this study, we surveyed the microbial abundance and community structure of produced water sampled from 42 Marcellus Shale wells in southwestern Pennsylvania (well age ranged from 150 to 1,846 days) to better understand the microbial diversity of produced water. We sequenced the V4 region of the 16S rRNA gene to assess taxonomy and utilized quantitative PCR (qPCR) to evaluate the microbial abundance across allmore » 42 produced water samples. Bacteria of the order Halanaerobiales were found to be the most abundant organisms in the majority of the produced water samples, emphasizing their previously suggested role in hydraulic fracturing-related microbial activity. Statistical analyses identified correlations between well age and biocide formulation and the microbial community, in particular, the relative abundance of Halanaerobiales. We further investigated the role of members of the order Halanaerobiales in produced water by reconstructing and annotating a Halanaerobium draft genome (named MDAL1), using shotgun metagenomic sequencing and metagenomic binning. The recovered draft genome was found to be closely related to the species H. congolense, an oil field isolate, and Halanaerobium sp. strain T82-1, also recovered from hydraulic fracturing produced water. Reconstruction of metabolic pathways revealed Halanaerobium sp. strain MDAL1 to have the potential for acid production, thiosulfate reduction, and biofilm formation, suggesting it to have the ability to contribute to corrosion, souring, and biofouling events in the hydraulic fracturing infrastructure.« less
NASA Astrophysics Data System (ADS)
Emerson, J. B.; Brum, J. R.; Roux, S.; Bolduc, B.; Woodcroft, B. J.; Singleton, C. M.; Boyd, J. A.; Hodgkins, S. B.; Wilson, R.; Trubl, G. G.; Jang, H. B.; Crill, P. M.; Chanton, J.; Saleska, S. R.; Rich, V. I.; Tyson, G. W.; Sullivan, M. B.
2016-12-01
Methane and carbon dioxide emissions, which are under significant microbial control, provide positive feedbacks to climate change in thawing permafrost peatlands. Although viruses in marine systems have been shown to impact microbial ecology and biogeochemical cycling through host cell lysis, horizontal gene transfer, and auxiliary metabolic gene expression, viral ecology in permafrost and other soils remains virtually unstudied due to methodological challenges. Here, we identified viral sequences in 208 assembled bulk soil metagenomes derived from a permafrost thaw gradient in Stordalen Mire, northern Sweden, from 2010-2012. 2,048 viral populations were recovered, which genome- and network-based classification revealed to be largely novel, increasing known viral genera globally by 40%. Ecologically, viral communities differed significantly across the thaw gradient and by soil depth. Co-occurring microbial community composition, soil moisture, and pH were predictors of viral community composition, indicative of biological and biogeochemical feedbacks as permafrost thaws. Host prediction—achieved through clustered regularly interspaced short palindromic repeats (CRISPRs), tetranucleotide frequency patterns, and other sequence similarities to binned microbial population genomes—was able to link 38% of the viral populations to a microbial host. 5% of the implicated hosts were archaea, predominantly methanogens and ammonia-oxidizing Nitrososphaera, 45% were Acidobacteria or Verrucomicrobia (mostly predicted heterotrophic complex carbon degraders), and 21% were Proteobacteria, including methane oxidizers. Recovered viral genome fragments also contained auxiliary metabolic genes involved in carbon and nitrogen cycling. Together, these data reveal multiple levels of previously unknown viral contributions to biogeochemical cycling, including to carbon gas emissions, in peatland soils undergoing and contributing to climate change. This work represents a significant step towards understanding viral roles in microbially-mediated biogeochemical cycling in soil.
Optofluidic Cell Selection from Complex Microbial Communities for Single-Genome Analysis
Landry, Zachary C.; Giovanonni, Stephen J.; Quake, Stephen R.; Blainey, Paul C.
2013-01-01
Genetic analysis of single cells is emerging as a powerful approach for studies of heterogeneous cell populations. Indeed, the notion of homogeneous cell populations is receding as approaches to resolve genetic and phenotypic variation between single cells are applied throughout the life sciences. A key step in single-cell genomic analysis today is the physical isolation of individual cells from heterogeneous populations, particularly microbial populations, which often exhibit high diversity. Here, we detail the construction and use of instrumentation for optical trapping inside microfluidic devices to select individual cells for analysis by methods including nucleic acid sequencing. This approach has unique advantages for analyses of rare community members, cells with irregular morphologies, small quantity samples, and studies that employ advanced optical microscopy. PMID:24060116
McCann, Joshua C.; Wickersham, Tryon A.; Loor, Juan J.
2014-01-01
Diversity in the forestomach microbiome is one of the key features of ruminant animals. The diverse microbial community adapts to a wide array of dietary feedstuffs and management strategies. Understanding rumen microbiome composition, adaptation, and function has global implications ranging from climatology to applied animal production. Classical knowledge of rumen microbiology was based on anaerobic, culture-dependent methods. Next-generation sequencing and other molecular techniques have uncovered novel features of the rumen microbiome. For instance, pyrosequencing of the 16S ribosomal RNA gene has revealed the taxonomic identity of bacteria and archaea to the genus level, and when complemented with barcoding adds multiple samples to a single run. Whole genome shotgun sequencing generates true metagenomic sequences to predict the functional capability of a microbiome, and can also be used to construct genomes of isolated organisms. Integration of high-throughput data describing the rumen microbiome with classic fermentation and animal performance parameters has produced meaningful advances and opened additional areas for study. In this review, we highlight recent studies of the rumen microbiome in the context of cattle production focusing on nutrition, rumen development, animal efficiency, and microbial function. PMID:24940050
Bragalini, Claudia; Ribière, Céline; Parisot, Nicolas; Vallon, Laurent; Prudent, Elsa; Peyretaillade, Eric; Girlanda, Mariangela; Peyret, Pierre; Marmeisse, Roland; Luis, Patricia
2014-01-01
Eukaryotic microbial communities play key functional roles in soil biology and potentially represent a rich source of natural products including biocatalysts. Culture-independent molecular methods are powerful tools to isolate functional genes from uncultured microorganisms. However, none of the methods used in environmental genomics allow for a rapid isolation of numerous functional genes from eukaryotic microbial communities. We developed an original adaptation of the solution hybrid selection (SHS) for an efficient recovery of functional complementary DNAs (cDNAs) synthesized from soil-extracted polyadenylated mRNAs. This protocol was tested on the Glycoside Hydrolase 11 gene family encoding endo-xylanases for which we designed 35 explorative 31-mers capture probes. SHS was implemented on four soil eukaryotic cDNA pools. After two successive rounds of capture, >90% of the resulting cDNAs were GH11 sequences, of which 70% (38 among 53 sequenced genes) were full length. Between 1.5 and 25% of the cloned captured sequences were expressed in Saccharomyces cerevisiae. Sequencing of polymerase chain reaction-amplified GH11 gene fragments from the captured sequences highlighted hundreds of phylogenetically diverse sequences that were not yet described, in public databases. This protocol offers the possibility of performing exhaustive exploration of eukaryotic gene families within microbial communities thriving in any type of environment. PMID:25281543
Protein family clustering for structural genomics.
Yan, Yongpan; Moult, John
2005-10-28
A major goal of structural genomics is the provision of a structural template for a large fraction of protein domains. The magnitude of this task depends on the number and nature of protein sequence families. With a large number of bacterial genomes now fully sequenced, it is possible to obtain improved estimates of the number and diversity of families in that kingdom. We have used an automated clustering procedure to group all sequences in a set of genomes into protein families. Bench-marking shows the clustering method is sensitive at detecting remote family members, and has a low level of false positives. This comprehensive protein family set has been used to address the following questions. (1) What is the structure coverage for currently known families? (2) How will the number of known apparent families grow as more genomes are sequenced? (3) What is a practical strategy for maximizing structure coverage in future? Our study indicates that approximately 20% of known families with three or more members currently have a representative structure. The study indicates also that the number of apparent protein families will be considerably larger than previously thought: We estimate that, by the criteria of this work, there will be about 250,000 protein families when 1000 microbial genomes have been sequenced. However, the vast majority of these families will be small, and it will be possible to obtain structural templates for 70-80% of protein domains with an achievable number of representative structures, by systematically sampling the larger families.
Characterization of Microbial Communities in Gas Industry Pipelines
Zhu, Xiang Y.; Lubeck, John; Kilbane, John J.
2003-01-01
Culture-independent techniques, denaturing gradient gel electrophoresis (DGGE) analysis, and random cloning of 16S rRNA gene sequences amplified from community DNA were used to determine the diversity of microbial communities in gas industry pipelines. Samples obtained from natural gas pipelines were used directly for DNA extraction, inoculated into sulfate-reducing bacterium medium, or used to inoculate a reactor that simulated a natural gas pipeline environment. The variable V2-V3 (average size, 384 bp) and V3-V6 (average size, 648 bp) regions of bacterial and archaeal 16S rRNA genes, respectively, were amplified from genomic DNA isolated from nine natural gas pipeline samples and analyzed. A total of 106 bacterial 16S rDNA sequences were derived from DGGE bands, and these formed three major clusters: beta and gamma subdivisions of Proteobacteria and gram-positive bacteria. The most frequently encountered bacterial species was Comamonas denitrificans, which was not previously reported to be associated with microbial communities found in gas pipelines or with microbially influenced corrosion. The 31 archaeal 16S rDNA sequences obtained in this study were all related to those of methanogens and phylogenetically fall into three clusters: order I, Methanobacteriales; order III, Methanomicrobiales; and order IV, Methanosarcinales. Further microbial ecology studies are needed to better understand the relationship among bacterial and archaeal groups and the involvement of these groups in the process of microbially influenced corrosion in order to develop improved ways of monitoring and controlling microbially influenced corrosion. PMID:12957923
Personal genomes in progress: from the human genome project to the personal genome project.
Lunshof, Jeantine E; Bobe, Jason; Aach, John; Angrist, Misha; Thakuria, Joseph V; Vorhaus, Daniel B; Hoehe, Margret R; Church, George M
2010-01-01
The cost of a diploid human genome sequence has dropped from about $70M to $2000 since 2007--even as the standards for redundancy have increased from 7x to 40x in order to improve call rates. Coupled with the low return on investment for common single-nucleotide polylmorphisms, this has caused a significant rise in interest in correlating genome sequences with comprehensive environmental and trait data (GET). The cost of electronic health records, imaging, and microbial, immunological, and behavioral data are also dropping quickly. Sharing such integrated GET datasets and their interpretations with a diversity of researchers and research subjects highlights the need for informed-consent models capable of addressing novel privacy and other issues, as well as for flexible data-sharing resources that make materials and data available with minimum restrictions on use. This article examines the Personal Genome Project's effort to develop a GET database as a public genomics resource broadly accessible to both researchers and research participants, while pursuing the highest standards in research ethics.
Choi, Dong Han; Ahn, Chisang; Jang, Gwang Il; ...
2015-11-11
Gracilimonas tropica Choi et al. 2009 is a member of order Sphingobacteriales, class Sphingobacteriia. Three species of the genus Gracilimonas have been isolated from marine seawater or a salt mine and showed extremely halotolerant and mesophilic features, although close relatives are extremely halophilic or thermophilic. The type strain of the type species of Gracilimonas, G. tropica DSM19535 T, was isolated from a Synechococcus culture which was established from the tropical sea-surface water of the Pacific Ocean. The genome of the strain DSM19535 T was sequenced through the Genomic Encyclopedia of Type Strains, Phase I: the one thousand microbial genomes project.more » Here, we describe the genomic features of the strain. The 3,831,242 bp long draft genome consists of 48 contigs with 3373 protein-coding and 53 RNA genes. Finally, the strain seems to adapt to phosphate limitation and requires amino acids from external environment. In addition, genomic analyses and pasteurization experiment suggested that G. tropica DSM19535 T did not form spore.« less