Science.gov

Sample records for metagenome fragment classification

  1. Metagenome fragment classification based on multiple motif-occurrence profiles.

    PubMed

    Matsushita, Naoki; Seno, Shigeto; Takenaka, Yoichi; Matsuda, Hideo

    2014-01-01

    A vast amount of metagenomic data has been obtained by extracting multiple genomes simultaneously from microbial communities, including genomes from uncultivable microbes. By analyzing these metagenomic data, novel microbes are discovered and new microbial functions are elucidated. The first step in analyzing these data is sequenced-read classification into reference genomes from which each read can be derived. The Naïve Bayes Classifier is a method for this classification. To identify the derivation of the reads, this method calculates a score based on the occurrence of a DNA sequence motif in each reference genome. However, large differences in the sizes of the reference genomes can bias the scoring of the reads. This bias might cause erroneous classification and decrease the classification accuracy. To address this issue, we have updated the Naïve Bayes Classifier method using multiple sets of occurrence profiles for each reference genome by normalizing the genome sizes, dividing each genome sequence into a set of subsequences of similar length and generating profiles for each subsequence. This multiple profile strategy improves the accuracy of the results generated by the Naïve Bayes Classifier method for simulated and Sargasso Sea datasets. PMID:25210663
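
    The multiple-profile idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the k-mer length, the chunk length, the add-one smoothing, and the choice to take the best-scoring chunk per genome are all assumptions made here for the example, and every function name is hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def split_genome(seq, chunk_len):
    """Divide a genome into subsequences of similar length."""
    n_chunks = max(1, round(len(seq) / chunk_len))
    size = math.ceil(len(seq) / n_chunks)
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def log_score(read, profile, k):
    """Naive Bayes log-likelihood of a read under one k-mer profile.
    Add-one smoothing over the 4**k possible k-mers is an assumption."""
    total = sum(profile.values()) + 4 ** k
    return sum(math.log((profile[read[i:i + k]] + 1) / total)
               for i in range(len(read) - k + 1))

def classify(read, genomes, k=3, chunk_len=50):
    """Return the genome whose best-scoring chunk profile fits the read.
    Taking the maximum over chunks is an assumed combination rule."""
    best_name, best_score = None, -math.inf
    for name, seq in genomes.items():
        for chunk in split_genome(seq, chunk_len):
            score = log_score(read, kmer_counts(chunk, k), k)
            if score > best_score:
                best_name, best_score = name, score
    return best_name

# Toy reference genomes of very different sizes (made-up sequences).
genomes = {"A": "ACGT" * 50, "B": "GGCCTTAA" * 10}
print(classify("ACGTACGTAC", genomes))
```

    Because every profile is built from a chunk of roughly the same length, a large genome no longer accumulates larger raw counts than a small one, which is the size bias the abstract describes.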

  2. Gene prediction in metagenomic fragments: A large scale machine learning approach

    PubMed Central

    Hoff, Katharina J; Tech, Maike; Lingner, Thomas; Daniel, Rolf; Morgenstern, Burkhard; Meinicke, Peter

    2008-01-01

    Background Metagenomics is an approach to the characterization of microbial genomes via the direct isolation of genomic sequences from the environment without prior cultivation. The amount of metagenomic sequence data is growing fast while computational methods for metagenome analysis are still in their infancy. In contrast to genomic sequences of single species, which can usually be assembled and analyzed by many available methods, a large proportion of metagenome data remains as unassembled anonymous sequencing reads. One of the aims of all metagenomic sequencing projects is the identification of novel genes. The short length of most fragments (for example, Sanger sequencing yields fragments of 700 bp on average) and their unknown phylogenetic origin require approaches to gene prediction that are different from the currently available methods for genomes of single species. In particular, the large size of metagenomic samples requires fast and accurate methods with small numbers of false positive predictions. Results We introduce a novel gene prediction algorithm for metagenomic fragments based on a two-stage machine learning approach. In the first stage, we use linear discriminants for monocodon usage, dicodon usage and translation initiation sites to extract features from DNA sequences. In the second stage, an artificial neural network combines these features with open reading frame length and fragment GC-content to compute the probability that this open reading frame encodes a protein. This probability is used for the classification and scoring of gene candidates. With large scale training, our method provides fast single fragment predictions with good sensitivity and specificity on artificially fragmented genomic DNA. Additionally, this method is able to predict translation initiation sites accurately and distinguishes complete from incomplete genes with high reliability. Conclusion Large scale machine learning methods are well-suited for gene prediction in metagenomic DNA
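
    The raw quantities named in this abstract (monocodon usage, dicodon usage, open reading frame length and fragment GC-content) can be computed as in the sketch below. It only illustrates the inputs; the linear discriminants and the neural network that combine them are not reproduced, and the function names are hypothetical.

```python
from collections import Counter

def codons(orf):
    """Split an in-frame ORF into consecutive codons, dropping any remainder."""
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]

def monocodon_usage(orf):
    """Relative frequency of each codon in the ORF."""
    cs = codons(orf)
    return {c: n / len(cs) for c, n in Counter(cs).items()}

def dicodon_usage(orf):
    """Relative frequency of each pair of adjacent codons in the ORF."""
    cs = codons(orf)
    pairs = Counter(zip(cs, cs[1:]))
    total = max(1, len(cs) - 1)
    return {p: n / total for p, n in pairs.items()}

def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)

def orf_features(orf, fragment):
    """Bundle the per-ORF quantities that a downstream classifier could use."""
    return {
        "orf_length": len(orf),
        "gc_content": gc_content(fragment),
        "monocodon": monocodon_usage(orf),
        "dicodon": dicodon_usage(orf),
    }

# Toy ORF embedded in a toy fragment (made-up sequences).
features = orf_features("ATGGCCGCCTAA", "CCATGGCCGCCTAAGG")
print(features["orf_length"], features["monocodon"]["GCC"])
```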

  3. Microbial Community Analysis with Ribosomal Gene Fragments from Shotgun Metagenomes

    PubMed Central

    Guo, Jiarong; Cole, James R.; Zhang, Qingpeng; Brown, C. Titus

    2015-01-01

    Shotgun metagenomic sequencing does not depend on gene-targeted primers or PCR amplification; thus, it is not affected by primer bias or chimeras. However, searching rRNA genes from large shotgun Illumina data sets is computationally expensive, and no approach exists for unsupervised community analysis of small-subunit (SSU) rRNA gene fragments retrieved from shotgun data. We present a pipeline, SSUsearch, for faster identification of SSU rRNA gene fragments and for unsupervised community analysis with shotgun data. It also includes classification and copy number correction, and the output can be used by traditional amplicon analysis platforms. Shotgun metagenome data using this pipeline yielded higher diversity estimates than amplicon data but retained the grouping of samples in ordination analyses. We applied this pipeline to soil samples with paired shotgun and amplicon data and confirmed bias against Verrucomicrobia in a commonly used V6-V8 primer set, as well as discovering likely bias against Actinobacteria and for Verrucomicrobia in a commonly used V4 primer set. This pipeline can utilize all variable regions in SSU rRNA and also can be applied to large-subunit (LSU) rRNA genes for confirmation of community structure. The pipeline can scale to handle large amounts of soil metagenomic data (5 Gb memory and 5 central processing unit hours to process 38 Gb [1 lane] of trimmed Illumina HiSeq2500 data) and is freely available at https://github.com/dib-lab/SSUsearch under a BSD license. PMID:26475107

  4. CoMeta: Classification of Metagenomes Using k-mers

    PubMed Central

    Kawulok, Jolanta; Deorowicz, Sebastian

    2015-01-01

    Nowadays, the study of environmental samples has been developing rapidly. Characterization of the environment composition broadens the knowledge about the relationship between species composition and environmental conditions. An important element of extracting the knowledge of the sample composition is to compare the extracted fragments of DNA with sequences derived from known organisms. In the presented paper, we introduce an algorithm called CoMeta (Classification of metagenomes), which assigns a query read (a DNA fragment) into one of the groups previously prepared by the user. Typically, this is one of the taxonomic ranks (e.g., phylum, genus); however, the prepared groups may contain sequences having various functions. In CoMeta, we used an exact method for read classification using short subsequences (k-mers) and a fast program for indexing large sets of k-mers. In contrast to the most popular methods based on BLAST, where the query is compared with each reference sequence, we begin the classification from the top of the taxonomy tree to reduce the number of comparisons. The presented experimental study confirms that CoMeta outperforms other programs used in this context. CoMeta is available at https://github.com/jkawulok/cometa under a free GNU GPL 2 license. PMID:25884504
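
    A top-down k-mer classification of the kind described here might look like the sketch below. It is an assumption-laden illustration rather than CoMeta's actual algorithm: the score (fraction of a read's k-mers found in a group), the nested-dictionary tree layout, and the rule of stopping when nothing matches are all choices made for this example.

```python
def kmers(seq, k):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def match_score(read, group_kmers, k):
    """Fraction of the read's k-mers that occur exactly in the group."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & group_kmers) / len(read_kmers)

def classify_top_down(read, tree, k=4):
    """Walk the taxonomy from the top, keeping only the best group per level.

    tree maps a group name to {"seqs": [...], "children": {...}}; this
    layout is hypothetical. Only the children of the winning group are
    compared at the next level, which limits the number of comparisons.
    """
    path, level = [], tree
    while level:
        scores = {}
        for name, node in level.items():
            group = set().union(*(kmers(s, k) for s in node["seqs"]))
            scores[name] = match_score(read, group, k)
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            break  # nothing matches at this rank: stop descending
        path.append(best)
        level = level[best].get("children", {})
    return path

# Toy two-level taxonomy with made-up sequences.
tree = {
    "P1": {"seqs": ["ACGTACGTAC", "ACGTTTTTAC"],
           "children": {"G1": {"seqs": ["ACGTACGTAC"]},
                        "G2": {"seqs": ["ACGTTTTTAC"]}}},
    "P2": {"seqs": ["GGGGCCCCGGGG"]},
}
print(classify_top_down("CGTACGTA", tree))
```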

  5. CoMeta: classification of metagenomes using k-mers.

    PubMed

    Kawulok, Jolanta; Deorowicz, Sebastian

    2015-01-01

    Nowadays, the study of environmental samples has been developing rapidly. Characterization of the environment composition broadens the knowledge about the relationship between species composition and environmental conditions. An important element of extracting the knowledge of the sample composition is to compare the extracted fragments of DNA with sequences derived from known organisms. In the presented paper, we introduce an algorithm called CoMeta (Classification of metagenomes), which assigns a query read (a DNA fragment) into one of the groups previously prepared by the user. Typically, this is one of the taxonomic ranks (e.g., phylum, genus); however, the prepared groups may contain sequences having various functions. In CoMeta, we used an exact method for read classification using short subsequences (k-mers) and a fast program for indexing large sets of k-mers. In contrast to the most popular methods based on BLAST, where the query is compared with each reference sequence, we begin the classification from the top of the taxonomy tree to reduce the number of comparisons. The presented experimental study confirms that CoMeta outperforms other programs used in this context. CoMeta is available at https://github.com/jkawulok/cometa under a free GNU GPL 2 license. PMID:25884504

  6. Gene prediction in metagenomic fragments based on the SVM algorithm

    PubMed Central

    2013-01-01

    Background Metagenomic sequencing is becoming a powerful technology for exploring micro-organisms from various environments, such as the human body, without isolation and cultivation. Accurately identifying genes from metagenomic fragments is one of the most fundamental issues. Results In this article, we present a novel gene prediction method named MetaGUN for metagenomic fragments based on a machine learning approach of SVM. It implements a three-stage strategy to predict genes. Firstly, it classifies input fragments into phylogenetic groups by a k-mer based sequence binning method. Then, protein-coding sequences are identified for each group independently with SVM classifiers that integrate entropy density profiles (EDP) of codon usage, translation initiation site (TIS) scores and open reading frame (ORF) length as input patterns. Finally, the TISs are adjusted by employing a modified version of MetaTISA. To identify protein-coding sequences, MetaGUN builds the universal module and the novel module. The former is based on a set of representative species, while the latter is designed to find potential functional DNA sequences with conserved domains. Conclusions Comparisons on artificial shotgun fragments with multiple current metagenomic gene finders show that MetaGUN predicts better results on both 3' and 5' ends of genes with fragments of various lengths. In particular, it makes the most reliable predictions among these methods. As an application, MetaGUN was used to predict genes for two samples of human gut microbiome. It identifies thousands of additional genes with significant evidence. Further analysis indicates that MetaGUN tends to predict more potential novel genes than other current metagenomic gene finders. PMID:23735199

  7. Scalable metagenomic taxonomy classification using a reference genome database

    PubMed Central

    Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.

    2013-01-01

    Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782

  8. Multi-Layer and Recursive Neural Networks for Metagenomic Classification.

    PubMed

    Ditzler, Gregory; Polikar, Robi; Rosen, Gail

    2015-09-01

    Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of learning nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: i) a deep belief network, and ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multi-layer perceptron, which has been well-established in the machine learning community as a powerful prediction algorithm, though its presence is largely missing in metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not result in improvements to the classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and provide insight on how they can be improved for predictive metagenomic analysis. PMID:26316190

  9. Fast and sensitive taxonomic classification for metagenomics with Kaiju

    PubMed Central

    Menzel, Peter; Ng, Kim Lee; Krogh, Anders

    2016-01-01

    Metagenomics emerged as an important field of research not only in microbial ecology but also for human health and disease, and metagenomic studies are performed on increasingly larger scales. While recent taxonomic classification programs achieve high speed by comparing genomic k-mers, they often lack sensitivity for overcoming evolutionary divergence, so that large fractions of the metagenomic reads remain unclassified. Here we present the novel metagenome classifier Kaiju, which finds maximum (in-)exact matches on the protein-level using the Burrows–Wheeler transform. We show in a genome exclusion benchmark that Kaiju classifies reads with higher sensitivity and similar precision compared with current k-mer-based classifiers, especially in genera that are underrepresented in reference databases. We also demonstrate that Kaiju classifies up to 10 times more reads in real metagenomes. Kaiju can process millions of reads per minute and can run on a standard PC. Source code and web server are available at http://kaiju.binf.ku.dk. PMID:27071849
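
    Protein-level matching of the kind described here can be illustrated with the sketch below: a read is translated in all six reading frames, each translation is split at stop codons, and the longest exact match to a reference protein is reported. The naive substring search stands in for the Burrows-Wheeler transform index named in the abstract, inexact matches are not handled, and the minimum match length and database layout are assumptions for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def six_frame_fragments(read):
    """Translate all six frames and split each translation at stop codons."""
    fragments = []
    for strand in (read, read.translate(COMPLEMENT)[::-1]):
        for offset in range(3):
            protein = "".join(CODON_TABLE.get(strand[i:i + 3], "X")
                              for i in range(offset, len(strand) - 2, 3))
            fragments.extend(f for f in protein.split("*") if f)
    return fragments

def longest_exact_match(fragment, protein):
    """Length of the longest substring of fragment that occurs in protein."""
    for length in range(len(fragment), 0, -1):
        for start in range(len(fragment) - length + 1):
            if fragment[start:start + length] in protein:
                return length
    return 0

def classify(read, database, min_len=4):
    """Return (taxon, match length) for the longest exact protein match.
    database maps a taxon name to one protein string (hypothetical layout)."""
    best_taxon, best_len = None, 0
    for fragment in six_frame_fragments(read):
        for taxon, protein in database.items():
            match = longest_exact_match(fragment, protein)
            if match >= min_len and match > best_len:
                best_taxon, best_len = taxon, match
    return best_taxon, best_len

# Toy read and toy protein database (made-up sequences).
database = {"taxonA": "QQMKAWLQQ", "taxonB": "MKGGGG"}
print(classify("ATGAAAGCTTGGCTG", database))
```

    Matching amino acids rather than nucleotides tolerates synonymous DNA changes, which is one reason protein-level comparison can remain sensitive across evolutionary divergence.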

  10. Fast and sensitive taxonomic classification for metagenomics with Kaiju.

    PubMed

    Menzel, Peter; Ng, Kim Lee; Krogh, Anders

    2016-01-01

    Metagenomics emerged as an important field of research not only in microbial ecology but also for human health and disease, and metagenomic studies are performed on increasingly larger scales. While recent taxonomic classification programs achieve high speed by comparing genomic k-mers, they often lack sensitivity for overcoming evolutionary divergence, so that large fractions of the metagenomic reads remain unclassified. Here we present the novel metagenome classifier Kaiju, which finds maximum (in-)exact matches on the protein-level using the Burrows-Wheeler transform. We show in a genome exclusion benchmark that Kaiju classifies reads with higher sensitivity and similar precision compared with current k-mer-based classifiers, especially in genera that are underrepresented in reference databases. We also demonstrate that Kaiju classifies up to 10 times more reads in real metagenomes. Kaiju can process millions of reads per minute and can run on a standard PC. Source code and web server are available at http://kaiju.binf.ku.dk. PMID:27071849

  11. Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations

    PubMed Central

    García-López, Rodrigo; Vázquez-Castellanos, Jorge Francisco; Moya, Andrés

    2015-01-01

    Metagenomic libraries consist of DNA fragments from diverse species, with varying genome size and abundance. High-throughput sequencing platforms produce large volumes of reads from these libraries, which may be assembled into contigs, ideally resembling the original larger genomic sequences. The uneven species distribution, along with the stochasticity in sample processing and sequencing bias, impacts the success of accurate sequence assembly. Several assemblers enable the processing of viral metagenomic data de novo, generally using overlap layout consensus or de Bruijn graph approaches for contig assembly. The success of viral genomic reconstruction in these datasets is limited by the degree of fragmentation of each genome in the sample, which is dependent on the sequencing effort and the genome length. Depending on ecological, biological, or procedural biases, some fragments have a higher prevalence, or coverage, in the assembly. However, assemblers must face challenges, such as the formation of chimerical structures and intra-species variability. Diversity calculation relies on the classification of the sequences that comprise a metagenomic dataset. Whenever the corresponding genomic and taxonomic information is available, contigs matching the same species can be classified accordingly and the coverage of its genome can be calculated for that species. This may be used to compare populations by estimating abundance and assessing species distribution from this data. Nevertheless, the coverage does not take into account the degree of fragmentation, or else genome completeness, and is not necessarily representative of actual species distribution in the samples. Furthermore, undetermined sequences are abundant in viral metagenomic datasets, resulting in several independent contigs that cannot be assigned by homology or genomic information. These may only be classified as different operational taxonomic units (OTUs), sometimes remaining inadvisably unrelated. 
Thus

  12. Identification and characterization of metagenomic fragments from tidal flat sediment.

    PubMed

    Kim, Byung Kwon; Park, Yoon-Dong; Oh, Hyun-Myung; Chun, Jongsik

    2009-08-01

    Phylogenetic surveys based on cultivation-independent methods have revealed that tidal flat sediments are environments with extensive microbial diversity. Since most prokaryotes in nature cannot be easily cultivated under general laboratory conditions, our knowledge of prokaryotic dwellers in tidal flat sediment is mainly based on the analysis of metagenomes. Microbial community analysis based on the 16S rRNA gene and other phylogenetic markers has been widely used to provide important information on the role of microorganisms, but it is basically an indirect means, compared with direct sequencing of metagenomic DNAs. In this study, we applied a sequence-based metagenomic approach to characterize uncultivated prokaryotes from tidal flat sediment. Two large-insert genomic libraries based on fosmid were constructed from tidal flat metagenomic DNA. A survey based on end-sequencing of selected fosmid clones resulted in the identification of clones containing 274 bacterial and 16 archaeal homologs, the majority of which were of proteobacterial origin. Two fosmid clones containing large metagenomic DNAs were completely sequenced using the shotgun method. Both DNA inserts contained more than 20 genes encoding putative proteins which implied their ecological roles in tidal flat sediment. Phylogenetic analyses of evolutionarily conserved proteins indicate that these clones are not closely related to prokaryotes whose genome sequences are known, and genes in tidal flats may be subject to extensive lateral gene transfer, notably between domains Bacteria and Archaea. This is the first report demonstrating that direct sequencing of a metagenomic gene library is useful in underpinning the genetic makeup and functional roles of prokaryotes in tidal flat sediments. PMID:19763413

  13. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering

    PubMed Central

    Kelley, David R.; Liu, Bo; Delcher, Arthur L.; Pop, Mihai; Salzberg, Steven L.

    2012-01-01

    Environmental shotgun sequencing (or metagenomics) is widely used to survey the communities of microbial organisms that live in many diverse ecosystems, such as the human body. Finding the protein-coding genes within the sequences is an important step for assessing the functional capacity of a metagenome. In this work, we developed a metagenomics gene prediction system Glimmer-MG that achieves significantly greater accuracy than previous systems via novel approaches to a number of important prediction subtasks. First, we introduce the use of phylogenetic classifications of the sequences to model parameterization. We also cluster the sequences, grouping together those that likely originated from the same organism. Analogous to iterative schemes that are useful for whole genomes, we retrain our models within each cluster on the initial gene predictions before making final predictions. Finally, we model both insertion/deletion and substitution sequencing errors using a different approach than previous software, allowing Glimmer-MG to change coding frame or pass through stop codons by predicting an error. In a comparison among multiple gene finding methods, Glimmer-MG makes the most sensitive and precise predictions on simulated and real metagenomes for all read lengths and error rates tested. PMID:22102569

  14. Accurate phylogenetic classification of DNA fragments based on sequence composition

    SciTech Connect

    McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis; Hugenholtz, Philip; Rigoutsos, Isidore

    2006-05-01

    Metagenome studies have retrieved vast amounts of sequence out of a variety of environments, leading to novel discoveries and great insights into the uncultured microbial world. Except for very simple communities, diversity makes sequence assembly and analysis a very challenging problem. To understand the structure and function of microbial communities, a taxonomic characterization of the obtained sequence fragments is highly desirable, yet currently limited mostly to those sequences that contain phylogenetic marker genes. We show that for clades at the rank of domain down to genus, sequence composition allows the very accurate phylogenetic characterization of genomic sequence. We developed a composition-based classifier, PhyloPythia, for de novo phylogenetic sequence characterization and have trained it on a data set of 340 genomes. By extensive evaluation experiments we show that the method is accurate across all taxonomic ranks considered, even for sequences that originate from novel organisms and are as short as 1 kb. Application to two metagenome datasets obtained from samples of phosphorus-removing sludge showed that the method allows the accurate classification at genus level of most sequence fragments from the dominant populations, while at the same time correctly characterizing even larger parts of the samples at higher taxonomic levels.
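
    The notion of sequence composition used here can be made concrete with a short sketch. It builds an oligonucleotide frequency vector for a fragment and assigns the fragment to the reference with the closest vector. The nearest-profile rule by Euclidean distance is a deliberately simple stand-in chosen for this example, not the trained classifier the abstract describes, and the oligonucleotide length and all names are assumptions.

```python
import math
from itertools import product

def composition_vector(seq, k=2):
    """Relative frequency of every k-mer over ACGT, in a fixed order."""
    keys = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(keys, 0)
    n = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases
            counts[kmer] += 1
            n += 1
    return [counts[key] / n if n else 0.0 for key in keys]

def nearest_clade(fragment, references, k=2):
    """Name of the reference whose composition vector is closest."""
    vec = composition_vector(fragment, k)
    return min(references,
               key=lambda name: math.dist(vec, composition_vector(references[name], k)))

# Toy references with contrasting composition (made-up sequences).
references = {"at_rich": "ATATTATAATAT", "gc_rich": "GCGCCGCGGCGC"}
print(nearest_clade("ATATATAT", references))
```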

  15. Methods for virus classification and the challenge of incorporating metagenomic sequence data.

    PubMed

    Simmonds, Peter

    2015-06-01

    The division of viruses into orders, families, genera and species provides a classification framework that seeks to organize and make sense of the diversity of viruses infecting animals, plants and bacteria. Classifications are based on similarities in genome structure and organization, the presence of homologous genes and sequence motifs and at lower levels such as species, host range, nucleotide and antigenic relatedness and epidemiology. Classification below the level of family must also be consistent with phylogeny and virus evolutionary histories. Recently developed methods such as PASC, DEMaRC and NVR offer alternative strategies for genus and species assignments that are based purely on degrees of divergence between genome sequences. They offer the possibility of automating classification of the vast number of novel virus sequences being generated by next-generation metagenomic sequencing. However, distance-based methods struggle to deal with the complex evolutionary history of virus genomes that are shuffled by recombination and reassortment, and where taxonomic lineages evolve at different rates. In biological terms, classifications based on sequence distances alone are also arbitrary whereas the current system of virus taxonomy is of utility precisely because it is primarily based upon phenotypic characteristics. However, a separate system is clearly needed by which virus variants that lack biological information might be incorporated into the ICTV classification even if based solely on sequence relationships to existing taxa. For these, simplified taxonomic proposals and naming conventions represent a practical way to expand the existing virus classification and catalogue our rapidly increasing knowledge of virus diversity. PMID:26068186

  16. IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

    EPA Science Inventory

    Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...

  17. Metagenomic Classification and Characterization of Marine Actinobacteria from the Gulf of Maine without Representative Genomes

    NASA Astrophysics Data System (ADS)

    Sachdeva, R.; Heidelberg, J.

    2012-12-01

    Actinobacteria represent one of the largest and most diverse bacterial phyla and, unlike most marine prokaryotes, are gram-positive. This phylum encompasses a broad range of physiologies, morphologies, and metabolic properties with a broad array of lifestyles. The marine actinobacterial assemblage is dominated by the orders Actinomycetales and Acidimicrobiales (also known as the marine Actinobacteria clade). The Acidimicrobiales bacteria typically outnumber the Actinomycetales bacteria and are mostly represented by the OCS155 group. Although bacteria of the order Acidimicrobiales make up ~7.6% of the 16S matches from the Global Ocean Survey shotgun metagenomic libraries, very little is known about their potential function and role in biogeochemical cycling. Samples were collected from surface seawater in the Gulf of Maine (GOM) in the summer and winter of 2006. Sanger sequences were generated from the 0.1-0.8 μm fractions using paired-end medium insert shotgun libraries. The resulting 2.2 Gb were assembled using the Celera Assembler package into 280 Mb of non-redundant scaffolds. Putative actinobacterial assemblies were identified using (1) ribosomal RNA genes (16S and 23S), (2) phylogenetically informative non-ribosomal core genes thought to be resistant to horizontal gene transfer (e.g. RecA and RpoB) and (3) compositional binning using oligonucleotide frequency pattern based hierarchical clustering. Binning resulted in 3.6 Mb (4.2X coverage) of actinobacterial scaffolds that comprised 15.1 Mb of unassembled reads. Putative actinobacterial assemblies included both summer and winter reads, demonstrating that the Actinobacteria are abundant year round. Classification reveals that all of the sampled Actinobacteria are from the orders Acidimicrobiales and Actinomycetales and are similar to those found in the global ocean. The GOM Actinobacteria show a broad range of G+C % content (32-66%), indicating a high level of genomic diversity. Those assemblies
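
    The coverage figure quoted in this abstract is consistent with dividing the read bases by the scaffold length, as the short check below shows. Treating coverage as this simple ratio is an assumption about how the figure was derived; the function name is hypothetical.

```python
def mean_coverage(read_bases_mb, assembly_mb):
    """Average depth: total read bases divided by assembled length."""
    return read_bases_mb / assembly_mb

# Figures taken from the abstract: 15.1 Mb of reads in 3.6 Mb of scaffolds.
coverage = mean_coverage(15.1, 3.6)
print(f"{coverage:.1f}X")
```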

  18. Improved ethanol production from biomass by a rumen metagenomic DNA fragment expressed in Escherichia coli MS04 during fermentation.

    PubMed

    Loaces, Inés; Amarelle, Vanesa; Muñoz-Gutierrez, Iván; Fabiano, Elena; Martinez, Alfredo; Noya, Francisco

    2015-11-01

    With the aim of improving current ethanologenic Escherichia coli strains, we screened a metagenomic library from bovine ruminal fluid for cellulolytic enzymes. We isolated one fosmid, termed Csd4, which was able to confer to E. coli the ability to grow on complex cellulosic materials, such as avicel, carboxymethyl cellulose (CMC), filter paper, pretreated sugarcane bagasse, and xylan, as the sole carbon source. Glucanolytic activity obtained from E. coli transformed with Csd4 was maximal at 24 h of incubation and was inhibited when glucose or xylose was present in the media. The 34,406-bp DNA fragment of Csd4 was completely sequenced, and a putative endoglucanase, a xylosidase/arabinosidase, and a laccase gene were identified. Comparison analysis revealed that Csd4 derived from an organism closely related to Prevotella ruminicola, but no homologies were found with any of the genomes already sequenced. Csd4 was introduced into the ethanologenic E. coli MS04 strain and ethanol production from CMC, avicel, sugarcane bagasse, or filter paper was observed. Exogenously expressed β-glucosidase had a positive effect on cell growth, in agreement with the fact that no putative β-glucosidase was found in Csd4. Ethanol production from sugarcane bagasse was improved threefold by Csd4 after saccharification by commercial Trichoderma reesei cellulases, underlining the ability of Csd4 to act as a saccharification enhancer to reduce the enzymatic load and time required for cellulose deconstruction. PMID:26175105

  19. Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

    PubMed

    Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

    2016-03-01

    Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequenced. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken accumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. PMID:26812576
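
    The mapping of short sequences to a lowest ancestor and the weighting of taxon nodes can be sketched as below. This is a simplified illustration under stated assumptions, not the Kraken implementation: the parent-pointer taxonomy, the k-mer length, and the tie handling (the first best taxon wins) are choices made for this example, and all names are hypothetical.

```python
from collections import Counter

def lca(a, b, parent):
    """Lowest common ancestor of two taxa in a parent-pointer tree."""
    ancestors = set()
    while a is not None:
        ancestors.add(a)
        a = parent.get(a)
    while b not in ancestors:
        b = parent[b]
    return b

def build_index(genomes, parent, k):
    """Map every k-mer to the LCA of all taxa whose sequence contains it."""
    index = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            index[kmer] = lca(index[kmer], taxon, parent) if kmer in index else taxon
    return index

def classify(read, index, parent, k):
    """Weight each taxon by its k-mer hits and return the hit taxon whose
    root-to-node path carries the largest summed weight."""
    hits = Counter(index[read[i:i + k]]
                   for i in range(len(read) - k + 1)
                   if read[i:i + k] in index)

    def path_weight(taxon):
        total = 0
        while taxon is not None:
            total += hits[taxon]
            taxon = parent.get(taxon)
        return total

    return max(hits, key=path_weight) if hits else None

# Toy taxonomy and genomes (made-up names and sequences).
parent = {"root": None, "genusX": "root", "sp1": "genusX", "sp2": "genusX"}
genomes = {"sp1": "AAACCCGGG", "sp2": "AAACCCTTT"}
index = build_index(genomes, parent, k=4)
print(classify("ACCCGGG", index, parent, k=4))
```

    A read whose k-mers are all shared by both species is assigned only to the genus, which illustrates why the classification depth depends on how many taxon-specific k-mers a read contains.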

  20. Signal Processing for Metagenomics: Extracting Information from the Soup

    PubMed Central

    Rosen, Gail L.; Sokhansanj, Bahrad A.; Polikar, Robi; Bruns, Mary Ann; Russell, Jacob; Garbarine, Elaine; Essinger, Steve; Yok, Non

    2009-01-01

    Traditionally, studies in microbial genomics have focused on single-genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. Current tools and techniques are reviewed in this paper which address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology. PMID:20436876

  1. 16S classifier: a tool for fast and accurate taxonomic classification of 16S rRNA hypervariable regions in metagenomic datasets.

    PubMed

    Chaudhary, Nikhil; Sharma, Ashok K; Agarwal, Piyush; Gupta, Ankit; Sharma, Vineet K

    2015-01-01

    The diversity of microbial species in a metagenomic study is commonly assessed using 16S rRNA gene sequencing. With the rapid developments in genome sequencing technologies, the focus has shifted towards the sequencing of hypervariable regions of 16S rRNA gene instead of full length gene sequencing. Therefore, 16S Classifier is developed using a machine learning method, Random Forest, for faster and accurate taxonomic classification of short hypervariable regions of 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and the precision values of up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level. 16S Classifier is available freely at http://metagenomics.iiserb.ac.in/16Sclassifier and http://metabiosys.iiserb.ac.in/16Sclassifier. PMID:25646627

  2. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification

    PubMed Central

    Wang, Yin; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data. PMID:27057545

  3. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    PubMed

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Background. Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Results. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. Conclusions. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data. PMID:27057545

  4. MetaSAMS--a novel software platform for taxonomic classification, functional annotation and comparative analysis of metagenome datasets.

    PubMed

    Zakrzewski, Martha; Bekel, Thomas; Ander, Christina; Pühler, Alfred; Rupp, Oliver; Stoye, Jens; Schlüter, Andreas; Goesmann, Alexander

    2013-08-20

    Metagenomics aims at exploring microbial communities concerning their composition and functioning. Application of high-throughput sequencing technologies for the analysis of environmental DNA-preparations can generate large sets of metagenome sequence data which have to be analyzed by means of bioinformatics tools to unveil the taxonomic composition of the analyzed community as well as the repertoire of genes and gene functions. A bioinformatics software platform is required that allows the automated taxonomic and functional analysis and interpretation of metagenome datasets without manual effort. To address current demands in metagenome data analyses, the novel platform MetaSAMS was developed. MetaSAMS automatically accomplishes the tasks necessary for analyzing the composition and functional repertoire of a given microbial community from metagenome sequence data by implementing two software pipelines: (i) the first pipeline consists of three different classifiers performing the taxonomic profiling of metagenome sequences and (ii) the second functional pipeline accomplishes region predictions on assembled contigs and assigns functional information to predicted coding sequences. Moreover, MetaSAMS provides tools for statistical and comparative analyses based on the taxonomic and functional annotations. The capabilities of MetaSAMS are demonstrated for two metagenome datasets obtained from a biogas-producing microbial community of a production-scale biogas plant. The MetaSAMS web interface is available at https://metasams.cebitec.uni-bielefeld.de. PMID:23026555

  5. Classification of fragments of objects by the Fourier masks pattern recognition system

    NASA Astrophysics Data System (ADS)

    Barajas-García, Carolina; Solorza-Calderón, Selene; Álvarez-Borrego, Josué

    2016-05-01

    The automation process of the pattern recognition for fragments of objects is a challenge to humanity. For humans it is relatively easy to classify a fragment of an object even when it is isolated, although this identification can become more complicated if the fragment is partially overlapped by another object. However, the emulation of the functions of the human eye and brain by a computer is not a trivial issue. This paper presents a pattern recognition digital system based on Fourier binary rings masks in order to classify fragments of objects. The system is invariant to position, scale and rotation, and it is robust in the classification of images that have noise. Moreover, it classifies images that present an occlusion or elimination of approximately 50% of the area of the object.

  6. Large-scale metagenomic sequence clustering on map-reduce clusters.

    PubMed

    Yang, Xiao; Zola, Jaroslaw; Aluru, Srinivas

    2013-02-01

    Taxonomic clustering of species from millions of DNA fragments sequenced from their genomes is an important and frequently arising problem in metagenomics. In this paper, we present a parallel algorithm for taxonomic clustering of large metagenomic samples with support for overlapping clusters. We develop sketching techniques, akin to those created for web document clustering, to deduce significant similarities between pairs of sequences without resorting to expensive all vs. all comparison. We formulate the metagenomic classification problem as that of maximal quasi-clique enumeration in the resulting similarity graph, at multiple levels of the hierarchy as prescribed by different similarity thresholds. We cast execution of the underlying algorithmic steps as applications of the map-reduce framework to achieve a cloud ready implementation. We show that the resulting framework can produce high quality clustering of metagenomic samples consisting of millions of reads, in reasonable time limits, when executed on a modest size cluster. PMID:23427983
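
    Sketching of the kind used for web document clustering is often done with min-wise hashing, and a minimal sketch of that technique follows. It is an assumption that this resembles the authors' approach; the abstract does not give their construction. The function names, the k-mer length and the use of seeded BLAKE2b hashes are illustrative choices.

```python
import hashlib

def kmers(seq, k):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def _hash(kmer, seed):
    """64-bit hash of a k-mer under one of several seeded hash functions."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def sketch(seq, k, num_hashes):
    """MinHash sketch: the minimum hash value under each seeded function.

    Assumes len(seq) >= k so that the k-mer set is not empty.
    """
    ks = kmers(seq, k)
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]

def estimated_jaccard(sketch_a, sketch_b):
    """Fraction of matching positions, an estimate of k-mer set Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)

# Made-up reads, for illustration only.
read_a = "ACGTACGTTGCAACGT"
read_b = "ACGTACGTTGCATTTT"
print(estimated_jaccard(sketch(read_a, 4, 64), sketch(read_b, 4, 64)))
```

    Because each read is reduced to a fixed-length list of integers, candidate pairs can be found by comparing or bucketing sketches instead of aligning every read against every other read.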

  7. Genomic characterization of Defluviitoga tunisiensis L3, a key hydrolytic bacterium in a thermophilic biogas plant and its abundance as determined by metagenome fragment recruitment.

    PubMed

    Maus, Irena; Cibis, Katharina Gabriela; Bremges, Andreas; Stolze, Yvonne; Wibberg, Daniel; Tomazetto, Geizecler; Blom, Jochen; Sczyrba, Alexander; König, Helmut; Pühler, Alfred; Schlüter, Andreas

    2016-08-20

    The genome sequence of Defluviitoga tunisiensis L3 originating from a thermophilic biogas-production plant was established and recently published as Genome Announcement by our group. The circular chromosome of D. tunisiensis L3 has a size of 2,053,097bp and a mean GC content of 31.38%. To analyze the D. tunisiensis L3 genome sequence in more detail, a phylogenetic analysis of completely sequenced Thermotogae strains based on shared core genes was performed. It appeared that Petrotoga mobilis DSM 10674(T), originally isolated from a North Sea oil-production well, is the closest relative of D. tunisiensis L3. Comparative genome analyses of P. mobilis DSM 10674(T) and D. tunisiensis L3 showed moderate similarities regarding occurrence of orthologous genes. Both genomes share a common set of 1351 core genes. Reconstruction of metabolic pathways important for the biogas production process revealed that the D. tunisiensis L3 genome encodes a large set of genes predicted to facilitate utilization of a variety of complex polysaccharides including cellulose, chitin and xylan. Ethanol, acetate, hydrogen (H2) and carbon dioxide (CO2) were found as possible end-products of the fermentation process. The latter three metabolites are considered to represent substrates for methanogenic Archaea, the key organisms in the final step of the anaerobic digestion process. To determine the degree of relatedness between D. tunisiensis L3 and dominant biogas community members within the thermophilic biogas-production plant, metagenome sequences obtained from the corresponding microbial community were mapped onto the L3 genome sequence. This fragment recruitment revealed that the D. tunisiensis L3 genome is almost completely covered with metagenome sequences featuring high matching accuracy. This result indicates that strains highly related or even identical to the reference strain D. tunisiensis L3 play a dominant role within the community of the thermophilic biogas-production plant. 
PMID

  8. Enhanced Acylcarnitine Annotation in High-Resolution Mass Spectrometry Data: Fragmentation Analysis for the Classification and Annotation of Acylcarnitines

    PubMed Central

    van der Hooft, Justin J. J.; Ridder, Lars; Barrett, Michael P.; Burgess, Karl E. V.

    2015-01-01

    Metabolite annotation and identification are primary challenges in untargeted metabolomics experiments. Rigorous workflows for reliable annotation of mass features with chemical structures or compound classes are needed to enhance the power of untargeted mass spectrometry. High-resolution mass spectrometry considerably improves the confidence in assigning elemental formulas to mass features in comparison to nominal mass spectrometry, and embedding of fragmentation methods enables more reliable metabolite annotations and facilitates metabolite classification. However, the analysis of mass fragmentation spectra can be a time-consuming step and requires expert knowledge. This study demonstrates how characteristic fragmentations, specific to compound classes, can be used to systematically analyze their presence in complex biological extracts like urine that have undergone untargeted mass spectrometry combined with data dependent or targeted fragmentation. Human urine extracts were analyzed using normal phase liquid chromatography (hydrophilic interaction chromatography) coupled to an Ion Trap-Orbitrap hybrid instrument. Subsequently, mass chromatograms and collision-induced dissociation and higher-energy collisional dissociation (HCD) fragments were annotated using the freely available MAGMa software. Acylcarnitines play a central role in energy metabolism by transporting fatty acids into the mitochondrial matrix. By filtering on a combination of a mass fragment and neutral loss designed based on the MAGMa fragment annotations, we were able to classify and annotate 50 acylcarnitines in human urine extracts, based on high-resolution mass spectrometry HCD fragmentation spectra at different energies for all of them. Of these annotated acylcarnitines, 31 are not described in HMDB yet and for only 4 annotated acylcarnitines the fragmentation spectra could be matched to reference spectra. 
Therefore, we conclude that the use of mass fragmentation filters within the context

  9. The Phylogenetic Diversity of Metagenomes

    PubMed Central

    Kembel, Steven W.; Eisen, Jonathan A.; Pollard, Katherine S.; Green, Jessica L.

    2011-01-01

    Phylogenetic diversity—patterns of phylogenetic relatedness among organisms in ecological communities—provides important insights into the mechanisms underlying community assembly. Studies that measure phylogenetic diversity in microbial communities have primarily been limited to a single marker gene approach, using the small subunit of the rRNA gene (SSU-rRNA) to quantify phylogenetic relationships among microbial taxa. In this study, we present an approach for inferring phylogenetic relationships among microorganisms based on the random metagenomic sequencing of DNA fragments. To overcome challenges caused by the fragmentary nature of metagenomic data, we leveraged fully sequenced bacterial genomes as a scaffold to enable inference of phylogenetic relationships among metagenomic sequences from multiple phylogenetic marker gene families. The resulting metagenomic phylogeny can be used to quantify the phylogenetic diversity of microbial communities based on metagenomic data sets. We applied this method to understand patterns of microbial phylogenetic diversity and community assembly along an oceanic depth gradient, and compared our findings to previous studies of this gradient using SSU-rRNA gene and metagenomic analyses. Bacterial phylogenetic diversity was highest at intermediate depths beneath the ocean surface, whereas taxonomic diversity (diversity measured by binning sequences into taxonomically similar groups) showed no relationship with depth. Phylogenetic diversity estimates based on the SSU-rRNA gene and the multi-gene metagenomic phylogeny were broadly concordant, suggesting that our approach will be applicable to other metagenomic data sets for which corresponding SSU-rRNA gene sequences are unavailable. Our approach opens up the possibility of using metagenomic data to study microbial diversity in a phylogenetic context. PMID:21912589

  10. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    SciTech Connect

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  11. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  12. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.

    PubMed

    Reddy, T B K; Thomas, Alex D; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A; Kyrpides, Nikos C

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  13. The metagenomic telescope.

    PubMed

    Szalkai, Balázs; Scheer, Ildikó; Nagy, Kinga; Vértessy, Beáta G; Grolmusz, Vince

    2014-01-01

    Next generation sequencing technologies led to the discovery of numerous new microbe species in diverse environmental samples. Some of the new species contain genes never encountered before. Some of these genes encode proteins with novel functions, and some of these genes encode proteins that perform some well-known function in a novel way. A tool, named the Metagenomic Telescope, is described here that applies artificial intelligence methods, and seems to be capable of identifying new protein functions even in the well-studied model organisms. As a proof-of-principle demonstration of the Metagenomic Telescope, we considered DNA repair enzymes in the present work. First we identified proteins in DNA repair in well-known organisms (i.e., proteins in base excision repair, nucleotide excision repair, mismatch repair and DNA break repair); next we applied multiple alignments and then built hidden Markov profiles for each protein separately, across well-researched organisms; next, using public depositories of metagenomes, originating from extreme environments, we identified DNA repair genes in the samples. While the phylogenetic classification of the metagenomic samples is not typically available, we hypothesized that some very special DNA repair strategies need to be applied in bacteria and Archaea living in those extreme circumstances. It is a difficult task to evaluate the results obtained from mostly unknown species; therefore we applied again the hidden Markov profiling: for the identified DNA repair genes in the extreme metagenomes, we prepared new hidden Markov profiles (for each gene separately, subsequent to a cluster analysis); and we searched for similarities to those profiles in model organisms. We have found well known DNA repair proteins, numerous proteins with unknown functions, and also proteins with known, but different functions in the model organisms. PMID:25054802

  14. The Metagenomic Telescope

    PubMed Central

    Szalkai, Balázs; Scheer, Ildikó; Nagy, Kinga; Vértessy, Beáta G.; Grolmusz, Vince

    2014-01-01

    Next generation sequencing technologies led to the discovery of numerous new microbe species in diverse environmental samples. Some of the new species contain genes never encountered before. Some of these genes encode proteins with novel functions, and some of these genes encode proteins that perform some well-known function in a novel way. A tool, named the Metagenomic Telescope, is described here that applies artificial intelligence methods, and seems to be capable of identifying new protein functions even in the well-studied model organisms. As a proof-of-principle demonstration of the Metagenomic Telescope, we considered DNA repair enzymes in the present work. First we identified proteins in DNA repair in well-known organisms (i.e., proteins in base excision repair, nucleotide excision repair, mismatch repair and DNA break repair); next we applied multiple alignments and then built hidden Markov profiles for each protein separately, across well-researched organisms; next, using public depositories of metagenomes, originating from extreme environments, we identified DNA repair genes in the samples. While the phylogenetic classification of the metagenomic samples is not typically available, we hypothesized that some very special DNA repair strategies need to be applied in bacteria and Archaea living in those extreme circumstances. It is a difficult task to evaluate the results obtained from mostly unknown species; therefore we applied again the hidden Markov profiling: for the identified DNA repair genes in the extreme metagenomes, we prepared new hidden Markov profiles (for each gene separately, subsequent to a cluster analysis); and we searched for similarities to those profiles in model organisms. We have found well known DNA repair proteins, numerous proteins with unknown functions, and also proteins with known, but different functions in the model organisms. PMID:25054802

  15. Exploration of noncoding sequences in metagenomes.

    PubMed

    Tobar-Tosse, Fabián; Rodríguez, Adrián C; Vélez, Patricia E; Zambrano, María M; Moreno, Pedro A

    2013-01-01

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment. PMID:23536879
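
    Two of the sequence patterns named above, (G+C) content and trinucleotide usage, can be computed with a few lines of code. The following is a minimal sketch, not the authors' pipeline. It assumes trinucleotide usage means relative frequencies over a sliding window of overlapping trinucleotides, whereas codon usage would count in-frame triplets only. The function names are illustrative.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def trinucleotide_usage(seq):
    """Relative frequency of each overlapping trinucleotide.

    Assumes len(seq) >= 3; a sliding window is used, so reading frame is ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts.values())
    return {tri: n / total for tri, n in counts.items()}

# Made-up sequences, for illustration only.
print(gc_content("GGCCAATT"))  # prints 0.5
usage = trinucleotide_usage("CAGCAGCAG")
print(max(usage, key=usage.get))  # prints "CAG"
```

    A trinucleotide repeat such as CAGCAGCAG shows up as a strongly skewed usage profile, which is the kind of signal the abstract reports for noncoding regions.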

  16. Exploration of Noncoding Sequences in Metagenomes

    PubMed Central

    Tobar-Tosse, Fabián; Rodríguez, Adrián C.; Vélez, Patricia E.; Zambrano, María M.; Moreno, Pedro A.

    2013-01-01

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment. PMID:23536879

  17. Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes.

    SciTech Connect

    Glass, E. M.; Wilkening, J.; Wilke, A.; Antonopoulos, D.; Meyer, F.

    2010-01-01

    Shotgun metagenomics creates millions of fragments of short DNA reads, which are meaningless unless analyzed appropriately. The Metagenomics RAST server (MG-RAST) is a web-based, open source system that offers a unique suite of tools for analyzing these data sets. After de-replication and quality control, fragments are mapped against a comprehensive nonredundant database (NR). Phylogenetic and metabolic reconstructions are computed from the set of hits against the NR. The resulting data are made available for browsing, download, and most importantly, comparison against a comprehensive collection of public metagenomes. A submitted metagenome is visible only to the user, unless the user makes it public or shares with other registered users. Public metagenomes are available to all.

  18. METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data.

    PubMed

    Bengtsson-Palme, Johan; Hartmann, Martin; Eriksson, Karl Martin; Pal, Chandan; Thorell, Kaisa; Larsson, Dan Göran Joakim; Nilsson, Rolf Henrik

    2015-11-01

    The ribosomal rRNA genes are widely used as genetic markers for taxonomic identification of microbes. Particularly the small subunit (SSU; 16S/18S) rRNA gene is frequently used for species- or genus-level identification, but also the large subunit (LSU; 23S/28S) rRNA gene is employed in taxonomic assignment. The METAXA software tool is a popular utility for extracting partial rRNA sequences from large sequencing data sets and assigning them to an archaeal, bacterial, nuclear eukaryote, mitochondrial or chloroplast origin. This study describes a comprehensive update to METAXA - METAXA2 - that extends the capabilities of the tool, introducing support for the LSU rRNA gene, a greatly improved classifier allowing classification down to genus or species level, as well as enhanced support for short-read (100 bp) and paired-end sequences, among other changes. The performance of METAXA2 was compared to other commonly used taxonomic classifiers, showing that METAXA2 often outperforms previous methods in terms of making correct predictions while maintaining a low misclassification rate. METAXA2 is freely available from http://microbiology.se/software/metaxa2/. PMID:25732605

  19. Swine Fecal Metagenomics

    EPA Science Inventory

    Metagenomic approaches are providing rapid and more robust means to investigate the composition and functional genetic potential of complex microbial communities. In this study, we utilized a metagenomic approach to further understand the functional diversity of the swine gut. To...

  20. A comparative evaluation of sequence classification programs

    PubMed Central

    2012-01-01

    Background A fundamental problem in modern genomics is to taxonomically or functionally classify DNA sequence fragments derived from environmental sampling (i.e., metagenomics). Several different methods have been proposed for doing this effectively and efficiently, and many have been implemented in software. In addition to varying their basic algorithmic approach to classification, some methods screen sequence reads for ’barcoding genes’ like 16S rRNA, or various types of protein-coding genes. Due to the sheer number and complexity of methods, it can be difficult for a researcher to choose one that is well-suited for a particular analysis. Results We divided the very large number of programs that have been released in recent years for solving the sequence classification problem into three main categories based on the general algorithm they use to compare a query sequence against a database of sequences. We also evaluated the performance of the leading programs in each category on data sets whose taxonomic and functional composition is known. Conclusions We found significant variability in classification accuracy, precision, and resource consumption of sequence classification programs when used to analyze various metagenomics data sets. However, we observe some general trends and patterns that will be useful to researchers who use sequence classification programs. PMID:22574964

  1. Classification

    ERIC Educational Resources Information Center

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  2. Interactive metagenomic visualization in a Web browser

    PubMed Central

    2011-01-01

    Background A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Results Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Conclusions Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net. PMID:21961884

  3. Livermore Metagenomics Analysis Toolkit

    Energy Science and Technology Software Center (ESTSC)

    2012-10-01

    LMAT is designed to take as input a collection of raw metagenomic sequencer reads, search each read against a reference genome database, assign a taxonomic label and confidence value to each read, and report a summary of the predicted taxonomic contents of the metagenomic sample.

  4. Megraft: A software package to graft ribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Metagenomic libraries represent subsamples of the total DNA found at a study site and offer unprecedented opportunities to study ecological and functional aspects of microbial communities. To examine the depth of the sequencing effort, rarefaction analysis of the ribosomal small sub-unit (SSU/16S/18...

  5. Metagenomic islands of hyperhalophiles: the case of Salinibacter ruber

    PubMed Central

    2009-01-01

    Background Saturated brines are extreme environments of low diversity. Salinibacter ruber is the only bacterium that inhabits this environment in significant numbers. In order to establish the extent of genetic diversity in natural populations of this microbe, the genomic sequence of reference strain DSM 13855 was compared to metagenomic fragments recovered from climax saltern crystallizers and obtained with 454 sequencing technology. This kind of analysis reveals the presence of metagenomic islands, i.e. highly variable regions among the different lineages in the population. Results Three regions of the sequenced isolate were scarcely represented in the metagenome thus appearing to vary among co-occurring S. ruber cells. These metagenomic islands showed evidence of extensive genomic corruption with atypically low GC content, low coding density, high numbers of pseudogenes and short hypothetical proteins. A detailed analysis of island gene content showed that the genes in metagenomic island 1 code for cell surface polysaccharides. The strain-specific genes of metagenomic island 2 were found to be involved in biosynthesis of cell wall polysaccharide components. Finally, metagenomic island 3 was rich in DNA related enzymes. Conclusion The genomic organisation of S. ruber variable genomic regions showed a number of convergences with genomic islands of marine microbes studied, being largely involved in variable cell surface traits. This variation at the level of cell envelopes in an environment devoid of grazing pressure probably reflects a global strategy of bacteria to escape phage predation. PMID:19951421

  6. Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis

    PubMed Central

    Ye, Yuzhen; Tang, Haixu

    2016-01-01

    Motivation: Metagenomics research has accelerated the studies of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (studies of the transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for providing additional insights into functional and regulatory characteristics of the microbial communities. Current metatranscriptomics projects are often carried out without matched metagenomic datasets (of the same microbial communities). For the projects that produce both metatranscriptomic and metagenomic datasets, their analyses are often not integrated. Metagenome assemblies are far from perfect, partially explaining why metagenome assemblies are not used for the analysis of metatranscriptomic datasets. Results: Here, we report a reads mapping algorithm for mapping of short reads onto a de Bruijn graph of assemblies. A hash table of junction k-mers (k-mers spanning branching structures in the de Bruijn graph) is used to facilitate fast mapping of reads to the graph. We developed an application of this mapping algorithm: a reference-based approach to metatranscriptome assembly using graphs of metagenome assembly as the reference. Our results show that this new approach (called TAG) helps to assemble substantially more transcripts that otherwise would have been missed or truncated because of the fragmented nature of the reference metagenome. Availability and implementation: TAG was implemented in C++ and has been tested extensively on the Linux platform. It is available for download as open source at http://omics.informatics.indiana.edu/TAG. Contact: yye@indiana.edu PMID:26319390
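
    The junction k-mer idea in the record above can be illustrated with a minimal Python sketch. It is not the TAG implementation (which the abstract says is written in C++); the function names, the toy contigs and the choice of k are all assumptions made for illustration. Nodes are k-mers, an edge links two k-mers that overlap by k-1 bases, and a junction is taken here to be any node with more than one successor or predecessor.

```python
from collections import defaultdict


def build_de_bruijn(sequences, k):
    """Collect successor and predecessor sets for every k-mer node."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for seq in sequences:
        for i in range(len(seq) - k):
            a, b = seq[i:i + k], seq[i + 1:i + k + 1]
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def junction_kmers(succ, pred):
    """Hash table (dict) of branching k-mers -> sorted list of successors."""
    nodes = set(succ) | set(pred)
    return {
        n: sorted(succ.get(n, ()))
        for n in nodes
        if len(succ.get(n, ())) > 1 or len(pred.get(n, ())) > 1
    }


def read_junction_hits(read, junctions, k):
    """Positions in the read whose k-mer is a junction in the graph."""
    return [
        (i, read[i:i + k])
        for i in range(len(read) - k + 1)
        if read[i:i + k] in junctions
    ]


# Hypothetical toy contigs that share a prefix and then diverge.
contigs = ["ACGTAC", "ACGTTG"]
succ, pred = build_de_bruijn(contigs, k=3)
junctions = junction_kmers(succ, pred)
hits = read_junction_hits("ACGTT", junctions, k=3)
```

    Looking up only the junction k-mers keeps the table small relative to indexing every k-mer, which is one plausible reading of why the abstract describes the hash table as facilitating fast mapping.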

  7. Classification

    NASA Astrophysics Data System (ADS)

    Oza, Nikunj

    2012-03-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. A set of training examples (examples with known output values) is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate’s measurements. The generalization performance of a learned model (how closely the target outputs and the model’s predicted outputs agree for patterns that have not been presented to the learning algorithm) would provide an indication of how well the model has learned the desired mapping. More formally, a classification learning algorithm L takes a training set T as its input. The training set consists of |T| examples or instances. It is assumed that there is a probability distribution D from which all training examples are drawn independently; that is, all the training examples are independently and identically distributed (i.i.d.). The ith training example is of the form (x_i, y_i), where x_i is a vector of values of several features and y_i represents the class to be predicted. In the sunspot classification example given above, each training example

  8. Structural and functional insights from the metagenome of an acidic hot spring microbial planktonic community in the Colombian Andes.

    PubMed

    Jiménez, Diego Javier; Andreote, Fernando Dini; Chaves, Diego; Montaña, José Salvador; Osorio-Forero, Cesar; Junca, Howard; Zambrano, María Mercedes; Baena, Sandra

    2012-01-01

    A taxonomic and annotated functional description of microbial life was deduced from 53 Mb of metagenomic sequence retrieved from a planktonic fraction of the Neotropical high Andean (3,973 meters above sea level) acidic hot spring El Coquito (EC). A classification of unassembled metagenomic reads using different databases showed a high proportion of Gammaproteobacteria and Alphaproteobacteria (in total read affiliation), and through taxonomic affiliation of 16S rRNA gene fragments we observed the presence of Proteobacteria, micro-algae chloroplast and Firmicutes. Reads mapped against the genomes Acidiphilium cryptum JF-5, Legionella pneumophila str. Corby and Acidithiobacillus caldus revealed the presence of transposase-like sequences, potentially involved in horizontal gene transfer. Functional annotation and hierarchical comparison with different datasets obtained by pyrosequencing in different ecosystems showed that the microbial community also contained extensive DNA repair systems, possibly to cope with ultraviolet radiation at such high altitudes. Analysis of genes involved in the nitrogen cycle indicated the presence of dissimilatory nitrate reduction to N2 (narGHI, nirS, norBCDQ and nosZ), associated with Proteobacteria-like sequences. Genes involved in the sulfur cycle (cysDN, cysNC and aprA) indicated adenylsulfate and sulfite production that were affiliated to several bacterial species. In summary, metagenomic sequence data provided insight regarding the structure and possible functions of this hot spring microbial community, describing some groups potentially involved in the nitrogen and sulfur cycling in this environment. PMID:23251687

  9. Structural and Functional Insights from the Metagenome of an Acidic Hot Spring Microbial Planktonic Community in the Colombian Andes

    PubMed Central

    Jiménez, Diego Javier; Andreote, Fernando Dini; Chaves, Diego; Montaña, José Salvador; Osorio-Forero, Cesar; Junca, Howard; Zambrano, María Mercedes; Baena, Sandra

    2012-01-01

    A taxonomic and annotated functional description of microbial life was deduced from 53 Mb of metagenomic sequence retrieved from a planktonic fraction of the Neotropical high Andean (3,973 meters above sea level) acidic hot spring El Coquito (EC). A classification of unassembled metagenomic reads using different databases showed a high proportion of Gammaproteobacteria and Alphaproteobacteria (in total read affiliation), and through taxonomic affiliation of 16S rRNA gene fragments we observed the presence of Proteobacteria, micro-algae chloroplast and Firmicutes. Reads mapped against the genomes Acidiphilium cryptum JF-5, Legionella pneumophila str. Corby and Acidithiobacillus caldus revealed the presence of transposase-like sequences, potentially involved in horizontal gene transfer. Functional annotation and hierarchical comparison with different datasets obtained by pyrosequencing in different ecosystems showed that the microbial community also contained extensive DNA repair systems, possibly to cope with ultraviolet radiation at such high altitudes. Analysis of genes involved in the nitrogen cycle indicated the presence of dissimilatory nitrate reduction to N2 (narGHI, nirS, norBCDQ and nosZ), associated with Proteobacteria-like sequences. Genes involved in the sulfur cycle (cysDN, cysNC and aprA) indicated adenylsulfate and sulfite production that were affiliated to several bacterial species. In summary, metagenomic sequence data provided insight regarding the structure and possible functions of this hot spring microbial community, describing some groups potentially involved in the nitrogen and sulfur cycling in this environment. PMID:23251687

  10. Classification

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2011-01-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. In supervised learning, a set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. This chapter discusses methods to perform machine learning, with examples involving astronomy.
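
    The train-then-predict workflow described in the record above can be sketched in a few lines of Python. The nearest-centroid rule is only one simple choice of learning algorithm and is not taken from the chapter; the two features and the class labels "A" and "B" are made-up placeholders, not real sunspot measurements.

```python
import math
from collections import defaultdict


def train_nearest_centroid(examples):
    """Learn one mean feature vector (centroid) per class from (x, y) pairs."""
    sums = {}
    counts = defaultdict(int)
    for x, y in examples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
        for j, value in enumerate(x):
            sums[y][j] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda y: math.dist(model[y], x))


# Hypothetical training set: each x is a vector of two measurements.
training_set = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((5.0, 5.0), "B"),
    ((4.8, 5.2), "B"),
]
model = train_nearest_centroid(training_set)
label_near_a = predict(model, (1.1, 0.9))
label_near_b = predict(model, (5.1, 4.9))
```

    The two query points were not in the training set, so the predictions illustrate the "inputs that have not been seen before" case in miniature.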

  11. Ocean microbial metagenomics

    NASA Astrophysics Data System (ADS)

    Kerkhof, Lee J.; Goodman, Robert M.

    2009-09-01

    Technology for accessing the genomic DNA of microorganisms, directly from environmental samples without prior cultivation, has opened new vistas to understanding microbial diversity and functions. Especially as applied to soils and the oceans, environments on Earth where microbial diversity is vast, metagenomics and its emergent approaches have the power to transform rapidly our understanding of environmental microbiology. Here we explore select recent applications of the metagenomic suite to ocean microbiology.

  12. Random Whole Metagenomic Sequencing for Forensic Discrimination of Soils

    PubMed Central

    Khodakova, Anastasia S.; Smith, Renee J.; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations. PMID:25111003

  13. Random whole metagenomic sequencing for forensic discrimination of soils.

    PubMed

    Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations. PMID:25111003

  14. Metagenomic mining for microbiologists.

    PubMed

    Delmont, Tom O; Malandain, Cedric; Prestat, Emmanuel; Larose, Catherine; Monier, Jean-Michel; Simonet, Pascal; Vogel, Timothy M

    2011-12-01

    Microbial ecologists can now start digging into the accumulating mountains of metagenomic data to uncover the occurrence of functional genes and their correlations to microbial community members. Limitations and biases in DNA extraction and sequencing technologies impact sequence distributions, and therefore, have to be considered. However, when comparing metagenomes from widely differing environments, these fluctuations have a relatively minor role in microbial community discrimination. As a consequence, any functional gene or species distribution pattern can be compared among metagenomes originating from various environments and projects. In particular, global comparisons would help to define ecosystem specificities, such as involvement and response to climate change (for example, carbon and nitrogen cycle), human health risks (eg, presence of pathogen species, toxin genes and viruses) and biodegradation capacities. Although not all scientists have easy access to high-throughput sequencing technologies, they do have access to the sequences that have been deposited in databases, and therefore, can begin to intensively mine these metagenomic data to generate hypotheses that can be validated experimentally. Information about metabolic functions and microbial species compositions can already be compared among metagenomes from different ecosystems. These comparisons add to our understanding about microbial adaptation and the role of specific microbes in different ecosystems. Concurrent with the rapid growth of sequencing technologies, we have entered a new age of microbial ecology, which will enable researchers to experimentally confirm putative relationships between microbial functions and community structures. PMID:21593798

  15. Metagenomic mining for microbiologists

    PubMed Central

    Delmont, Tom O; Malandain, Cedric; Prestat, Emmanuel; Larose, Catherine; Monier, Jean-Michel; Simonet, Pascal; Vogel, Timothy M

    2011-01-01

    Microbial ecologists can now start digging into the accumulating mountains of metagenomic data to uncover the occurrence of functional genes and their correlations to microbial community members. Limitations and biases in DNA extraction and sequencing technologies impact sequence distributions, and therefore, have to be considered. However, when comparing metagenomes from widely differing environments, these fluctuations have a relatively minor role in microbial community discrimination. As a consequence, any functional gene or species distribution pattern can be compared among metagenomes originating from various environments and projects. In particular, global comparisons would help to define ecosystem specificities, such as involvement and response to climate change (for example, carbon and nitrogen cycle), human health risks (eg, presence of pathogen species, toxin genes and viruses) and biodegradation capacities. Although not all scientists have easy access to high-throughput sequencing technologies, they do have access to the sequences that have been deposited in databases, and therefore, can begin to intensively mine these metagenomic data to generate hypotheses that can be validated experimentally. Information about metabolic functions and microbial species compositions can already be compared among metagenomes from different ecosystems. These comparisons add to our understanding about microbial adaptation and the role of specific microbes in different ecosystems. Concurrent with the rapid growth of sequencing technologies, we have entered a new age of microbial ecology, which will enable researchers to experimentally confirm putative relationships between microbial functions and community structures. PMID:21593798

  16. Metagenomics of extreme environments.

    PubMed

    Cowan, D A; Ramond, J-B; Makhalanyane, T P; De Maayer, P

    2015-06-01

    Whether they are exposed to extremes of heat or cold, or buried deep beneath the Earth's surface, microorganisms have an uncanny ability to survive under these conditions. This ability to survive has fascinated scientists for nearly a century, but the recent development of metagenomics and 'omics' tools has allowed us to make huge leaps in understanding the remarkable complexity and versatility of extremophile communities. Here, in the context of the recently developed metagenomic tools, we discuss recent research on the community composition, adaptive strategies and biological functions of extremophiles. PMID:26048196

  17. Recent progresses in metagenomics

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Metagenomics addresses the collective genetic structure and functional composition of a microbial community at its native habitat. This approach has emerged as a powerful tool to study the structure and function of the microbiota for the past few years and is revolutionizing studies of microbial ec...

  18. Beyond Biodiversity: Fish Metagenomes

    PubMed Central

    Ardura, Alba; Planes, Serge; Garcia-Vazquez, Eva

    2011-01-01

    Biodiversity and intra-specific genetic diversity are interrelated and determine the potential of a community to survive and evolve. Both are considered together in Prokaryote communities treated as metagenomes or ensembles of functional variants beyond species limits. Many factors alter biodiversity in higher Eukaryote communities, and human exploitation can be one of the most important for some groups of plants and animals. For example, fisheries can modify both biodiversity and genetic diversity (intra specific). Intra-specific diversity can be drastically altered by overfishing. Intense fishing pressure on one stock may imply extinction of some genetic variants and subsequent loss of intra-specific diversity. The objective of this study was to apply a metagenome approach to fish communities and explore its value for rapid evaluation of biodiversity and genetic diversity at community level. Here we have applied the metagenome approach employing the Barcoding target gene COI as a model sequence in catch from four very different fish assemblages exploited by fisheries: freshwater communities from the Amazon River and northern Spanish rivers, and marine communities from the Cantabric and Mediterranean seas. Treating all sequences obtained from each regional catch as a biological unit (exploited community) we found that metagenomic diversity indices of the Amazonian catch sample here examined were lower than expected. Reduced diversity could be explained, at least partially, by overexploitation of the fish community that had been independently estimated by other methods. We propose using a metagenome approach for estimating diversity in Eukaryote communities and early evaluating genetic variation losses at multi-species level. PMID:21829636
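
    The abstract above does not say which diversity indices were computed. As one common example only, the Shannon index H = -sum(p_i * ln p_i) can be calculated over the sequence variants observed in a catch, treating the whole catch as a single unit. The function name and the haplotype labels below are hypothetical.

```python
import math
from collections import Counter


def shannon_index(labels):
    """Shannon diversity H = -sum(p * ln p) over the observed categories."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical COI variant labels pooled across all species in two catches.
even_catch = ["h1"] * 10 + ["h2"] * 10
single_variant_catch = ["h1"] * 20
h_even = shannon_index(even_catch)
h_single = shannon_index(single_variant_catch)
```

    A catch dominated by a single variant scores zero, while two equally frequent variants score ln 2, so lower values point to reduced diversity in the pooled sample.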

  19. Beyond biodiversity: fish metagenomes.

    PubMed

    Ardura, Alba; Planes, Serge; Garcia-Vazquez, Eva

    2011-01-01

    Biodiversity and intra-specific genetic diversity are interrelated and determine the potential of a community to survive and evolve. Both are considered together in Prokaryote communities treated as metagenomes or ensembles of functional variants beyond species limits. Many factors alter biodiversity in higher Eukaryote communities, and human exploitation can be one of the most important for some groups of plants and animals. For example, fisheries can modify both biodiversity and genetic diversity (intra specific). Intra-specific diversity can be drastically altered by overfishing. Intense fishing pressure on one stock may imply extinction of some genetic variants and subsequent loss of intra-specific diversity. The objective of this study was to apply a metagenome approach to fish communities and explore its value for rapid evaluation of biodiversity and genetic diversity at community level. Here we have applied the metagenome approach employing the barcoding target gene coi as a model sequence in catch from four very different fish assemblages exploited by fisheries: freshwater communities from the Amazon River and northern Spanish rivers, and marine communities from the Cantabric and Mediterranean seas. Treating all sequences obtained from each regional catch as a biological unit (exploited community) we found that metagenomic diversity indices of the Amazonian catch sample here examined were lower than expected. Reduced diversity could be explained, at least partially, by overexploitation of the fish community that had been independently estimated by other methods. We propose using a metagenome approach for estimating diversity in Eukaryote communities and early evaluating genetic variation losses at multi-species level. PMID:21829636

  20. Recovering full-length viral genomes from metagenomes

    PubMed Central

    Smits, Saskia L.; Bodewes, Rogier; Ruiz-González, Aritz; Baumgärtner, Wolfgang; Koopmans, Marion P.; Osterhaus, Albert D. M. E.; Schürch, Anita C.

    2015-01-01

    Infectious disease metagenomics is driven by the question: “what is causing the disease?” in contrast to classical metagenome studies which are guided by “what is out there?” In case of a novel virus, a first step to eventually establishing etiology can be to recover a full-length viral genome from a metagenomic sample. However, retrieval of a full-length genome of a divergent virus is technically challenging and can be time-consuming and costly. Here we discuss different assembly and fragment linkage strategies such as iterative assembly, motif searches, k-mer frequency profiling, coverage profile binning, and other strategies used to recover genomes of potential viral pathogens in a timely and cost-effective manner. PMID:26483782
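
    Of the strategies listed in the record above, k-mer frequency profiling is the easiest to show in code. The following Python sketch is an illustration under assumptions, not the authors' pipeline: the function names, the default k = 4, and the use of cosine similarity as the comparison measure are all choices made here. Each sequence is reduced to a vector of relative k-mer frequencies, and contigs with similar vectors become candidates for belonging to the same genome.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Relative frequency of every A/C/G/T k-mer, in a fixed order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # ignores k-mers containing N etc.
    return [counts[m] / total if total else 0.0 for m in kmers]


def cosine_similarity(p, q):
    """Cosine of the angle between two profiles (1.0 means same composition)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


# Hypothetical toy sequences; real contigs would be far longer.
profile_a = kmer_profile("ACGTACGTACGT", k=2)
profile_b = kmer_profile("AAAAAAAA", k=2)
profile_c = kmer_profile("CCCCCCCC", k=2)
same = cosine_similarity(profile_a, profile_a)
different = cosine_similarity(profile_b, profile_c)
```

    Profiles from very short sequences are noisy, so a sketch like this is only meaningful on contigs long enough to sample most k-mers several times.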

  1. Accessing the Soil Metagenome for Studies of Microbial Diversity

    PubMed Central

    Delmont, Tom O.; Robe, Patrick; Cecillon, Sébastien; Clark, Ian M.; Constancias, Florentin; Simonet, Pascal; Hirsch, Penny R.; Vogel, Timothy M.

    2011-01-01

    Soil microbial communities contain the highest level of prokaryotic diversity of any environment, and metagenomic approaches involving the extraction of DNA from soil can improve our access to these communities. Most analyses of soil biodiversity and function assume that the DNA extracted represents the microbial community in the soil, but subsequent interpretations are limited by the DNA recovered from the soil. Unfortunately, extraction methods do not provide a uniform and unbiased subsample of metagenomic DNA, and as a consequence, accurate species distributions cannot be determined. Moreover, any bias will propagate errors in estimations of overall microbial diversity and may exclude some microbial classes from study and exploitation. To improve metagenomic approaches, investigate DNA extraction biases, and provide tools for assessing the relative abundances of different groups, we explored the biodiversity of the accessible community DNA by fractioning the metagenomic DNA as a function of (i) vertical soil sampling, (ii) density gradients (cell separation), (iii) cell lysis stringency, and (iv) DNA fragment size distribution. Each fraction had a unique genetic diversity, with different predominant and rare species (based on ribosomal intergenic spacer analysis [RISA] fingerprinting and phylochips). All fractions contributed to the number of bacterial groups uncovered in the metagenome, thus increasing the DNA pool for further applications. Indeed, we were able to access a more genetically diverse proportion of the metagenome (a gain of more than 80% compared to the best single extraction method), limit the predominance of a few genomes, and increase the species richness per sequencing effort. This work stresses the difference between extracted DNA pools and the currently inaccessible complete soil metagenome. PMID:21183646

  2. Metagenomic Analysis of Bacterial Communities of Antarctic Surface Snow.

    PubMed

    Lopatina, Anna; Medvedeva, Sofia; Shmakov, Sergey; Logacheva, Maria D; Krylenkov, Vjacheslav; Severinov, Konstantin

    2016-01-01

    The diversity of bacteria present in surface snow around four Russian stations in Eastern Antarctica was studied by high throughput sequencing of amplified 16S rRNA gene fragments and shotgun metagenomic sequencing. Considerable class- and genus-level variation between the samples was revealed, indicating inter-site diversity of bacteria in Antarctic snow. Flavobacterium was a major genus in one sampling site and was also detected in other sites. The diversity of flavobacterial type II-C CRISPR spacers in the samples was investigated by metagenome sequencing. Thousands of unique spacers were revealed with less than 35% overlap between the sampling sites, indicating an enormous natural variety of flavobacterial CRISPR spacers and, by extension, a high level of adaptive activity of the corresponding CRISPR-Cas system. None of the spacers matched known spacers of flavobacterial isolates from the Northern hemisphere. Moreover, the percentage of spacers with matches to Antarctic metagenomic sequences obtained in this work was significantly higher than with sequences from a much larger publicly available environmental metagenomic database. The results indicate that despite the overall very high level of diversity, Antarctic Flavobacteria comprise a separate pool that experiences pressures from mobile genetic elements different from those present in other parts of the world. The results also establish analysis of metagenomic CRISPR spacer content as a powerful tool to study bacterial population diversity. PMID:27064693

  3. Metagenomic Analysis of Bacterial Communities of Antarctic Surface Snow

    PubMed Central

    Lopatina, Anna; Medvedeva, Sofia; Shmakov, Sergey; Logacheva, Maria D.; Krylenkov, Vjacheslav; Severinov, Konstantin

    2016-01-01

    The diversity of bacteria present in surface snow around four Russian stations in Eastern Antarctica was studied by high throughput sequencing of amplified 16S rRNA gene fragments and shotgun metagenomic sequencing. Considerable class- and genus-level variation between the samples was revealed, indicating inter-site diversity of bacteria in Antarctic snow. Flavobacterium was a major genus in one sampling site and was also detected in other sites. The diversity of flavobacterial type II-C CRISPR spacers in the samples was investigated by metagenome sequencing. Thousands of unique spacers were revealed with less than 35% overlap between the sampling sites, indicating an enormous natural variety of flavobacterial CRISPR spacers and, by extension, a high level of adaptive activity of the corresponding CRISPR-Cas system. None of the spacers matched known spacers of flavobacterial isolates from the Northern hemisphere. Moreover, the percentage of spacers with matches to Antarctic metagenomic sequences obtained in this work was significantly higher than with sequences from a much larger publicly available environmental metagenomic database. The results indicate that despite the overall very high level of diversity, Antarctic Flavobacteria comprise a separate pool that experiences pressures from mobile genetic elements different from those present in other parts of the world. The results also establish analysis of metagenomic CRISPR spacer content as a powerful tool to study bacterial population diversity. PMID:27064693
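    The "less than 35% overlap" figure above compares sets of unique spacers between sampling sites. As a minimal sketch of one way such an overlap could be computed, the Python below treats each site as a set of spacer sequences and reports the shared fraction relative to the smaller set. The function name, the choice of denominator and the toy sequences are all assumptions for illustration; the abstract does not state how the authors defined overlap.

```python
def shared_spacer_fraction(spacers_a, spacers_b):
    """Fraction of unique spacers shared by two sites.

    Assumption: overlap is measured against the smaller of the two
    sets; the source abstract does not specify the definition used.
    """
    a, b = set(spacers_a), set(spacers_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical toy spacers, not real data.
site_a = ["AAAT", "CCCG", "GGGA"]
site_b = ["CCCG", "TTTC"]
fraction = shared_spacer_fraction(site_a, site_b)
```

    With the toy inputs, one spacer is shared and the smaller site holds two unique spacers, so the fraction is 0.5. A Jaccard index (intersection over union) would be an equally plausible definition and gives lower values for the same data.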

  4. Hot Spring Metagenomics

    PubMed Central

    López-López, Olalla; Cerdán, María Esperanza; González-Siso, María Isabel

    2013-01-01

    Hot springs have been investigated since the XIX century, but isolation and examination of their thermophilic microbial inhabitants did not start until the 1950s. Many thermophilic microorganisms and their viruses have since been discovered, although the real complexity of thermal communities was envisaged when research based on PCR amplification of the 16S rRNA genes arose. Thereafter, the possibility of cloning and sequencing the total environmental DNA, defined as metagenome, and the study of the genes rescued in the metagenomic libraries and assemblies made it possible to gain a more comprehensive understanding of microbial communities—their diversity, structure, the interactions existing between their components, and the factors shaping the nature of these communities. In the last decade, hot springs have been a source of thermophilic enzymes of industrial interest, encouraging further study of the poorly understood diversity of microbial life in these habitats. PMID:25369743

  5. Microbial Metagenomics: Beyond the Genome

    NASA Astrophysics Data System (ADS)

    Gilbert, Jack A.; Dupont, Christopher L.

    2011-01-01

    Metagenomics literally means “beyond the genome.” Marine microbial metagenomic databases presently comprise ~400 billion base pairs of DNA, only ~3% of that found in 1 ml of seawater. Very soon a trillion-base-pair sequence run will be feasible, so it is time to reflect on what we have learned from metagenomics. We review the impact of metagenomics on our understanding of marine microbial communities. We consider the studies facilitated by data generated through the Global Ocean Sampling expedition, as well as the revolution wrought at the individual laboratory level through next generation sequencing technologies. We review recent studies and discoveries since 2008, provide a discussion of bioinformatic analyses, including conceptual pipelines and sequence annotation and predict the future of metagenomics, with suggestions of collaborative community studies tailored toward answering some of the fundamental questions in marine microbial ecology.

  6. A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories.

    PubMed

    Hasan, Mehedi; Kotov, Alexander; Idalski Carcone, April; Dong, Ming; Naar, Sylvie; Brogan Hartlieb, Kathryn

    2016-08-01

    This study examines the effectiveness of state-of-the-art supervised machine learning methods in conjunction with different feature types for the task of automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard, and experimented with state-of-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN) in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found out that, when the number of classes is large, the performance of CNN and CRF is inferior to SVM. When only lexical features were used, interview transcripts were automatically annotated by SVM with the highest classification accuracy among all classifiers of 70.8%, 61% and 53.7% based on the codebooks consisting of 17, 20 and 41 codes, respectively. Using contextual and semantic features, as well as their combination, in addition to lexical ones, improved the accuracy of SVM for annotation of utterances in motivational interview transcripts with a codebook consisting of 17 classes to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy. PMID:27185608

  7. Use of Substrate-Induced Gene Expression in Metagenomic Analysis of an Aromatic Hydrocarbon-Contaminated Soil

    PubMed Central

    Meier, Matthew J.; Paterson, E. Suzanne

    2015-01-01

    Metagenomics allows the study of genes related to xenobiotic degradation in a culture-independent manner, but many of these studies are limited by the lack of genomic context for metagenomic sequences. This study combined a phenotypic screen known as substrate-induced gene expression (SIGEX) with whole-metagenome shotgun sequencing. SIGEX is a high-throughput promoter-trap method that relies on transcriptional activation of a green fluorescent protein (GFP) reporter gene in response to an inducing compound and subsequent fluorescence-activated cell sorting to isolate individual inducible clones from a metagenomic DNA library. We describe a SIGEX procedure with improved library construction from fragmented metagenomic DNA and improved flow cytometry sorting procedures. We used SIGEX to interrogate an aromatic hydrocarbon (AH)-contaminated soil metagenome. The recovered clones contained sequences with various degrees of similarity to genes (or partial genes) involved in aromatic metabolism, for example, nahG (salicylate oxygenase) family genes and their respective upstream nahR regulators. To obtain a broader context for the recovered fragments, clones were mapped to contigs derived from de novo assembly of shotgun-sequenced metagenomic DNA which, in most cases, contained complete operons involved in aromatic metabolism, providing greater insight into the origin of the metagenomic fragments. A comparable set of contigs was generated using a significantly less computationally intensive procedure in which assembly of shotgun-sequenced metagenomic DNA was directed by the SIGEX-recovered sequences. This methodology may have broad applicability in identifying biologically relevant subsets of metagenomes (including both novel and known sequences) that can be targeted computationally by in silico assembly and prediction tools. PMID:26590287

  8. Use of Substrate-Induced Gene Expression in Metagenomic Analysis of an Aromatic Hydrocarbon-Contaminated Soil.

    PubMed

    Meier, Matthew J; Paterson, E Suzanne; Lambert, Iain B

    2016-02-01

    Metagenomics allows the study of genes related to xenobiotic degradation in a culture-independent manner, but many of these studies are limited by the lack of genomic context for metagenomic sequences. This study combined a phenotypic screen known as substrate-induced gene expression (SIGEX) with whole-metagenome shotgun sequencing. SIGEX is a high-throughput promoter-trap method that relies on transcriptional activation of a green fluorescent protein (GFP) reporter gene in response to an inducing compound and subsequent fluorescence-activated cell sorting to isolate individual inducible clones from a metagenomic DNA library. We describe a SIGEX procedure with improved library construction from fragmented metagenomic DNA and improved flow cytometry sorting procedures. We used SIGEX to interrogate an aromatic hydrocarbon (AH)-contaminated soil metagenome. The recovered clones contained sequences with various degrees of similarity to genes (or partial genes) involved in aromatic metabolism, for example, nahG (salicylate oxygenase) family genes and their respective upstream nahR regulators. To obtain a broader context for the recovered fragments, clones were mapped to contigs derived from de novo assembly of shotgun-sequenced metagenomic DNA which, in most cases, contained complete operons involved in aromatic metabolism, providing greater insight into the origin of the metagenomic fragments. A comparable set of contigs was generated using a significantly less computationally intensive procedure in which assembly of shotgun-sequenced metagenomic DNA was directed by the SIGEX-recovered sequences. This methodology may have broad applicability in identifying biologically relevant subsets of metagenomes (including both novel and known sequences) that can be targeted computationally by in silico assembly and prediction tools. PMID:26590287

  9. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    PubMed Central

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach on a large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729

  10. Reconstructing the Genomic Content of Microbiome Taxa through Shotgun Metagenomic Deconvolution

    PubMed Central

    Carr, Rogan; Shen-Orr, Shai S.; Borenstein, Elhanan

    2013-01-01

    Metagenomics has transformed our understanding of the microbial world, allowing researchers to bypass the need to isolate and culture individual taxa and to directly characterize both the taxonomic and gene compositions of environmental samples. However, associating the genes found in a metagenomic sample with the specific taxa of origin remains a critical challenge. Existing binning methods, based on nucleotide composition or alignment to reference genomes allow only a coarse-grained classification and rely heavily on the availability of sequenced genomes from closely related taxa. Here, we introduce a novel computational framework, integrating variation in gene abundances across multiple samples with taxonomic abundance data to deconvolve metagenomic samples into taxa-specific gene profiles and to reconstruct the genomic content of community members. This assembly-free method is not bounded by various factors limiting previously described methods of metagenomic binning or metagenomic assembly and represents a fundamentally different approach to metagenomic-based genome reconstruction. An implementation of this framework is available at http://elbo.gs.washington.edu/software.html. We first describe the mathematical foundations of our framework and discuss considerations for implementing its various components. We demonstrate the ability of this framework to accurately deconvolve a set of metagenomic samples and to recover the gene content of individual taxa using synthetic metagenomic samples. We specifically characterize determinants of prediction accuracy and examine the impact of annotation errors on the reconstructed genomes. We finally apply metagenomic deconvolution to samples from the Human Microbiome Project, successfully reconstructing genus-level genomic content of various microbial genera, based solely on variation in gene count. These reconstructed genera are shown to correctly capture genus-specific properties. With the accumulation of metagenomic

  11. Binning of shallowly sampled metagenomic sequence fragments reveals that low abundance bacteria play important roles in sulfur cycling and degradation of complex organic polymers in an acid mine drainage community

    NASA Astrophysics Data System (ADS)

    Dick, G. J.; Andersson, A.; Banfield, J. F.

    2007-12-01

    Our understanding of environmental microbiology has been greatly enhanced by community genome sequencing of DNA recovered directly from the environment. Community genomics provides insights into the diversity, community structure, metabolic function, and evolution of natural populations of uncultivated microbes, thereby revealing dynamics of how microorganisms interact with each other and their environment. Recent studies have demonstrated the potential for reconstructing near-complete genomes from natural environments while highlighting the challenges of analyzing community genomic sequence, especially from diverse environments. A major challenge of shotgun community genome sequencing is identification of DNA fragments from minor community members for which only low coverage of genomic sequence is present. We analyzed community genome sequence retrieved from biofilms in an acid mine drainage (AMD) system in the Richmond Mine at Iron Mountain, CA, with an emphasis on identification and assembly of DNA fragments from low-abundance community members. The Richmond mine hosts an extensive, relatively low diversity subterranean chemolithoautotrophic community that is sustained entirely by oxidative dissolution of pyrite. The activity of these microorganisms greatly accelerates the generation of AMD. Previous and ongoing work in our laboratory has focused on reconstructing genomes of dominant community members, including several bacteria and archaea. We binned contigs from several samples (including one new sample and two that had been previously analyzed) by tetranucleotide frequency with clustering by Self-Organizing Maps (SOM). The binning, evaluated by comparison with information from the manually curated assembly of the dominant organisms, was found to be very effective: fragments were correctly assigned with 95% accuracy. Improperly assigned fragments often contained sequences that are either evolutionarily constrained (e.g. 16S rRNA genes) or mobile elements that are

  12. New hydrocarbon degradation pathways in the microbial metagenome from Brazilian petroleum reservoirs.

    PubMed

    Sierra-García, Isabel Natalia; Correa Alvarez, Javier; de Vasconcellos, Suzan Pantaroto; Pereira de Souza, Anete; dos Santos Neto, Eugenio Vaz; de Oliveira, Valéria Maia

    2014-01-01

    Current knowledge of the microbial diversity and metabolic pathways involved in hydrocarbon degradation in petroleum reservoirs is still limited, mostly due to the difficulty in recovering the complex community from such an extreme environment. Metagenomics is a valuable tool to investigate the genetic and functional diversity of previously uncultured microorganisms in natural environments. Using a function-driven metagenomic approach, we investigated the metabolic abilities of microbial communities in oil reservoirs. Here, we describe novel functional metabolic pathways involved in the biodegradation of aromatic compounds in a metagenomic library obtained from an oil reservoir. Although many of the deduced proteins shared homology with known enzymes of different well-described aerobic and anaerobic catabolic pathways, the metagenomic fragments did not contain the complete clusters known to be involved in hydrocarbon degradation. Instead, the metagenomic fragments comprised genes belonging to different pathways, showing novel gene arrangements. These results reinforce the potential of the metagenomic approach for the identification and elucidation of new genes and pathways in poorly studied environments and contribute to a broader perspective on the hydrocarbon degradation processes in petroleum reservoirs. PMID:24587220

  13. New Hydrocarbon Degradation Pathways in the Microbial Metagenome from Brazilian Petroleum Reservoirs

    PubMed Central

    Sierra-García, Isabel Natalia; Correa Alvarez, Javier; Pantaroto de Vasconcellos, Suzan; Pereira de Souza, Anete; dos Santos Neto, Eugenio Vaz; de Oliveira, Valéria Maia

    2014-01-01

    Current knowledge of the microbial diversity and metabolic pathways involved in hydrocarbon degradation in petroleum reservoirs is still limited, mostly due to the difficulty in recovering the complex community from such an extreme environment. Metagenomics is a valuable tool to investigate the genetic and functional diversity of previously uncultured microorganisms in natural environments. Using a function-driven metagenomic approach, we investigated the metabolic abilities of microbial communities in oil reservoirs. Here, we describe novel functional metabolic pathways involved in the biodegradation of aromatic compounds in a metagenomic library obtained from an oil reservoir. Although many of the deduced proteins shared homology with known enzymes of different well-described aerobic and anaerobic catabolic pathways, the metagenomic fragments did not contain the complete clusters known to be involved in hydrocarbon degradation. Instead, the metagenomic fragments comprised genes belonging to different pathways, showing novel gene arrangements. These results reinforce the potential of the metagenomic approach for the identification and elucidation of new genes and pathways in poorly studied environments and contribute to a broader perspective on the hydrocarbon degradation processes in petroleum reservoirs. PMID:24587220

  14. IMG/M 4 version of the integrated metagenome comparative analysis system.

    PubMed

    Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Szeto, Ernest; Palaniappan, Krishna; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Pagani, Ioanna; Tringe, Susannah; Huntemann, Marcel; Billis, Konstantinos; Varghese, Neha; Tennessen, Kristin; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2014-01-01

    IMG/M (http://img.jgi.doe.gov/m) provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in the context of a comprehensive set of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG/M's data content and analytical tools have expanded continuously since its first version was released in 2007. Since the last report published in the 2012 NAR Database Issue, IMG/M's database architecture, annotation and data integration pipelines and analysis tools have been extended to cope with the rapid growth in the number and size of metagenome data sets handled by the system. IMG/M data marts provide support for the analysis of publicly available genomes, expert review of metagenome annotations (IMG/M ER: http://img.jgi.doe.gov/mer) and Human Microbiome Project (HMP)-specific metagenome samples (IMG/M HMP: http://img.jgi.doe.gov/imgm_hmp). PMID:24136997

  15. Crass: identification and reconstruction of CRISPR from unassembled metagenomic data.

    PubMed

    Skennerton, Connor T; Imelfort, Michael; Tyson, Gene W

    2013-05-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities. PMID:23511966

  16. Metagenomic Pyrosequencing and Microbial Identification

    PubMed Central

    Petrosino, Joseph F.; Highlander, Sarah; Luna, Ruth Ann; Gibbs, Richard A.; Versalovic, James

    2010-01-01

    Background The Human Microbiome Project has ushered in a new era for human metagenomics and high-throughput next generation sequencing strategies. Content This review will describe evolving strategies in metagenomics with a special emphasis on the core technology of DNA pyrosequencing. The challenges of microbial identification in the context of microbial populations are described. Summary Both 16S rDNA amplicon and whole genome sequencing approaches may be useful for human metagenomics, and numerous bio-informatics tools are being deployed to tackle such vast amounts of microbiological sequence diversity. Metagenomics or studies of microbial communities may ultimately contribute to a more comprehensive understanding of human health, disease susceptibilities, and the pathophysiology of infectious and immune-mediated diseases. PMID:19264858

  17. A Bioinformatician's Guide to Metagenomics

    SciTech Connect

    Kunin, Victor; Copeland, Alex; Lapidus, Alla; Mavromatis, Konstantinos; Hugenholtz, Philip

    2008-08-01

    As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe step-by-step the chain of decisions accompanying a metagenomic project from the viewpoint of a bioinformatician. We guide the reader through a standard workflow for a metagenomic project beginning with pre-sequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic datasets by contrast to genome projects. Different types of data analyses particular to metagenomes are then presented including binning, dominant population analysis and gene-centric analysis. Finally data management systems and issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.

  18. CLaMS: Classifier for Metagenomic Sequences

    Energy Science and Technology Software Center (ESTSC)

    2010-12-01

    CLaMS ("Classifier for Metagenomic Sequences") is a Java application for binning assembled metagenomes using user-specified training sequence sets and other user-specified initial parameters. Since CLaMS analyzes and matches sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; CLaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB of RAM. CLaMS is meant to be a desktop application for biologists and can be run on any machine under any operating system on which the Java Runtime Environment is enabled. CLaMS is freely available in both GUI-based and command-line based forms.

  19. MetaBAT: Metagenome Binning based on Abundance and Tetranucleotide frequency

    SciTech Connect

    Kang, Dongwan; Froula, Jeff; Egan, Rob; Wang, Zhong

    2014-03-21

    Grouping large fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Here we developed automated metagenome binning software, called MetaBAT, which integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency. On synthetic datasets MetaBAT on average achieves 98% precision and 90% recall at the strain level with 281 near complete unique genomes. Applying MetaBAT to a human gut microbiome data set we recovered 176 genome bins with 92% precision and 80% recall. Further analyses suggest MetaBAT is able to recover genome fragments missed in reference genomes up to 19%, while 53 genome bins are novel. In summary, we believe MetaBAT is a powerful tool to facilitate comprehensive understanding of complex microbial communities.
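    Tetranucleotide frequency, used as a composition signal in the binning records above, can be illustrated with a short Python sketch. It builds a normalized profile over all 256 possible 4-mers for a contig and compares two profiles with a plain Euclidean distance. This is only an illustration under stated assumptions: the function names are hypothetical, the Euclidean distance is a stand-in chosen for simplicity, and it is not MetaBAT's empirical probabilistic distance, nor does it use abundance information.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides over the DNA alphabet.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(seq):
    """Return the relative frequency of each 4-mer in seq.

    Windows containing characters other than A, C, G, T (e.g. N)
    are ignored. Reverse complements are NOT merged here, which is
    a simplification.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    if total == 0:
        return {k: 0.0 for k in TETRAMERS}
    return {k: counts[k] / total for k in TETRAMERS}


def profile_distance(p, q):
    """Euclidean distance between two profiles (stand-in metric)."""
    return sum((p[k] - q[k]) ** 2 for k in TETRAMERS) ** 0.5


# Hypothetical toy contigs, far shorter than real binning input.
contig_a = tetranucleotide_profile("ACGTACGT")
contig_b = tetranucleotide_profile("TTTTTTTT")
distance = profile_distance(contig_a, contig_b)
```

    Contigs with a small distance would be candidates for the same bin. In practice composition signals are only reliable on long contigs, which is one reason tools combine them with coverage across samples.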

  20. Open resource metagenomics: a model for sharing metagenomic libraries

    PubMed Central

    Neufeld, J.D.; Engel, K.; Cheng, J.; Moreno-Hagelsieb, G.; Rose, D.R.; Charles, T.C.

    2011-01-01

    Both sequence-based and activity-based exploitation of environmental DNA have provided unprecedented access to the genomic content of cultivated and uncultivated microorganisms. Although researchers deposit microbial strains in culture collections and DNA sequences in databases, activity-based metagenomic studies typically only publish sequences from the hits retrieved from specific screens. Physical metagenomic libraries, conceptually similar to entire sequence datasets, are usually not straightforward to obtain by interested parties subsequent to publication. In order to facilitate unrestricted distribution of metagenomic libraries, we propose the adoption of open resource metagenomics, in line with the trend towards open access publishing, and similar to culture- and mutant-strain collections that have been the backbone of traditional microbiology and microbial genetics. The concept of open resource metagenomics includes preparation of physical DNA libraries, preferably in versatile vectors that facilitate screening in a diversity of host organisms, and pooling of clones so that single aliquots containing complete libraries can be easily distributed upon request. Database deposition of associated metadata and sequence data for each library provides researchers with information to select the most appropriate libraries for further research projects. As a starting point, we have established the Canadian MetaMicroBiome Library (CM2BL [1]). The CM2BL is a publicly accessible collection of cosmid libraries containing environmental DNA from soils collected from across Canada, spanning multiple biomes. The libraries were constructed such that the cloned DNA can be easily transferred to Gateway® compliant vectors, facilitating functional screening in virtually any surrogate microbial host for which there are available plasmid vectors. The libraries, which we are placing in the public domain, will be distributed upon request without restriction to members of both the

  1. Challenges of the Unknown: Clinical Application of Microbial Metagenomics.

    PubMed

    Rose, Graham; Wooldridge, David J; Anscombe, Catherine; Mee, Edward T; Misra, Raju V; Gharbia, Saheer

    2015-01-01

    Availability of fast, high throughput and low cost whole genome sequencing holds great promise within public health microbiology, with applications ranging from outbreak detection and tracking transmission events to understanding the role played by microbial communities in health and disease. Within clinical metagenomics, identifying microorganisms from a complex and host enriched background remains a central computational challenge. As proof of principle, we sequenced two metagenomic samples, a known viral mixture of 25 human pathogens and an unknown complex biological model using benchtop technology. The datasets were then analysed using a bioinformatic pipeline developed around recent fast classification methods. A targeted approach was able to detect 20 of the viruses against a background of host contamination from multiple sources and bacterial contamination. An alternative untargeted identification method was highly correlated with these classifications, and over 1,600 species were identified when applied to the complex biological model, including several species captured at over 50% genome coverage. In summary, this study demonstrates the great potential of applying metagenomics within the clinical laboratory setting and that this can be achieved using infrastructure available to nondedicated sequencing centres. PMID:26451363

  2. Challenges of the Unknown: Clinical Application of Microbial Metagenomics

    PubMed Central

    Rose, Graham; Wooldridge, David J.; Anscombe, Catherine; Mee, Edward T.; Misra, Raju V.; Gharbia, Saheer

    2015-01-01

    Availability of fast, high throughput and low cost whole genome sequencing holds great promise within public health microbiology, with applications ranging from outbreak detection and tracking transmission events to understanding the role played by microbial communities in health and disease. Within clinical metagenomics, identifying microorganisms from a complex and host enriched background remains a central computational challenge. As proof of principle, we sequenced two metagenomic samples, a known viral mixture of 25 human pathogens and an unknown complex biological model using benchtop technology. The datasets were then analysed using a bioinformatic pipeline developed around recent fast classification methods. A targeted approach was able to detect 20 of the viruses against a background of host contamination from multiple sources and bacterial contamination. An alternative untargeted identification method was highly correlated with these classifications, and over 1,600 species were identified when applied to the complex biological model, including several species captured at over 50% genome coverage. In summary, this study demonstrates the great potential of applying metagenomics within the clinical laboratory setting and that this can be achieved using infrastructure available to nondedicated sequencing centres. PMID:26451363

  3. Functional metagenomics to decipher food-microbe-host crosstalk.

    PubMed

    Larraufie, Pierre; de Wouters, Tomas; Potocki-Veronese, Gabrielle; Blottière, Hervé M; Doré, Joël

    2015-02-01

    The recent developments of metagenomics permit an extremely high-resolution molecular scan of the intestinal microbiota giving new insights and opening perspectives for clinical applications. Beyond the unprecedented vision of the intestinal microbiota given by large-scale quantitative metagenomics studies, such as the EU MetaHIT project, functional metagenomics tools allow the exploration of fine interactions between food constituents, microbiota and host, leading to the identification of signals and intimate mechanisms of crosstalk, especially between bacteria and human cells. Cloning of large genome fragments, either from complex intestinal communities or from selected bacteria, allows the screening of these biological resources for bioactivity towards complex plant polymers or functional food such as prebiotics. This permitted identification of novel carbohydrate-active enzyme families involved in dietary fibre and host glycan breakdown, and highlighted unsuspected bacterial players at the top of the intestinal microbial food chain. Similarly, exposure of fractions from genomic and metagenomic clones onto human cells engineered with reporter systems to track modulation of immune response, cell proliferation or cell metabolism has allowed the identification of bioactive clones modulating key cell signalling pathways or the induction of specific genes. This opens the possibility to decipher mechanisms by which commensal bacteria or candidate probiotics can modulate the activity of cells in the intestinal epithelium or even in distal organs such as the liver, adipose tissue or the brain. Hence, in spite of our inability to culture many of the dominant microbes of the human intestine, functional metagenomics open a new window for the exploration of food-microbe-host crosstalk. PMID:25417646

  4. IDENTIFICATION OF CHICKEN-SPECIFIC FECAL MICROBIAL SEQUENCES USING A METAGENOMIC APPROACH

    EPA Science Inventory

    In this study, we applied a genome fragment enrichment (GFE) method to select for genomic regions that differ between different fecal metagenomes. Competitive DNA hybridizations were performed between chicken fecal DNA and pig fecal DNA (C-P) and between chicken fecal DNA and an ...

  5. A Novel Abundance-Based Algorithm for Binning Metagenomic Sequences Using l-Tuples

    NASA Astrophysics Data System (ADS)

    Wu, Yu-Wei; Ye, Yuzhen

    Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. Among the computational tools recently developed for metagenomic sequence analysis, binning tools attempt to classify all (or most) of the sequences in a metagenomic dataset into different bins (i.e., species), based on various DNA composition patterns (e.g., the tetramer frequencies) of various genomes. Composition-based binning methods, however, cannot be used to classify very short fragments, because of the substantial variation of DNA composition patterns within a single genome. We developed a novel approach (AbundanceBin) for metagenomics binning by utilizing the different abundances of species living in the same environment. AbundanceBin is an application of the Lander-Waterman model to metagenomics, which is based on the l-tuple content of the reads. AbundanceBin achieved accurate, unsupervised, clustering of metagenomic sequences into different bins, such that the reads classified in a bin belong to species of identical or very similar abundances in the sample. In addition, AbundanceBin gave accurate estimations of species abundances, as well as their genome sizes - two important parameters for characterizing a microbial community. We also show that AbundanceBin performed well when the sequence lengths are very short (e.g. 75 bp) or have sequencing errors.
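    The abundance signal described in this abstract can be illustrated with a minimal sketch. This is not the AbundanceBin algorithm itself; it only shows the underlying idea that reads from more abundant species tend to contain l-tuples seen more often across the dataset. The function names `ltuple_counts` and `read_abundance_proxy` are hypothetical, and the choice of the mean count as a per-read score is an assumption made for illustration.

```python
from collections import Counter


def ltuple_counts(reads, l):
    """Count every length-l substring (l-tuple) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - l + 1):
            counts[read[i:i + l]] += 1
    return counts


def read_abundance_proxy(read, counts, l):
    """Mean dataset-wide count of the l-tuples in one read.

    Illustrative score only (an assumption, not the published model):
    a higher value suggests the read comes from a more abundant genome.
    """
    tuples = [read[i:i + l] for i in range(len(read) - l + 1)]
    if not tuples:
        return 0.0  # read shorter than l: no l-tuples to score
    return sum(counts[t] for t in tuples) / len(tuples)


# Toy input: three copies of one read and a single copy of another.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "TTGGCC"]
counts = ltuple_counts(reads, 4)
scores = [read_abundance_proxy(r, counts, 4) for r in reads]
```

    In a real binning method these per-read scores would feed a statistical model that separates abundance classes; here they simply separate the repeated read from the singleton.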

  6. Web Resources for Metagenomics Studies.

    PubMed

    Dudhagara, Pravin; Bhavsar, Sunil; Bhagat, Chintan; Ghelani, Anjana; Bhatt, Shreyas; Patel, Rajesh

    2015-10-01

    The development of next-generation sequencing (NGS) platforms spawned an enormous volume of data. This explosion in data has unearthed new scalability challenges for existing bioinformatics tools. The analysis of metagenomic sequences using bioinformatics pipelines is complicated by the substantial complexity of these data. In this article, we review several commonly-used online tools for metagenomics data analysis with respect to their quality and detail of analysis using simulated metagenomics data. There are at least a dozen such software tools presently available in the public domain. Among them, MGRAST, IMG/M, and METAVIR are the most well-known tools according to the number of citations by peer-reviewed scientific media up to mid-2015. Here, we describe 12 online tools with respect to their web link, annotation pipelines, clustering methods, online user support, and availability of data storage. We have also rated each tool to identify the more promising and preferable ones, and evaluated the five best tools using a synthetic metagenome. The article comprehensively deals with the contemporary problems and the prospects of metagenomics from a bioinformatics viewpoint. PMID:26602607

  7. Web Resources for Metagenomics Studies

    PubMed Central

    Dudhagara, Pravin; Bhavsar, Sunil; Bhagat, Chintan; Ghelani, Anjana; Bhatt, Shreyas; Patel, Rajesh

    2015-01-01

    The development of next-generation sequencing (NGS) platforms spawned an enormous volume of data. This explosion in data has unearthed new scalability challenges for existing bioinformatics tools. The analysis of metagenomic sequences using bioinformatics pipelines is complicated by the substantial complexity of these data. In this article, we review several commonly-used online tools for metagenomics data analysis with respect to their quality and detail of analysis using simulated metagenomics data. There are at least a dozen such software tools presently available in the public domain. Among them, MGRAST, IMG/M, and METAVIR are the most well-known tools according to the number of citations by peer-reviewed scientific media up to mid-2015. Here, we describe 12 online tools with respect to their web link, annotation pipelines, clustering methods, online user support, and availability of data storage. We have also rated each tool to identify the more promising and preferable ones, and evaluated the five best tools using a synthetic metagenome. The article comprehensively deals with the contemporary problems and the prospects of metagenomics from a bioinformatics viewpoint. PMID:26602607

  8. riboFrame: An Improved Method for Microbial Taxonomy Profiling from Non-Targeted Metagenomics

    PubMed Central

    Ramazzotti, Matteo; Donati, Claudio; Cavalieri, Duccio

    2015-01-01

    Non-targeted metagenomics offers the unprecedented possibility of simultaneously investigating the microbial profile and the genetic capabilities of a sample by a direct analysis of its entire DNA content. The assessment of the microbial taxonomic composition is frequently obtained by mapping reads to genomic databases that, although growing, are still limited and biased. Here we present riboFrame, a novel procedure for microbial profiling based on the identification and classification of 16S rDNA sequences in non-targeted metagenomics datasets. Reads overlapping the 16S rDNA genes are identified using Hidden Markov Models and a taxonomic assignment is obtained by naïve Bayesian classification. All reads identified as ribosomal are coherently positioned in the 16S rDNA gene, allowing the use of the topology of the gene (i.e., the secondary structure and the location of variable regions) to guide the abundance analysis. We tested and verified the effectiveness of our method on simulated ribosomal data, on simulated metagenomes and on a real dataset. riboFrame exploits the taxonomic potentialities of the 16S rDNA gene in the context of non-targeted metagenomics, giving an accurate perspective on the microbial profile in metagenomic samples. PMID:26635865

  9. VIROME: a standard operating procedure for analysis of viral metagenome sequences.

    PubMed

    Wommack, K Eric; Bhavsar, Jaysheel; Polson, Shawn W; Chen, Jing; Dumas, Michael; Srinivasiah, Sharath; Furman, Megan; Jamindar, Sanchita; Nasko, Daniel J

    2012-07-30

    One consistent finding among studies using shotgun metagenomics to analyze whole viral communities is that most viral sequences show no significant homology to known sequences. Thus, bioinformatic analyses based on sequence collections such as GenBank nr, which are largely comprised of sequences from known organisms, tend to ignore a majority of sequences within most shotgun viral metagenome libraries. Here we describe a bioinformatic pipeline, the Viral Informatics Resource for Metagenome Exploration (VIROME), that emphasizes the classification of viral metagenome sequences (predicted open-reading frames) based on homology search results against both known and environmental sequences. Functional and taxonomic information is derived from five annotated sequence databases which are linked to the UniRef 100 database. Environmental classifications are obtained from hits against a custom database, MetaGenomes On-Line, which contains 49 million predicted environmental peptides. Each predicted viral metagenomic ORF run through the VIROME pipeline is placed into one of seven ORF classes, thus, every sequence receives a meaningful annotation. Additionally, the pipeline includes quality control measures to remove contaminating and poor quality sequence and assesses the potential amount of cellular DNA contamination in a viral metagenome library by screening for rRNA genes. Access to the VIROME pipeline and analysis results are provided through a web-application interface that is dynamically linked to a relational back-end database. The VIROME web-application interface is designed to allow users flexibility in retrieving sequences (reads, ORFs, predicted peptides) and search results for focused secondary analyses. PMID:23407591

  10. Metagenomics and novel gene discovery

    PubMed Central

    Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin

    2014-01-01

    Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics. PMID:24317337

  11. Metagenomic biomarker discovery and explanation

    PubMed Central

    2011-01-01

    This study describes and validates a new method for metagenomic biomarker discovery by way of class comparison, tests of biological consistency and effect size estimation. This addresses the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities, which is a central problem to the study of metagenomics. We extensively validate our method on several microbiomes and a convenient online interface for the method is provided at http://huttenhower.sph.harvard.edu/lefse/. PMID:21702898

  12. Evaluation of a fosmid-clone-based microarray for comparative analysis of swine fecal metagenomes.

    PubMed

    Park, Soo-Je; Kim, Dong-Hwan; Jung, Man-Young; Kim, So-Jeong; Kim, Hongik; Kim, Yang-Hoon; Chae, Jong-Chan; Rhee, Sung-Keun

    2012-08-01

    Glass slides arrayed with fosmid clone DNAs generated from swine feces as probes were fabricated and used as a metagenome microarray (MGA). The MGA probes appeared to be specific to their corresponding target genomic fragments. The detection limit was 10 ng of genomic DNA (ca. 10^6 bacterial cells) in the presence of 1000 ng of background DNA. Linear relationships between the signal intensity and the target DNA (20-100 ng) were observed (r² = 0.98). Application of MGA to the comparison of swine fecal metagenomes suggested that the microbial community composition of swine intestine could be dependent on the health state of swine. PMID:22923120

  13. Gene Capture Coupled to High-Throughput Sequencing as a Strategy for Targeted Metagenome Exploration

    PubMed Central

    Denonfoux, Jérémie; Parisot, Nicolas; Dugat-Bony, Eric; Biderre-Petit, Corinne; Boucher, Delphine; Morgavi, Diego P.; Le Paslier, Denis; Peyretaillade, Eric; Peyret, Pierre

    2013-01-01

    Next-generation sequencing (NGS) allows faster acquisition of metagenomic data, but complete exploration of complex ecosystems is hindered by the extraordinary diversity of microorganisms. To reduce the environmental complexity, we created an innovative solution hybrid selection (SHS) method that is combined with NGS to characterize large DNA fragments harbouring biomarkers of interest. The quality of enrichment was evaluated after fragments containing the methyl coenzyme M reductase subunit A gene (mcrA), the biomarker of methanogenesis, were captured from a Methanosarcina strain and a metagenomic sample from a meromictic lake. The methanogen diversity was compared with direct metagenome and mcrA-based amplicon pyrosequencing strategies. The SHS approach resulted in the capture of DNA fragments up to 2.5 kb with an enrichment efficiency between 41 and 100%, depending on the sample complexity. Compared with direct metagenome and amplicons sequencing, SHS detected broader mcrA diversity, and it allowed efficient sampling of the rare biosphere and unknown sequences. In contrast to amplicon-based strategies, SHS is less biased and GC independent, and it recovered complete biomarker sequences in addition to conserved regions. Because this method can also isolate the regions flanking the target sequences, it could facilitate operon reconstructions. PMID:23364577

  14. Metagenomic Analysis of the Pygmy Loris Fecal Microbiome Reveals Unique Functional Capacity Related to Metabolism of Aromatic Compounds

    PubMed Central

    Xu, Bo; Xu, Weijiang; Yang, Fuya; Li, Junjun; Yang, Yunjuan; Tang, Xianghua; Mu, Yuelin; Zhou, Junpei; Huang, Zunxi

    2013-01-01

    The animal gastrointestinal tract contains a complex community of microbes, whose composition ultimately reflects the co-evolution of microorganisms with their animal host. An analysis of 78,619 pyrosequencing reads generated from pygmy loris fecal DNA extracts was performed to help better understand the microbial diversity and functional capacity of the pygmy loris gut microbiome. The taxonomic analysis of the metagenomic reads indicated that pygmy loris fecal microbiomes were dominated by Bacteroidetes and Proteobacteria phyla. The hierarchical clustering of several gastrointestinal metagenomes demonstrated the similarities of the microbial community structures of pygmy loris and mouse gut systems despite their differences in functional capacity. The comparative analysis of function classification revealed that the metagenome of the pygmy loris was characterized by an overrepresentation of those sequences involved in aromatic compound metabolism compared with humans and other animals. The key enzymes related to the benzoate degradation pathway were identified based on the Kyoto Encyclopedia of Genes and Genomes pathway assignment. These results would contribute to the limited body of primate metagenome studies and provide a framework for comparative metagenomic analysis between human and non-human primates, as well as a comparative understanding of the evolution of humans and their microbiome. However, future studies on the metagenome sequencing of pygmy loris and other prosimians regarding the effects of age, genetics, and environment on the composition and activity of the metagenomes are required. PMID:23457582

  15. Multivariate Analysis of Functional Metagenomes

    PubMed Central

    Dinsdale, Elizabeth A.; Edwards, Robert A.; Bailey, Barbara A.; Tuba, Imre; Akhter, Sajia; McNair, Katelyn; Schmieder, Robert; Apkarian, Naneh; Creek, Michelle; Guan, Eric; Hernandez, Mayra; Isaacs, Katherine; Peterson, Chris; Regh, Todd; Ponomarenko, Vadim

    2013-01-01

    Metagenomics is a primary tool for the description of microbial and viral communities. The sheer magnitude of the data generated in each metagenome makes identifying key differences in the function and taxonomy between communities difficult to elucidate. Here we discuss the application of seven different data mining and statistical analyses by comparing and contrasting the metabolic functions of 212 microbial metagenomes within and between 10 environments. Not all approaches are appropriate for all questions, and researchers should decide which approach addresses their questions. This work demonstrated the use of each approach: for example, random forests provided a robust and enlightening description of both the clustering of metagenomes and the metabolic processes that were important in separating microbial communities from different environments. All analyses identified that the presence of phage genes within the microbial community was a predictor of whether the microbial community was host-associated or free-living. Several analyses identified the subtle differences that occur with environments, such as those seen in different regions of the marine environment. PMID:23579547

  16. Estimating richness from phage metagenomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Bacteriophages are important drivers of ecosystem functions, yet little is known about the vast majority of phages. Phage metagenomics, or the study of the collective genome of an assemblage of phages, enables the investigation of broad ecological questions in phage communities. One ecological cha...

  17. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes.

    PubMed

    Forster, Samuel C; Browne, Hilary P; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Lawley, Trevor D

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. PMID:26578596

  18. Benchmarking of gene prediction programs for metagenomic data.

    PubMed

    Yok, Non; Rosen, Gail

    2010-01-01

    This manuscript presents the most rigorous benchmarking of gene annotation algorithms for metagenomic datasets to date. We compare three different programs: GeneMark, MetaGeneAnnotator (MGA) and Orphelia. The comparisons are based on their performances over simulated fragments from one hundred species of diverse lineages. We defined four different types of fragments; two types come from the inter- and intra-coding regions and the other types are from the gene edges. Hoff et al. used only 12 species in their comparison; therefore, their sample is too small to represent an environmental sample. Also, no predecessor has separately examined fragments that contain gene edges as opposed to intra-coding regions. General observations in our results are that performances of all these programs improve as we increase the length of the fragment. On the other hand, intra-coding fragments of our data show low annotation error in all of the programs compared with the gene edge fragments. Overall, we found an upper-bound performance by combining all the methods. PMID:21097156

  19. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes.

    SciTech Connect

    Meyer, F.; Paarmann, D.; D'Souza, M.; Olson, R.; Glass, E. M.; Kubal, M.; Paczian, T.; Stevens, R.; Wilke, A.; Wilkening, J.; Edwards, R. A.; Rodriguez, A.; Mathematics and Computer Science; Univ. of Chicago; San Diego State Univ.

    2008-09-19

    Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers. A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis--the availability of high-performance computing for annotating the data.

  20. Metagenome Assembly at the DOE JGI (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Chain, Patrick [DOE JGI at LANL

    2013-01-22

    Patrick Chain of DOE JGI at LANL, Co-Chair of the Metagenome-specific Assembly session, on "Metagenome Assembly at the DOE JGI" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  1. Metagenome Assembly at the DOE JGI (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Chain, Patrick

    2011-10-13

    Patrick Chain of DOE JGI at LANL, Co-Chair of the Metagenome-specific Assembly session, on "Metagenome Assembly at the DOE JGI" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  2. Metagenomes from Argonne's MG-RAST Metagenomics Analysis Server

    DOE Data Explorer

    MG-RAST has a large number of datasets that researchers have deposited for public use. As of July, 2014, the number of metagenomes represented by MG-RAST numbered more than 18,500, and the number of available sequences was more than 75 million! The public can browse the collection several different ways, and researchers can login to deposit new data. Researchers have the choice of keeping a dataset private so that it is viewable only by them when logged in, or they can choose to make a dataset public at any time with a simple click of a link. MG-RAST was launched in 2007 by the Mathematics and Computer Science Division at Argonne National Laboratory (ANL). It is part of the toolkit available to the Terragenomics project, which seeks to do a comprehensive metagenomics study of U.S. soil. The Terragenomics project page is located at http://www.mcs.anl.gov/research/projects/terragenomics/.

  3. Taxonomic and functional assignment of cloned sequences from high Andean forest soil metagenome.

    PubMed

    Montaña, José Salvador; Jiménez, Diego Javier; Hernández, Mónica; Angel, Tatiana; Baena, Sandra

    2012-02-01

    Total metagenomic DNA was isolated from high Andean forest soil and subjected to taxonomical and functional composition analyses by means of clone library generation and sequencing. The obtained yield of 1.7 μg of DNA/g of soil was used to construct a metagenomic library of approximately 20,000 clones (in the plasmid p-Bluescript II SK+) with an average insert size of 4 Kb, covering 80 Mb of the total metagenomic DNA. Metagenomic sequences near the plasmid cloning site were sequenced and then trimmed and assembled, obtaining 299 reads and 31 contigs (0.3 Mb). Taxonomic assignment of total sequences was performed by BLASTX, resulting in 68.8, 44.8 and 24.5% classification into taxonomic groups using the metagenomic RAST server v2.0, WebCARMA v1.0 online system and MetaGenome Analyzer v3.8 software, respectively. Most clone sequences were classified as Bacteria belonging to phyla Actinobacteria, Proteobacteria and Acidobacteria. Among the most represented orders were Actinomycetales (34% average), Rhizobiales, Burkholderiales and Myxococcales and with a greater number of sequences in the genus Mycobacterium (7% average), Frankia, Streptomyces and Bradyrhizobium. The vast majority of sequences were associated with the metabolism of carbohydrates, proteins, lipids and catalytic functions, such as phosphatases, glycosyltransferases, dehydrogenases, methyltransferases, dehydratases and epoxide hydrolases. In this study we compared different methods of taxonomic and functional assignment of metagenomic clone sequences to evaluate microbial diversity in an unexplored soil ecosystem, searching for putative enzymes of biotechnological interest and generating important information for further functional screening of clone libraries. PMID:21792685
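    The library size quoted in this abstract follows from simple arithmetic: the number of clones multiplied by the average insert size. A minimal sketch, assuming 1 Mb = 1000 kb and using a hypothetical helper name `library_coverage_mb`:

```python
def library_coverage_mb(n_clones, mean_insert_kb):
    """Total cloned DNA in Mb = clones x mean insert size (kb) / 1000."""
    return n_clones * mean_insert_kb / 1000.0


# Figures taken from the abstract: ~20,000 clones, 4 kb average insert.
coverage = library_coverage_mb(20000, 4)
```

    With the abstract's figures this gives 80 Mb, matching the stated coverage.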

  4. Metagenomics of the Svalbard reindeer rumen microbiome reveals abundance of polysaccharide utilization loci.

    PubMed

    Pope, Phillip B; Mackenzie, Alasdair K; Gregor, Ivan; Smith, Wendy; Sundset, Monica A; McHardy, Alice C; Morrison, Mark; Eijsink, Vincent G H

    2012-01-01

    Lignocellulosic biomass remains a largely untapped source of renewable energy predominantly due to its recalcitrance and an incomplete understanding of how this is overcome in nature. We present here a compositional and comparative analysis of metagenomic data pertaining to a natural biomass-converting ecosystem adapted to austere arctic nutritional conditions, namely the rumen microbiome of Svalbard reindeer (Rangifer tarandus platyrhynchus). Community analysis showed that deeply-branched cellulolytic lineages affiliated to the Bacteroidetes and Firmicutes are dominant, whilst sequence binning methods facilitated the assemblage of metagenomic sequence for a dominant and novel Bacteroidales clade (SRM-1). Analysis of unassembled metagenomic sequence as well as metabolic reconstruction of SRM-1 revealed the presence of multiple polysaccharide utilization loci-like systems (PULs) as well as members of more than 20 glycoside hydrolase and other carbohydrate-active enzyme families targeting various polysaccharides including cellulose, xylan and pectin. Functional screening of cloned metagenome fragments revealed high cellulolytic activity and an abundance of PULs that are rich in endoglucanases (GH5) but devoid of other common enzymes thought to be involved in cellulose degradation. Combining these results with known and partly re-evaluated metagenomic data strongly indicates that much like the human distal gut, the digestive system of herbivores harbours high numbers of deeply branched and as-yet uncultured members of the Bacteroidetes that depend on PUL-like systems for plant biomass degradation. PMID:22701672

  5. Metagenomics of the Svalbard Reindeer Rumen Microbiome Reveals Abundance of Polysaccharide Utilization Loci

    PubMed Central

    Pope, Phillip B.; Mackenzie, Alasdair K.; Gregor, Ivan; Smith, Wendy; Sundset, Monica A.; McHardy, Alice C.; Morrison, Mark; Eijsink, Vincent G.H.

    2012-01-01

    Lignocellulosic biomass remains a largely untapped source of renewable energy predominantly due to its recalcitrance and an incomplete understanding of how this is overcome in nature. We present here a compositional and comparative analysis of metagenomic data pertaining to a natural biomass-converting ecosystem adapted to austere arctic nutritional conditions, namely the rumen microbiome of Svalbard reindeer (Rangifer tarandus platyrhynchus). Community analysis showed that deeply-branched cellulolytic lineages affiliated to the Bacteroidetes and Firmicutes are dominant, whilst sequence binning methods facilitated the assemblage of metagenomic sequence for a dominant and novel Bacteroidales clade (SRM-1). Analysis of unassembled metagenomic sequence as well as metabolic reconstruction of SRM-1 revealed the presence of multiple polysaccharide utilization loci-like systems (PULs) as well as members of more than 20 glycoside hydrolase and other carbohydrate-active enzyme families targeting various polysaccharides including cellulose, xylan and pectin. Functional screening of cloned metagenome fragments revealed high cellulolytic activity and an abundance of PULs that are rich in endoglucanases (GH5) but devoid of other common enzymes thought to be involved in cellulose degradation. Combining these results with known and partly re-evaluated metagenomic data strongly indicates that much like the human distal gut, the digestive system of herbivores harbours high numbers of deeply branched and as-yet uncultured members of the Bacteroidetes that depend on PUL-like systems for plant biomass degradation. PMID:22701672

  6. Functional metagenomics of extreme environments.

    PubMed

    Mirete, Salvador; Morgante, Verónica; González-Pastor, José Eduardo

    2016-04-01

    The bioprospecting of enzymes that operate under extreme conditions is of particular interest for many biotechnological and industrial processes. Nevertheless, there is a considerable limitation to retrieve novel enzymes as only a small fraction of microorganisms derived from extreme environments can be cultured under standard laboratory conditions. Functional metagenomics has the advantage of not requiring the cultivation of microorganisms or previous sequence information to known genes, thus representing a valuable approach for mining enzymes with new features. In this review, we summarize studies showing how functional metagenomics was employed to retrieve genes encoding for proteins involved not only in molecular adaptation and resistance to extreme environmental conditions but also in other enzymatic activities of biotechnological interest. PMID:26901403

  7. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    PubMed Central

    Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas

    2016-01-01

    ABSTRACT Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018

  8. Integrative workflows for metagenomic analysis

    PubMed Central

    Ladoukakis, Efthymios; Kolisis, Fragiskos N.; Chatziioannou, Aristotelis A.

    2014-01-01

    The rapid evolution of sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques in order to discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many GB of data being generated from each single sequencing experiment, rendering data management, or even storage, a critical bottleneck for the overall analytical endeavor. The enormous complexity is further aggravated by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, yet non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, which require a high level of expertise for their proper application or for the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires large computational resources, leaving the utilization of cloud computing infrastructures as the sole realistic solution. In this review article we discuss the different integrative bioinformatic solutions available that address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control, and annotation of metagenomic data, covering various major sequencing technologies and applications. PMID:25478562

  9. Exploring neighborhoods in the metagenome universe.

    PubMed

    Aßhauer, Kathrin P; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-01-01

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well-suitable for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis. PMID:25026170
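
    The neighbor search described in this abstract can be illustrated with a minimal sketch. It assumes each metagenome has already been reduced to a numeric profile vector and uses cosine distance as one example of a vector-based distance measure; the function names and the cosine choice are illustrative assumptions, not the CoMet-Universe implementation.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two profile vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query, database, k=10):
    """Return the names of the k profiles in `database` closest to `query`.

    `database` maps a (hypothetical) metagenome name to its profile vector.
    """
    ranked = sorted(database, key=lambda name: cosine_distance(query, database[name]))
    return ranked[:k]


def neighborhood_accuracy(name, database, labels, k=10):
    """Fraction of the k nearest neighbors sharing the habitat label of `name`."""
    others = {n: p for n, p in database.items() if n != name}
    hits = nearest_neighbors(database[name], others, k)
    return sum(labels[h] == labels[name] for h in hits) / len(hits)
```

    Averaging `neighborhood_accuracy` over all query metagenomes gives a figure comparable in spirit to the "nine out of ten nearest neighbors" statement, although the abstract does not specify the exact evaluation protocol.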

  10. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes

    PubMed Central

    Lin, Hsin-Hung; Liao, Yu-Chieh

    2016-01-01

    Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or ‘bin’ sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new microbial organisms and aids in the microbial genome reconstruction process. Here we present MyCC, an automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools including CONCOCT, GroopM, MaxBin and MetaBAT on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, we demonstrate the visualization of metagenomes in MyCC to aid in the reconstruction of genomes from distinct bins. MyCC is freely available at http://sourceforge.net/projects/sb2nhri/files/MyCC/. PMID:27067514
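
    As an illustration of what a genomic signature can look like, the sketch below computes a normalized k-mer frequency vector for a contig (tetranucleotides by default). This is a generic, commonly used composition feature; the abstract does not state which signature MyCC computes, so the function name and the choice of k are assumptions.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq, k=4):
    """Return the relative frequency of every A/C/G/T k-mer in `seq`.

    The vector has 4**k entries in lexicographic order (AAAA, AAAC, ...).
    Windows containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]
```

    Contigs from the same genome tend to have similar signatures, which is why such vectors can be clustered into bins without reference sequences.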

  11. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes.

    PubMed

    Lin, Hsin-Hung; Liao, Yu-Chieh

    2016-01-01

    Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or 'bin' sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new microbial organisms and aids in the microbial genome reconstruction process. Here we present MyCC, an automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools including CONCOCT, GroopM, MaxBin and MetaBAT on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, we demonstrate the visualization of metagenomes in MyCC to aid in the reconstruction of genomes from distinct bins. MyCC is freely available at http://sourceforge.net/projects/sb2nhri/files/MyCC/. PMID:27067514

  12. Metagenomes from the saline desert of Kutch.

    PubMed

    Pandit, A S; Joshi, M N; Bhargava, P; Ayachit, G N; Shaikh, I M; Saiyed, Z M; Saxena, A K; Bagatharia, S B

    2014-01-01

    We provide the first report on the metagenomic approach for unveiling the microbial diversity in the saline desert of Kutch. High-throughput metagenomic sequencing of environmental DNA isolated from soil collected from seven locations in Kutch was performed on an Ion Torrent platform. PMID:24831151

  13. Metagenomes from the Saline Desert of Kutch

    PubMed Central

    Pandit, A. S.; Joshi, M. N.; Bhargava, P.; Ayachit, G. N.; Shaikh, I. M.; Saiyed, Z. M.; Saxena, A. K.

    2014-01-01

    We provide the first report on the metagenomic approach for unveiling the microbial diversity in the saline desert of Kutch. High-throughput metagenomic sequencing of environmental DNA isolated from soil collected from seven locations in Kutch was performed on an Ion Torrent platform. PMID:24831151

  14. Current and future resources for functional metagenomics

    PubMed Central

    Lam, Kathy N.; Cheng, Jiujun; Engel, Katja; Neufeld, Josh D.; Charles, Trevor C.

    2015-01-01

    Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries—physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research. PMID:26579102

  15. Metagenomics using next-generation sequencing.

    PubMed

    Bragg, Lauren; Tyson, Gene W

    2014-01-01

    Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples. This approach is commonly referred to as "metagenomics" or "community genomics". However, the term metagenomics is applied liberally in the literature to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun ("random") sequencing of the genomic DNA of a sample taken directly from the environment. The metagenome can be thought of as a sampling of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis. PMID:24515370

  16. Assembling The Marine Metagenome, One Cell At A Time

    SciTech Connect

    Xie, Gang; Han, Shunsheng; Kiss, Hajnalka; Saw, Jimmy; Senin, Pavel; Woyke, Tanja; Copeland, Alex; Gonzalez, Jose; Chatterji, Sourav; Cheng, Jan - Fang; Eisen, Jonathan A; Sieracki, Michael E; Stepanauskas, Ramunas

    2008-01-01

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured taxa from a complex

  17. Assembling the Marine Metagenome, One Cell at a Time

    SciTech Connect

    Woyke, Tanja; Xie, Gary; Copeland, Alex; Gonzalez, Jose M.; Han, Cliff; Kiss, Hajnalka; Saw, Jimmy H.; Senin, Pavel; Yang, Chi; Chatterji, Sourav; Cheng, Jan-Fang; Eisen, Jonathan A.; Sieracki, Michael E.; Stepanauskas, Ramunas

    2010-06-24

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured

  18. Metagenomic reconstructions of bacterial CRISPR loci constrain population histories.

    PubMed

    Sun, Christine L; Thomas, Brian C; Barrangou, Rodolphe; Banfield, Jillian F

    2016-04-01

    Bacterial CRISPR-Cas systems provide insight into recent population history because they rapidly incorporate, in a unidirectional manner, short fragments (spacers) from coexisting infective virus populations into host chromosomes. Immunity is achieved by sequence identity between transcripts of spacers and their targets. Here, we used metagenomics to study the stability and dynamics of the type I-E CRISPR-Cas locus of Leptospirillum group II bacteria in biofilms sampled over 5 years from an acid mine drainage (AMD) system. Despite recovery of 452,686 spacers from CRISPR amplicons and metagenomic data, rarefaction curves of spacers show no saturation. The vast repertoire of spacers is attributed to phage/plasmid population diversity and retention of old spacers, despite rapid evolution of the targeted phage/plasmid genome regions (proto-spacers). The oldest spacers (spacers found at the trailer end) are conserved for at least 5 years, and 12% of these retain perfect or near-perfect matches to proto-spacer targets. The majority of proto-spacer regions contain an AAG proto-spacer adjacent motif (PAM). Spacers throughout the locus target the same phage population (AMDV1), but there are blocks of consecutive spacers without AMDV1 target sequences. Results suggest long-term coexistence of Leptospirillum with AMDV1 and periods when AMDV1 was less dominant. Metagenomics can be applied to millions of cells in a single sample to provide an extremely large spacer inventory, allow identification of phage/plasmids and enable analysis of previous phage/plasmid exposure. Thus, this approach can provide insights into prior bacterial environment and genetic interplay between hosts and their viruses. PMID:26394009
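
    The rarefaction curves mentioned in this abstract can be sketched with the standard analytic rarefaction formula: the expected number of distinct spacers in a random subsample of n observations drawn without replacement. The function below is a generic illustration, not the authors' pipeline; `counts` is a hypothetical list holding how often each distinct spacer was observed.

```python
from math import comb


def rarefaction(counts, n):
    """Expected number of distinct types in a subsample of size n.

    `counts[i]` is the number of observations of type i. The subsample
    is drawn without replacement from all sum(counts) observations.
    """
    total_obs = sum(counts)
    if not 0 <= n <= total_obs:
        raise ValueError("n must be between 0 and the total number of observations")
    all_subsamples = comb(total_obs, n)
    # For each type, 1 - P(type is absent from the subsample).
    return sum(1 - comb(total_obs - c, n) / all_subsamples for c in counts)
```

    Plotting `rarefaction(counts, n)` against n gives the curve; a curve that is still rising at n equal to the full sample size corresponds to the "no saturation" observation.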

  19. Bioprospecting metagenomes: Glycosyl hydrolases for converting biomass

    SciTech Connect

    Li, L.; van der Lelie, D.; McCorkle, S. R.; Monchy, S.; Taghavi, S.

    2009-05-18

    Throughout immeasurable time, microorganisms evolved and accumulated remarkable physiological and functional heterogeneity, and now constitute the major reserve for genetic diversity on earth. Using metagenomics, namely genetic material recovered directly from environmental samples, this biogenetic diversification can be accessed without the need to cultivate cells. Accordingly, microbial communities and their metagenomes, isolated from biotopes with high turnover rates of recalcitrant biomass, such as lignocellulosic plant cell walls, have become a major resource for bioprospecting; furthermore, this material is a major asset in the search for new biocatalytics (enzymes) for various industrial processes, including the production of biofuels from plant feedstocks. However, despite the contributions from metagenomics technologies consequent upon the discovery of novel enzymes, this relatively new enterprise requires major improvements. In this review, we compare function-based metagenome screening and sequence-based metagenome data mining, discussing the advantages and limitations of both methods. We also describe the unusual enzymes discovered via metagenomics approaches, and discuss the future prospects for metagenome technologies.

  20. Bioprospecting metagenomes: glycosyl hydrolases for converting biomass

    PubMed Central

    Li, Luen-Luen; McCorkle, Sean R; Monchy, Sebastien; Taghavi, Safiyh; van der Lelie, Daniel

    2009-01-01

    Throughout immeasurable time, microorganisms evolved and accumulated remarkable physiological and functional heterogeneity, and now constitute the major reserve for genetic diversity on earth. Using metagenomics, namely genetic material recovered directly from environmental samples, this biogenetic diversification can be accessed without the need to cultivate cells. Accordingly, microbial communities and their metagenomes, isolated from biotopes with high turnover rates of recalcitrant biomass, such as lignocellulosic plant cell walls, have become a major resource for bioprospecting; furthermore, this material is a major asset in the search for new biocatalytics (enzymes) for various industrial processes, including the production of biofuels from plant feedstocks. However, despite the contributions from metagenomics technologies consequent upon the discovery of novel enzymes, this relatively new enterprise requires major improvements. In this review, we compare function-based metagenome screening and sequence-based metagenome data mining, discussing the advantages and limitations of both methods. We also describe the unusual enzymes discovered via metagenomics approaches, and discuss the future prospects for metagenome technologies. PMID:19450243

  1. Chitinase genes revealed and compared in bacterial isolates, DNA extracts and a metagenomic library from a phytopathogen suppressive soil

    SciTech Connect

    Hjort, K.; Bergstrom, M.; Adesina, M.F.; Jansson, J.K.; Smalla, K.; Sjoling, S.

    2009-09-01

    Soil that is suppressive to disease caused by fungal pathogens is an interesting source to target for novel chitinases that might be contributing towards disease suppression. In this study we screened for chitinase genes in a phytopathogen-suppressive soil in three ways: (1) from a metagenomic library constructed from microbial cells extracted from soil, (2) from directly extracted DNA and (3) from bacterial isolates with antifungal and chitinase activities. Terminal-restriction fragment length polymorphism (T-RFLP) of chitinase genes revealed differences in amplified chitinase genes from the metagenomic library and the directly extracted DNA, but approximately 40% of the identified chitinase terminal-restriction fragments (TRFs) were found in both sources. All of the chitinase TRFs from the isolates were matched to TRFs in the directly extracted DNA and the metagenomic library. The most abundant chitinase TRF in the soil DNA and the metagenomic library corresponded to the TRF¹⁰³ of the isolate, Streptomyces mutomycini and/or Streptomyces clavifer. There were good matches between T-RFLP profiles of chitinase gene fragments obtained from different sources of DNA. However, there were also differences in both the chitinase and the 16S rRNA gene T-RFLP patterns depending on the source of DNA, emphasizing the lack of complete coverage of the gene diversity by any of the approaches used.

  2. Magma Fragmentation

    NASA Astrophysics Data System (ADS)

    Gonnermann, Helge M.

    2015-05-01

    Magma fragmentation is the breakup of a continuous volume of molten rock into discrete pieces, called pyroclasts. Because magma contains bubbles of compressible magmatic volatiles, decompression of low-viscosity magma leads to rapid expansion. The magma is torn into fragments, as it is stretched into hydrodynamically unstable sheets and filaments. If the magma is highly viscous, resistance to bubble growth will instead lead to excess gas pressure and the magma will deform viscoelastically by fracturing like a glassy solid, resulting in the formation of a violently expanding gas-pyroclast mixture. In either case, fragmentation represents the conversion of potential energy into the surface energy of the newly created fragments and the kinetic energy of the expanding gas-pyroclast mixture. If magma comes into contact with external water, the conversion of thermal energy will vaporize water and quench magma at the melt-water interface, thus creating dynamic stresses that cause fragmentation and the release of kinetic energy. Lastly, shear deformation of highly viscous magma may cause brittle fractures and release seismic energy.

  3. Human milk metagenome: a functional capacity analysis

    PubMed Central

    2013-01-01

    Background Human milk contains a diverse population of bacteria that likely influences colonization of the infant gastrointestinal tract. Recent studies, however, have been limited to characterization of this microbial community by 16S rRNA analysis. In the present study, a metagenomic approach using Illumina sequencing of a pooled milk sample (ten donors) was employed to determine the genera of bacteria and the types of bacterial open reading frames in human milk that may influence bacterial establishment and stability in this primal food matrix. The human milk metagenome was also compared to that of breast-fed and formula-fed infants’ feces (n = 5, each) and mothers’ feces (n = 3) at the phylum level and at a functional level using open reading frame abundance. Additionally, immune-modulatory bacterial-DNA motifs were also searched for within human milk. Results The bacterial community in human milk contained over 360 prokaryotic genera, with sequences aligning predominantly to the phyla of Proteobacteria (65%) and Firmicutes (34%), and the genera of Pseudomonas (61.1%), Staphylococcus (33.4%) and Streptococcus (0.5%). From assembled human milk-derived contigs, 30,128 open reading frames were annotated and assigned to functional categories. When compared to the metagenome of infants’ and mothers’ feces, the human milk metagenome was less diverse at the phylum level, and contained more open reading frames associated with nitrogen metabolism, membrane transport and stress response (P < 0.05). The human milk metagenome also contained a similar occurrence of immune-modulatory DNA motifs to that of infants’ and mothers’ fecal metagenomes. Conclusions Our results further expand the complexity of the human milk metagenome and enforce the benefits of human milk ingestion on the microbial colonization of the infant gut and immunity. Discovery of immune-modulatory motifs in the metagenome of human milk indicates more exhaustive analyses of the

  4. Metagenomic evidence for the presence of phototrophic Gemmatimonadetes bacteria in diverse environments.

    PubMed

    Zeng, Yonghui; Baumbach, Jan; Barbosa, Eudes Guilherme Vieira; Azevedo, Vasco; Zhang, Chuanlun; Koblížek, Michal

    2016-02-01

    Gemmatimonadetes represents a poorly understood bacterial phylum with only a handful of cultured species. Recently, one of its few representatives, Gemmatimonas phototrophica, was found to contain purple bacterial photosynthetic reaction centres. However, almost nothing is known about the environmental distribution of phototrophic Gemmatimonadetes bacteria. To fill this gap, we took advantage of fast-growing public metagenomic databases and performed an extensive survey of metagenomes deposited into the NCBI's WGS database, the JGI's IMG webserver and the MG-RAST webserver. By employing Mg protoporphyrin IX monomethyl ester oxidative cyclase (AcsF) as a marker gene, we identified 291 AcsF fragments (24-361 amino acids long) that are closely related to G. phototrophica from 161 metagenomes originating from various habitats, including air, river waters/sediment, estuarine waters, lake waters, biofilms, plant surfaces, intertidal sediment, soils, springs and wastewater treatment plants, but none from marine waters or sediment. Based on AcsF hit counts, phototrophic Gemmatimonadetes bacteria make up 0.4-11.9% of whole phototrophic microbial communities in these habitats. Unexpectedly, an almost complete 37.9 kb long photosynthesis gene cluster with identical gene composition and arrangement to those in G. phototrophica was reconstructed from the Odense wastewater metagenome, only differing in a 7.2 kb long non-photosynthesis-gene insert. These data suggest that phototrophic Gemmatimonadetes bacteria are much more widely distributed in the environment and exhibit a higher genetic diversity than previously thought. PMID:26636755

  5. Estimating DNA coverage and abundance in metagenomes using a gamma approximation

    PubMed Central

    Hooper, Sean D.; Dalevi, Daniel; Pati, Amrita; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2010-01-01

    Motivation: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets. Contact: sean.d.hooper@genpat.uu.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20008478
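
    One way to see how a gamma model leads to an estimate of unsequenced bins: if the expected number of reads per bin follows a gamma distribution with shape k and scale θ, and the reads landing on a bin are Poisson-distributed given that expectation, the read count per bin is negative binomial and the probability of a bin receiving no reads is (1 + θ)^(-k). The sketch below applies that identity. The Poisson sampling assumption, the function names and the way the parameters are supplied are illustrative assumptions; the abstract does not describe the authors' fitting procedure.

```python
def fraction_unsequenced(shape, scale):
    """P(bin receives zero reads) under a gamma(shape, scale)-Poisson mixture."""
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    return (1.0 + scale) ** (-shape)


def estimate_unseen_bins(observed_bins, shape, scale):
    """Estimate how many bins exist but were not hit by any read.

    `observed_bins` is the number of bins covered by at least one read;
    it is scaled by the odds of a bin being missed, p0 / (1 - p0).
    """
    p0 = fraction_unsequenced(shape, scale)
    return observed_bins * p0 / (1.0 - p0)
```

    For example, with shape 1 and scale 1 half of all bins are expected to be missed, so 100 observed bins imply roughly 100 further bins that additional sequencing could reveal.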

  6. Estimating DNA coverage and abundance in metagenomes using a gamma approximation

    SciTech Connect

    Hooper, Sean D; Dalevi, Daniel; Pati, Amrita; Mavromatis, Konstantinos; Ivanova, Natalia N; Kyrpides, Nikos C

    2010-01-01

    Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets.

  7. MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences

    PubMed Central

    Luo, Chengwei; Rodriguez-R, Luis M.; Konstantinidis, Konstantinos T.

    2014-01-01

    Determining the taxonomic affiliation of sequences assembled from metagenomes remains a major bottleneck that affects research across the fields of environmental, clinical and evolutionary microbiology. Here, we introduce MyTaxa, a homology-based bioinformatics framework to classify metagenomic and genomic sequences with unprecedented accuracy. The distinguishing aspect of MyTaxa is that it employs all genes present in an unknown sequence as classifiers, weighting each gene based on its (predetermined) classifying power at a given taxonomic level and frequency of horizontal gene transfer. MyTaxa also implements a novel classification scheme based on the genome-aggregate average amino acid identity concept to determine the degree of novelty of sequences representing uncharacterized taxa, i.e. whether they represent novel species, genera or phyla. Application of MyTaxa on in silico generated (mock) and real metagenomes of varied read length (100–2000 bp) revealed that it correctly classified at least 5% more sequences than any other tool. The analysis also showed that ∼10% of the assembled sequences from human gut metagenomes represent novel species with no sequenced representatives, several of which were highly abundant in situ such as members of the Prevotella genus. Thus, MyTaxa can find several important applications in microbial identification and diversity studies. PMID:24589583

  8. Forest Fragmentation

    EPA Science Inventory

    This indicator describes forest fragmentation in the contiguous United States circa 2001. This information provides a broad, recent picture of the spatial pattern of the nation’s forests and the extent to which they are being broken into smaller patches and pierced or interspe...

  9. Toward Accurate and Quantitative Comparative Metagenomics.

    PubMed

    Nayfach, Stephen; Pollard, Katherine S

    2016-08-25

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. PMID:27565341

  10. Finding the Needles in the Metagenome Haystack

    PubMed Central

    Speksnijder, Arjen G. C. L.; Zhang, Kun; Goodman, Robert M.; van Veen, Johannes A.

    2007-01-01

    In the collective genomes (the metagenome) of the microorganisms inhabiting the Earth’s diverse environments is written the history of life on this planet. New molecular tools developed and used for the past 15 years by microbial ecologists are facilitating the extraction, cloning, screening, and sequencing of these genomes. This approach allows microbial ecologists to access and study the full range of microbial diversity, regardless of our ability to culture organisms, and provides an unprecedented access to the breadth of natural products that these genomes encode. However, there is no way that the mere collection of sequences, no matter how expansive, can provide full coverage of the complex world of microbial metagenomes within the foreseeable future. Furthermore, although it is possible to fish out highly informative and useful genes from the sea of gene diversity in the environment, this can be a highly tedious and inefficient procedure. Microbial ecologists must be clever in their pursuit of ecologically relevant, valuable, and niche-defining genomic information within the vast haystack of microbial diversity. In this report, we seek to describe advances and prospects that will help microbial ecologists glean more knowledge from investigations into metagenomes. These include technological advances in sequencing and cloning methodologies, as well as improvements in annotation and comparative sequence analysis. More significant, however, will be ways to focus in on various subsets of the metagenome that may be of particular relevance, either by limiting the target community under study or improving the focus or speed of screening procedures. Lastly, given the cost and infrastructure necessary for large metagenome projects, and the almost inexhaustible amount of data they can produce, trends toward broader use of metagenome data across the research community coupled with the needed investment in bioinformatics infrastructure devoted to metagenomics will no

  11. Challenges and opportunities of airborne metagenomics.

    PubMed

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-05-01

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles. PMID:25953766

  12. Challenges and Opportunities of Airborne Metagenomics

    PubMed Central

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-01-01

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles. PMID:25953766

  13. FCMM: A comparative metagenomic approach for functional characterization of multiple metagenome samples.

    PubMed

    Lee, Jongin; Lee, Hoon Taek; Hong, Woon-young; Jang, Eunji; Kim, Jaebum

    2015-08-01

    Next-generation sequencing (NGS) technologies make it possible to obtain the entire genomic content of microorganisms in metagenome samples. Thus, many studies have developed methods for the processing and analysis of metagenomic NGS reads, including analyses for predicting functions and their enrichments in environmental metagenome samples. Especially, comparative functional studies by using multi-metagenome samples are essential for identifying and comparing different characteristics of multiple environmental samples. In this paper, we introduce a pipeline for functional characterization of multiple metagenome samples to infer major functions as well as their quantitative scores in a comparative metagenomics manner. The pipeline performs the annotation of functions related to expected proteins in the metagenome samples, calculates their enrichment scores based on the reads per kilobase per million reads (RPKM) measure, and predicts the relative abundance of associated functions by a statistical test. The results from single sample analysis are then used to find common and sample-specific major functions. By applying the pipeline to six different environmental metagenome samples, including two ocean (Antarctica aquatic and Baltic Sea) and four terrestrial (Acid mine drainage, human gut microbiome, Amazon River, and Wasca soil) samples, we were able to predict common functions as well as environment-specific functions. Our pipeline is available at http://bioinfo.konkuk.ac.kr/FCMM/. PMID:26027543
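
    The RPKM measure named in this abstract has a standard definition: reads assigned to a feature, divided by the feature length in kilobases and by the total mapped reads in millions. The sketch below shows only that arithmetic; the function name and arguments are illustrative and are not part of the FCMM pipeline.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads for one feature."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Example: 500 reads on a 2,000 bp gene in a sample of 10 million mapped reads
example_value = rpkm(500, 2000, 10_000_000)
```

    Normalizing by both length and sequencing depth is what makes the scores comparable between genes of different sizes and between samples of different depths.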

  14. Metagenomics of Glassy-Winged Sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    A Metagenomics approach was used to identify unknown organisms which live in association with the glassy-winged sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae). Metagenomics combines molecular biology and genetics to identify, and characterize genetic material from unique biological ...

  15. Under-detection of endospore-forming Firmicutes in metagenomic data.

    PubMed

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S; Junier, Pilar

    2015-01-01

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases that can contribute to improve the universality of metagenomic approaches. PMID:25973144

  16. Under-detection of endospore-forming Firmicutes in metagenomic data

    SciTech Connect

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S.; Junier, Pilar

    2015-04-25

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases that can contribute to improve the universality of metagenomic approaches.

  17. Under-detection of endospore-forming Firmicutes in metagenomic data

    DOE PAGESBeta

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S.; Junier, Pilar

    2015-04-25

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases that can contribute to improve the universality of metagenomic approaches.

  18. Alignment-free Visualization of Metagenomic Data by Nonlinear Dimension Reduction

    PubMed Central

    Laczny, Cedric C.; Pinel, Nicolás; Vlassis, Nikos; Wilmes, Paul

    2014-01-01

    The visualization of metagenomic data, especially without prior taxonomic identification of reconstructed genomic fragments, is a challenging problem in computational biology. An ideal visualization method should, among others, enable clear distinction of congruent groups of sequences of closely related taxa, be applicable to fragments of lengths typically achievable following assembly, and allow the efficient analysis of the growing amounts of community genomic sequence data. Here, we report a scalable approach for the visualization of metagenomic data that is based on nonlinear dimension reduction via Barnes-Hut Stochastic Neighbor Embedding of centered log-ratio transformed oligonucleotide signatures extracted from assembled genomic sequence fragments. The approach allows for alignment-free assessment of the data-inherent taxonomic structure, and it can potentially facilitate the downstream binning of genomic fragments into uniform clusters reflecting organismal origin. We demonstrate the performance of our approach by visualizing community genomic sequence data from simulated as well as groundwater, human-derived and marine microbial communities. PMID:24682077
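
    The preprocessing step named in this abstract, centered log-ratio (CLR) transformation of oligonucleotide signatures, can be sketched as follows. This is a minimal illustration under stated assumptions: k = 4, a pseudocount of 1 to avoid log(0), and no merging of reverse-complement k-mers are choices made here, not details confirmed by the abstract. The embedding step (Barnes-Hut SNE) is not shown.

```python
from itertools import product
from math import log


def kmer_counts(seq: str, k: int = 4) -> list[int]:
    """Count every k-mer over the alphabet ACGT, in fixed lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmers]


def clr(counts: list[int], pseudocount: float = 1.0) -> list[float]:
    """Centered log-ratio: log of each component minus the mean of the logs."""
    logs = [log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical fragment used only to exercise the functions
signature = clr(kmer_counts("ACGTACGTTTGACCA", k=4))
```

    The CLR vector sums to zero by construction, which removes the dependence on fragment length before the signatures are compared or embedded.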

  19. RiboFR-Seq: a novel approach to linking 16S rRNA amplicon profiles to metagenomes.

    PubMed

    Zhang, Yanming; Ji, Peifeng; Wang, Jinfeng; Zhao, Fangqing

    2016-06-01

    16S rRNA amplicon analysis and shotgun metagenome sequencing are two main culture-independent strategies to explore the genetic landscape of various microbial communities. Recently, numerous studies have employed these two approaches together, but downstream data analyses were performed separately, which always generated incongruent or conflict signals on both taxonomic and functional classifications. Here we propose a novel approach, RiboFR-Seq (Ribosomal RNA gene flanking region sequencing), for capturing both ribosomal RNA variable regions and their flanking protein-coding genes simultaneously. Through extensive testing on clonal bacterial strain, salivary microbiome and bacterial epibionts of marine kelp, we demonstrated that RiboFR-Seq could detect the vast majority of bacteria not only in well-studied microbiomes but also in novel communities with limited reference genomes. Combined with classical amplicon sequencing and shotgun metagenome sequencing, RiboFR-Seq can link the annotations of 16S rRNA and metagenomic contigs to make a consensus classification. By recognizing almost all 16S rRNA copies, the RiboFR-seq approach can effectively reduce the taxonomic abundance bias resulted from 16S rRNA copy number variation. We believe that RiboFR-Seq, which provides an integrated view of 16S rRNA profiles and metagenomes, will help us better understand diverse microbial communities. PMID:26984526

  20. RiboFR-Seq: a novel approach to linking 16S rRNA amplicon profiles to metagenomes

    PubMed Central

    Zhang, Yanming; Ji, Peifeng; Wang, Jinfeng; Zhao, Fangqing

    2016-01-01

    16S rRNA amplicon analysis and shotgun metagenome sequencing are two main culture-independent strategies to explore the genetic landscape of various microbial communities. Recently, numerous studies have employed these two approaches together, but downstream data analyses were performed separately, which always generated incongruent or conflict signals on both taxonomic and functional classifications. Here we propose a novel approach, RiboFR-Seq (Ribosomal RNA gene flanking region sequencing), for capturing both ribosomal RNA variable regions and their flanking protein-coding genes simultaneously. Through extensive testing on clonal bacterial strain, salivary microbiome and bacterial epibionts of marine kelp, we demonstrated that RiboFR-Seq could detect the vast majority of bacteria not only in well-studied microbiomes but also in novel communities with limited reference genomes. Combined with classical amplicon sequencing and shotgun metagenome sequencing, RiboFR-Seq can link the annotations of 16S rRNA and metagenomic contigs to make a consensus classification. By recognizing almost all 16S rRNA copies, the RiboFR-seq approach can effectively reduce the taxonomic abundance bias resulted from 16S rRNA copy number variation. We believe that RiboFR-Seq, which provides an integrated view of 16S rRNA profiles and metagenomes, will help us better understand diverse microbial communities. PMID:26984526

  1. Soil Metagenomes from Different Pristine Environments of Northwest Argentina

    PubMed Central

    Colman, Déborah I.

    2015-01-01

    This is the first study to use a high-throughput metagenomic shotgun approach to explore the biosynthetic potential of soil metagenomes from different pristine environments of northwest Argentina. Our data sets characterize these metagenomes and provide information on the possible effect these ecosystems have on their diversity and biosynthetic potential. PMID:26272581

  2. Perturbative fragmentation

    SciTech Connect

    Kopeliovich, B. Z.; Pirner, H.-J.; Potashnikova, I. K.; Schmidt, Ivan; Tarasov, A. V.

    2008-03-01

    The Berger model of perturbative fragmentation of quarks to pions is improved by providing an absolute normalization and keeping all terms in a (1-z) expansion, which makes the calculation valid at all values of fractional pion momentum z. We also replace the nonrelativistic wave function of a loosely bound pion by the more realistic procedure of projecting to the light-cone pion wave function, which in turn is taken from well known models. The full calculation does not confirm the (1-z)^2 behavior of the fragmentation function (FF) predicted in [E. L. Berger, Z. Phys. C 4, 289 (1980); Phys. Lett. 89B, 241 (1980)] for z>0.5, and only works at very large z>0.95, where it is in reasonable agreement with phenomenological FFs. Otherwise, we observe quite a different z-dependence which grossly underestimates data at smaller z. The disagreement is reduced after the addition of pions from decays of light vector mesons, but still remains considerable. The process dependent higher twist terms are also calculated exactly and found to be important at large z and/or p_T.

  3. Metazen – metadata capture for metagenomes

    DOE PAGESBeta

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker

    2014-12-08

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  4. Metazen – metadata capture for metagenomes

    SciTech Connect

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker

    2014-12-08

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  5. Shotgun metagenomic data streams: surfing without fear

    SciTech Connect

    Berendzen, Joel R

    2010-12-06

    Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to sequences that are truly novel, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast and favorable scaling properties ensure they will be sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.

  6. Metagenomics: Application of Genomics to Uncultured Microorganisms

    PubMed Central

    Handelsman, Jo

    2004-01-01

    Metagenomics (also referred to as environmental and community genomics) is the genomic analysis of microorganisms by direct extraction and cloning of DNA from an assemblage of microorganisms. The development of metagenomics stemmed from the ineluctable evidence that as-yet-uncultured microorganisms represent the vast majority of organisms in most environments on earth. This evidence was derived from analyses of 16S rRNA gene sequences amplified directly from the environment, an approach that avoided the bias imposed by culturing and led to the discovery of vast new lineages of microbial life. Although the portrait of the microbial world was revolutionized by analysis of 16S rRNA genes, such studies yielded only a phylogenetic description of community membership, providing little insight into the genetics, physiology, and biochemistry of the members. Metagenomics provides a second tier of technical innovation that facilitates study of the physiology and ecology of environmental microorganisms. Novel genes and gene products discovered through metagenomics include the first bacteriorhodopsin of bacterial origin; novel small molecules with antimicrobial activity; and new members of families of known proteins, such as an Na+(Li+)/H+ antiporter, RecA, DNA polymerase, and antibiotic resistance determinants. Reassembly of multiple genomes has provided insight into energy and nutrient cycling within the community, genome structure, gene function, population genetics and microheterogeneity, and lateral gene transfer among members of an uncultured community. The application of metagenomic sequence information will facilitate the design of better culturing strategies to link genomic analysis with pure culture studies. PMID:15590779

  7. Preliminary High-Throughput Metagenome Assembly

    SciTech Connect

    Dusheyko, Serge; Furman, Craig; Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank

    2007-03-26

    Metagenome data sets present a qualitatively different assembly problem than traditional single-organism whole-genome shotgun (WGS) assembly. The unique aspects of such projects include the presence of a potentially large number of distinct organisms and their representation in the data set at widely different fractions. In addition, multiple closely related strains could be present, which would be difficult to assemble separately. Failure to take these issues into account can result in poor assemblies that either jumble together different strains or which fail to yield useful results. The DOE Joint Genome Institute has sequenced a number of metagenomic projects and plans to considerably increase this number in the coming year. As a result, the JGI has a need for high-throughput tools and techniques for handling metagenome projects. We present the techniques developed to handle metagenome assemblies in a high-throughput environment. This includes a streamlined assembly wrapper, based on the JGI's in-house WGS assembler, Jazz. It also includes the selection of sensible defaults targeted for metagenome data sets, as well as quality control automation for cleaning up the raw results. While analysis is ongoing, we will discuss preliminary assessments of the quality of the assembly results (http://fames.jgi-psf.org).

  8. Viral Metagenomics: MetaView Software

    SciTech Connect

    Zhou, C; Smith, J

    2007-10-22

    The purpose of this report is to design and develop a tool for analysis of raw sequence read data from viral metagenomics experiments. The tool should compare read sequences of known viral nucleic acid sequence data and enable a user to attempt to determine, with some degree of confidence, what virus groups may be present in the sample. This project was conducted in two phases. In phase 1 we surveyed the literature and examined existing metagenomics tools to educate ourselves and to more precisely define the problem of analyzing raw read data from viral metagenomic experiments. In phase 2 we devised an approach and built a prototype code and database. This code takes viral metagenomic read data in fasta format as input and accesses all complete viral genomes from Kpath for sequence comparison. The system executes at the UNIX command line, producing output that is stored in an Oracle relational database. We provide here a description of the approach we came up with for handling un-assembled, short read data sets from viral metagenomics experiments. We include a discussion of the current MetaView code capabilities and additional functionality that we believe should be added, should additional funding be acquired to continue the work.

  9. Viral metagenomics and blood safety.

    PubMed

    Sauvage, V; Eloit, M

    2016-02-01

    The characterization of the human blood-associated viral community (also called blood virome) is essential for epidemiological surveillance and to anticipate new potential threats for blood transfusion safety. Currently, the risk of blood-borne agent transmission of well-known viruses (HBV, HCV, HIV and HTLV) can be considered as under control in high-resource countries. However, other viruses unknown or unsuspected may be transmitted to recipients by blood-derived products. This is particularly relevant considering that a significant proportion of transfused patients are immunocompromised and more frequently subjected to fatal outcomes. Several measures to prevent transfusion transmission of unknown viruses have been implemented including the exclusion of at-risk donors, leukocyte reduction of donor blood, and physicochemical treatment of the different blood components. However, up to now there is no universal method for pathogen inactivation, which would be applicable for all types of blood components and, equally effective for all viral families. In addition, among available inactivation procedures of viral genomes, some of them are recognized to be less effective on non-enveloped viruses, and inadequate to inactivate higher viral titers in plasma pools or derivatives. Given this, there is the need to implement new methodologies for the discovery of unknown viruses that may affect blood transfusion. Viral metagenomics combined with High Throughput Sequencing appears as a promising approach for the identification and global surveillance of new and/or unexpected viruses that could impair blood transfusion safety. PMID:26778104

  10. Viral metagenomics: are we missing the giants?

    PubMed

    Halary, S; Temmam, S; Raoult, D; Desnues, C

    2016-06-01

    Amoeba-infecting giant viruses are recently discovered viruses that have been isolated from diverse environments all around the world. In parallel to isolation efforts, metagenomics confirmed their worldwide distribution from a broad range of environmental and host-associated samples, including humans, depicting them as a major component of eukaryotic viruses in nature and a possible resident of the human/animal virome whose role is still unclear. Nevertheless, metagenomics data about amoeba-infecting giant viruses still remain scarce, mainly because of methodological limitations. Efforts should be pursued both at the metagenomic sample preparation level and on in silico analyses to better understand their roles in the environment and in human/animal health and disease. PMID:26851442

  11. A catalog of the mouse gut metagenome.

    PubMed

    Xiao, Liang; Feng, Qiang; Liang, Suisha; Sonne, Si Brask; Xia, Zhongkui; Qiu, Xinmin; Li, Xiaoping; Long, Hua; Zhang, Jianfeng; Zhang, Dongya; Liu, Chuan; Fang, Zhiwei; Chou, Joyce; Glanville, Jacob; Hao, Qin; Kotowska, Dorota; Colding, Camilla; Licht, Tine Rask; Wu, Donghai; Yu, Jun; Sung, Joseph Jao Yiu; Liang, Qiaoyi; Li, Junhua; Jia, Huijue; Lan, Zhou; Tremaroli, Valentina; Dworzynski, Piotr; Nielsen, H Bjørn; Bäckhed, Fredrik; Doré, Joël; Le Chatelier, Emmanuelle; Ehrlich, S Dusko; Lin, John C; Arumugam, Manimozhiyan; Wang, Jun; Madsen, Lise; Kristiansen, Karsten

    2015-10-01

    We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies. PMID:26414350
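
    The abstract above defines a core set as the metagenomic species found in 95% of the mice. A minimal sketch of that prevalence filter follows, assuming a presence table that maps each sample to the set of species detected in it. The function name core_species and the input layout are hypothetical and are not taken from the study.

```python
def core_species(presence, min_prevalence=0.95):
    """Return species detected in at least min_prevalence of the samples.

    presence: dict mapping sample id -> set of species names (assumed layout).
    """
    n_samples = len(presence)
    if n_samples == 0:
        return []
    counts = {}
    for detected in presence.values():
        for species in detected:
            counts[species] = counts.get(species, 0) + 1
    return sorted(
        species
        for species, count in counts.items()
        if count / n_samples >= min_prevalence
    )


if __name__ == "__main__":
    table = {
        "m1": {"A", "B", "C"},
        "m2": {"A", "B"},
        "m3": {"A", "B", "C"},
        "m4": {"A"},
    }
    # A lower threshold is used here only because the toy table has 4 samples.
    print(core_species(table, min_prevalence=0.75))
```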

  12. Metagenomic exploration of antibiotic resistance in soil.

    PubMed

    Monier, Jean-Michel; Demanèche, Sandrine; Delmont, Tom O; Mathieu, Alban; Vogel, Timothy M; Simonet, Pascal

    2011-06-01

    The ongoing development of metagenomic approaches is providing the means to explore antibiotic resistance in nature and address questions that could not be answered previously with conventional culture-based strategies. The number of available environmental metagenomic sequence datasets is rapidly expanding and thereby offers the ability to gain a more comprehensive understanding of antibiotic resistance at the global scale. Although there is now evidence that the environment constitutes a vast reservoir of antibiotic resistance gene determinants (ARGDs) and that the majority of ARGDs acquired by human pathogens may have an environmental origin, a better understanding of their diversity, prevalence and ecological significance may help predict the emergence and spreading of newly acquired resistances. Recent applications of metagenomic approaches to the study of ARGDs in natural environments such as soil should help overcome challenges concerning expanding antibiotic resistances. PMID:21601510

  13. Pathway-Based Functional Analysis of Metagenomes

    NASA Astrophysics Data System (ADS)

    Bercovici, Sivan; Sharon, Itai; Pinter, Ron Y.; Shlomi, Tomer

    Metagenomic data enables the study of microbes and viruses through their DNA as retrieved directly from the environment in which they live. Functional analysis of metagenomes explores the abundance of gene families, pathways, and systems, rather than their taxonomy. Through such analysis researchers are able to identify those functional capabilities most important to organisms in the examined environment. Recently, a statistical framework for the functional analysis of metagenomes was described that focuses on gene families. Here we describe two pathway-level computational models for functional analysis that take into account important, yet unaddressed issues such as pathway size, gene length and overlap in gene content among pathways. We test our models over carefully designed simulated data and propose novel approaches for performance evaluation. Our models significantly improve over the current approach with respect to pathway ranking and the computation of relative abundance of pathways in environments.
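
    The abstract above names three factors a pathway-level model has to handle: pathway size, gene length, and genes shared between pathways. The sketch below shows one simple way those factors could enter an abundance estimate. It is not the authors' model. The function name pathway_abundance, the reads-per-base density, and the equal split of shared genes are all assumptions chosen for illustration.

```python
def pathway_abundance(read_counts, gene_lengths, pathways):
    """Estimate relative pathway abundance from per-gene read counts.

    read_counts:  dict gene -> number of reads assigned to the gene
    gene_lengths: dict gene -> gene length in bp
    pathways:     dict pathway -> list of member genes (no duplicates)
    Illustrative assumptions: reads are divided by gene length, a gene shared
    by k pathways contributes 1/k of its density to each, and each pathway
    score is averaged over its genes to offset pathway size.
    """
    membership = {}
    for genes in pathways.values():
        for gene in genes:
            membership[gene] = membership.get(gene, 0) + 1

    scores = {}
    for name, genes in pathways.items():
        total = 0.0
        for gene in genes:
            density = read_counts.get(gene, 0) / gene_lengths[gene]
            total += density / membership[gene]
        scores[name] = total / len(genes) if genes else 0.0

    norm = sum(scores.values())
    return {name: (s / norm if norm else 0.0) for name, s in scores.items()}


if __name__ == "__main__":
    reads = {"g1": 100, "g2": 200, "g3": 50}
    lengths = {"g1": 1000, "g2": 2000, "g3": 500}
    paths = {"P1": ["g1", "g2"], "P2": ["g2", "g3"]}
    print(pathway_abundance(reads, lengths, paths))
```

    In the toy input the raw read counts differ by a factor of four between genes, yet every gene has the same reads-per-base density, so the two pathways come out equally abundant.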

  14. Metagenomics as a Tool for Enzyme Discovery: Hydrolytic Enzymes from Marine-Related Metagenomes.

    PubMed

    Popovic, Ana; Tchigvintsev, Anatoly; Tran, Hai; Chernikova, Tatyana N; Golyshina, Olga V; Yakimov, Michail M; Golyshin, Peter N; Yakunin, Alexander F

    2015-01-01

    This chapter discusses metagenomics and its application for enzyme discovery, with a focus on hydrolytic enzymes from marine metagenomic libraries. With less than one percent of culturable microorganisms in the environment, metagenomics, or the collective study of community genetics, has opened up a rich pool of uncharacterized metabolic pathways, enzymes, and adaptations. This great untapped pool of genes provides the particularly exciting potential to mine for new biochemical activities or novel enzymes with activities tailored to peculiar sets of environmental conditions. Metagenomes also represent a huge reservoir of novel enzymes for applications in biocatalysis, biofuels, and bioremediation. Here we present the results of enzyme discovery for four enzyme activities, of particular industrial or environmental interest, including esterase/lipase, glycosyl hydrolase, protease and dehalogenase. PMID:26621459

  15. Metagenomic Exploration of Viruses throughout the Indian Ocean

    PubMed Central

    Lorenzi, Hernan A.; Fadrosh, Douglas W.; Brami, Daniel; Thiagarajan, Mathangi; McCrow, John P.; Tovchigrechko, Andrey; Yooseph, Shibu; Venter, J. Craig

    2012-01-01

    The characterization of global marine microbial taxonomic and functional diversity is a primary goal of the Global Ocean Sampling Expedition. As part of this study, 19 water samples were collected aboard the Sorcerer II sailing vessel from the southern Indian Ocean in an effort to more thoroughly understand the lifestyle strategies of the microbial inhabitants of this ultra-oligotrophic region. No investigations of whole virioplankton assemblages have been conducted on waters collected from the Indian Ocean or across multiple size fractions thus far. Therefore, the goals of this study were to examine the effect of size fractionation on viral consortia structure and function and understand the diversity and functional potential of the Indian Ocean virome. Five samples were selected for comprehensive metagenomic exploration; and sequencing was performed on the microbes captured on 3.0-, 0.8- and 0.1 µm membrane filters as well as the viral fraction (<0.1 µm). Phylogenetic approaches were also used to identify predicted proteins of viral origin in the larger fractions of data from all Indian Ocean samples, which were included in subsequent metagenomic analyses. Taxonomic profiling of viral sequences suggested that size fractionation of marine microbial communities enriches for specific groups of viruses within the different size classes and functional characterization further substantiated this observation. Functional analyses also revealed a relative enrichment for metabolic proteins of viral origin that potentially reflect the physiological condition of host cells in the Indian Ocean including those involved in nitrogen metabolism and oxidative phosphorylation. A novel classification method, MGTAXA, was used to assess virus-host relationships in the Indian Ocean by predicting the taxonomy of putative host genera, with Prochlorococcus, Acanthochlois and members of the SAR86 cluster comprising the most abundant predictions. This is the first study to holistically

  16. A metagenomic study of primate insect diet diversity.

    PubMed

    Pickett, Sarah B; Bergey, Christina M; Di Fiore, Anthony

    2012-07-01

    Descriptions of primate diets are generally based on either direct observation of foraging behavior, morphological classification of food remains from feces, or analysis of the stomach contents of deceased individuals. Some diet items (e.g. insect prey), however, are difficult to identify visually, and observation conditions often do not permit adequate quantitative sampling of feeding behavior. Moreover, the taxonomically informative morphology of some food species (e.g. swallowed seeds, insect exoskeletons) may be destroyed by the digestive process. Because of these limitations, we used a metagenomic approach to conduct a preliminary, "proof of concept" study of interspecific variation in the insect component of the diets of six sympatric New World monkeys known, based on observational field studies, to differ markedly in their feeding ecology. We used generalized arthropod polymerase chain reaction (PCR) primers and cloning to sequence mitochondrial DNA (mtDNA) sequences of the arthropod cytochrome b (CYT B) gene from fecal samples of wild woolly, titi, saki, capuchin, squirrel, and spider monkeys collected from a single sampling site in western Amazonia where these genera occur sympatrically. We then assigned preliminary taxonomic identifications to the sequences by basic local alignment search tool (BLAST) comparison to arthropod CYT B sequences present in GenBank. This study is the first to use molecular techniques to identify insect prey in primate diets. The results suggest that a metagenomic approach may prove valuable in augmenting and corroborating observational data and increasing the resolution of primate diet studies, although the lack of comparative reference sequences for many South American insects limits the approach at present. As such reference data become available for more animal and plant taxa, this approach also holds promise for studying additional components of primate diets. PMID:22553123
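
    The abstract above assigns preliminary taxonomic identifications by BLAST comparison to reference sequences. A minimal sketch of a best-hit assignment is given below, assuming BLAST tabular output (the 12-column format 6, in which column 3 is percent identity and column 12 is bit score). The function name best_hits and the 90% identity cutoff are assumptions for illustration and are not the thresholds used in the study.

```python
def best_hits(blast_lines, min_identity=90.0):
    """Pick the highest bit-score subject per query from BLAST tabular lines.

    Expects tab-separated columns in the default tabular order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore. The identity cutoff is a hypothetical default.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        identity, bitscore = float(fields[2]), float(fields[11])
        if identity < min_identity:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


if __name__ == "__main__":
    rows = [
        "clone1\trefA\t97.5\t300\t7\t0\t1\t300\t1\t300\t1e-150\t520",
        "clone1\trefB\t92.0\t300\t24\t0\t1\t300\t1\t300\t1e-120\t430",
        "clone2\trefC\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-60\t210",
    ]
    print(best_hits(rows))
```

    Queries whose hits all fall below the cutoff are left unassigned, which matches the caveat in the abstract that missing reference sequences limit the approach.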

  17. Metagenomic analysis of a stable trichloroethene-degrading microbial community

    PubMed Central

    Brisson, Vanessa L; West, Kimberlee A; Lee, Patrick KH; Tringe, Susannah G; Brodie, Eoin L; Alvarez-Cohen, Lisa

    2012-01-01

    Dehalococcoides bacteria are the only organisms known to completely reduce chlorinated ethenes to the harmless product ethene. However, Dehalococcoides dechlorinate these chemicals more effectively and grow more robustly in mixed microbial communities than in isolation. In this study, the phylogenetic composition and gene content of a functionally stable trichloroethene-degrading microbial community was examined using metagenomic sequencing and analysis. For phylogenetic classification, contiguous sequences (contigs) longer than 2500 bp were grouped into classes according to tetranucleotide frequencies and assigned to taxa based on rRNA genes and other phylogenetic marker genes. Classes were identified for Clostridiaceae, Dehalococcoides, Desulfovibrio, Methanobacterium, Methanospirillum, as well as a Spirochete, a Synergistete, and an unknown Deltaproteobacterium. Dehalococcoides contigs were also identified based on sequence similarity to previously sequenced genomes, allowing the identification of 170 kb on contigs shorter than 2500 bp. Examination of metagenome sequences affiliated with Dehalococcoides revealed 406 genes not found in previously sequenced Dehalococcoides genomes, including 9 cobalamin biosynthesis genes related to corrin ring synthesis. This is the first time that a Dehalococcoides strain has been found to possess genes for synthesizing this cofactor critical to reductive dechlorination. Besides Dehalococcoides, several other members of this community appear to have genes for complete or near-complete cobalamin biosynthesis pathways. In all, 17 genes for putative reductive dehalogenases were identified, including 11 novel ones, all associated with Dehalococcoides. Genes for hydrogenase components (271 in total) were widespread, highlighting the importance of hydrogen metabolism in this community. PhyloChip analysis confirmed the stability of this microbial community. PMID:22378537
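
    The abstract above groups contigs longer than 2500 bp into classes according to tetranucleotide frequencies. The sketch below shows how such a frequency vector can be computed and compared. It is a simplified illustration rather than the study's pipeline: the function names are hypothetical, reverse complements are not merged, and plain Euclidean distance stands in for whatever clustering method the authors used.

```python
from collections import Counter
from itertools import product

# All 256 tetranucleotides in a fixed order so that vectors are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def filter_contigs(contigs, min_len=2500):
    """Keep contigs strictly longer than min_len bp (dict name -> sequence)."""
    return {name: seq for name, seq in contigs.items() if len(seq) > min_len}


def tetra_freq(seq):
    """Return the 256-dimensional tetranucleotide frequency vector of seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only windows made of A, C, G, T are counted; windows with N are dropped.
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


if __name__ == "__main__":
    contigs = {"long": "ACGT" * 700, "short": "ACGT" * 100}
    kept = filter_contigs(contigs)
    profiles = {name: tetra_freq(seq) for name, seq in kept.items()}
    print(sorted(profiles))
```

    Contigs with similar vectors can then be passed to any clustering routine; shorter contigs give noisier frequency estimates, which is one common reason for applying a length cutoff like the one in the abstract.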

  18. Identification of chicken-specific fecal microbial sequences using a metagenomic approach.

    PubMed

    Lu, Jingrang; Santo Domingo, Jorge; Shanks, Orin C

    2007-08-01

    In this study, we applied a genome fragment enrichment (GFE) method to select for genomic regions that differ among different fecal metagenomes. Competitive DNA hybridizations were performed between chicken fecal DNA and pig fecal DNA (CP) and between chicken fecal DNA and an avian DNA composite consisting of turkey, goose, and seagull fecal DNA extracts (CB) to enrich for chicken-specific DNA fragments. A total of 471 non-redundant chicken metagenomic sequences were retrieved and analyzed. All of the clone sequences were similar to prokaryotic genes, of which more than 60% could not be assigned to previously characterized functional roles. In general terms, sequences assigned characterized functional roles were associated with cellular processes (11.7%), metabolism (11.0%) and information storage and processing (13.4%). Approximately 53% of the non-redundant sequences are similar to genes present in intestinal bacteria belonging to Clostridia (20.9%), Bacteroidetes (15.0%), and Bacilli (17.3%). Twenty-five sequences from the CP and CB clone libraries were selected to develop chicken fecal-specific PCR assays. These assays were challenged against fecal DNA extracted from 21 different animal species, including mammals and birds. The results from the host-specificity studies showed that 12 of the assays had a high degree of specificity to chicken feces. In addition, three assays were specific to chicken and turkey while another four assays tested positive to more than two avian species, suggesting a broader distribution of some of the enriched gene fragments among different avian fecal microbial communities. Fecal pollution signals were detected using chicken-specific assays in contaminated water samples, although the PCR assays showed different detection limits. These results indicate the need for multiple assays to detect poultry fecal sources of pollution. The competitive DNA hybridization approach used in this study can rapidly select for numerous chicken fecal

  19. Unravelling core microbial metabolisms in the hypersaline microbial mats of Shark Bay using high-throughput metagenomics.

    PubMed

    Ruvindy, Rendy; White, Richard Allen; Neilan, Brett Anthony; Burns, Brendan Paul

    2016-01-01

    Modern microbial mats are potential analogues of some of Earth's earliest ecosystems. Excellent examples can be found in Shark Bay, Australia, with mats of various morphologies. To further our understanding of the functional genetic potential of these complex microbial ecosystems, we conducted for the first time shotgun metagenomic analyses. We assembled metagenomic next-generation sequencing data to classify the taxonomic and metabolic potential across diverse morphologies of marine mats in Shark Bay. The microbial community across taxonomic classifications using protein-coding and small subunit rRNA genes directly extracted from the metagenomes suggests that three phyla, Proteobacteria, Cyanobacteria and Bacteroidetes, dominate all marine mats. However, the microbial community structures of the Shark Bay and Highbourne Cay (Bahamas) marine systems appear to be distinct from each other. The metabolic potential (based on SEED subsystem classifications) of the Shark Bay and Highbourne Cay microbial communities was also distinct. Shark Bay metagenomes have a metabolic pathway profile consisting of both heterotrophic and photosynthetic pathways, whereas Highbourne Cay appears to be dominated almost exclusively by photosynthetic pathways. Alternative non-rubisco-based carbon metabolism including reductive TCA cycle and 3-hydroxypropionate/4-hydroxybutyrate pathways is highly represented in Shark Bay metagenomes while not represented in Highbourne Cay microbial mats or any other mat forming ecosystems investigated to date. Potentially novel aspects of nitrogen cycling were also observed, as well as putative heavy metal cycling (arsenic, mercury, copper and cadmium). Finally, archaea are highly represented in Shark Bay and may have critical roles in overall ecosystem function in these modern microbial mats. PMID:26023869

  20. Physiological and evolutionary potential of microorganisms from the Canterbury Basin subseafloor, a metagenomic approach.

    PubMed

    Gaboyer, Frédéric; Burgaud, Gaëtan; Alain, Karine

    2015-05-01

    Subseafloor sediments represent a large reservoir of organic matter and are inhabited by microbial groups of the three domains of life. Besides impacting the planetary geochemical cycles, the subsurface biosphere remains poorly understood, notably questions related to possible metabolic pathways and selective advantages that may be deployed by buried microorganisms (sporulation, response to stress, dormancy). In order to better understand physiological potentials and possible lifestyles of subseafloor microbial communities, we analyzed two metagenomes from subseafloor sediments collected at 31 mbsf (meters below the sea floor) and 136 mbsf in the Canterbury Basin. Metagenomic phylogenetic and functional diversities were very similar. Phylogenetic diversity was mostly represented by Chloroflexi, Firmicutes and Proteobacteria for Bacteria and by Thaumarchaeota and Euryarchaeota for Archaea. Predicted anaerobic metabolisms encompassed fermentation, methanogenesis and utilization of fatty acids, aromatic and halogenated substrates. Potential processes that may confer selective advantages for subsurface microorganisms included sporulation, detoxication equipment or osmolyte accumulation. Annotation of genomic fragments described the metabolic versatility of Chloroflexi, Miscellaneous Crenarchaeotic Group and Euryarchaeota and showed frequent recombination events within subsurface taxa. This study confirmed that the subseafloor habitat is unique compared to other habitats at the (meta)-genomic level and described physiological potential of still uncultured groups. PMID:25873465

  1. Multisubstrate Isotope Labeling and Metagenomic Analysis of Active Soil Bacterial Communities

    PubMed Central

    Verastegui, Y.; Cheng, J.; Engel, K.; Kolczynski, D.; Mortimer, S.; Lavigne, J.; Montalibet, J.; Romantsov, T.; Hall, M.; McConkey, B. J.; Rose, D. R.; Tomashek, J. J.; Scott, B. R.

    2014-01-01

    ABSTRACT Soil microbial diversity represents the largest global reservoir of novel microorganisms and enzymes. In this study, we coupled functional metagenomics and DNA stable-isotope probing (DNA-SIP) using multiple plant-derived carbon substrates and diverse soils to characterize active soil bacterial communities and their glycoside hydrolase genes, which have value for industrial applications. We incubated samples from three disparate Canadian soils (tundra, temperate rainforest, and agricultural) with five native carbon (12C) or stable-isotope-labeled (13C) carbohydrates (glucose, cellobiose, xylose, arabinose, and cellulose). Indicator species analysis revealed high specificity and fidelity for many uncultured and unclassified bacterial taxa in the heavy DNA for all soils and substrates. Among characterized taxa, Actinomycetales (Salinibacterium), Rhizobiales (Devosia), Rhodospirillales (Telmatospirillum), and Caulobacterales (Phenylobacterium and Asticcacaulis) were bacterial indicator species for the heavy substrates and soils tested. Both Actinomycetales and Caulobacterales (Phenylobacterium) were associated with metabolism of cellulose, and Alphaproteobacteria were associated with the metabolism of arabinose; members of the order Rhizobiales were strongly associated with the metabolism of xylose. Annotated metagenomic data suggested diverse glycoside hydrolase gene representation within the pooled heavy DNA. By screening 2,876 cloned fragments derived from the 13C-labeled DNA isolated from soils incubated with cellulose, we demonstrate the power of combining DNA-SIP, multiple-displacement amplification (MDA), and functional metagenomics by efficiently isolating multiple clones with activity on carboxymethyl cellulose and fluorogenic proxy substrates for carbohydrate-active enzymes. PMID:25028422

  2. The Metagenome of Utricularia gibba's Traps: Into the Microbial Input to a Carnivorous Plant.

    PubMed

    Alcaraz, Luis David; Martínez-Sánchez, Shamayim; Torres, Ignacio; Ibarra-Laclette, Enrique; Herrera-Estrella, Luis

    2016-01-01

    The genome and transcriptome sequences of the aquatic, rootless, and carnivorous plant Utricularia gibba L. (Lentibulariaceae), were recently determined. Traps are necessary for U. gibba because they help the plant to survive in nutrient-deprived environments. The U. gibba's traps (Ugt) are specialized structures that have been proposed to selectively filter microbial inhabitants. To determine whether the traps indeed have a microbiome that differs, in composition or abundance, from the microbiome in the surrounding environment, we used whole-genome shotgun (WGS) metagenomics to describe both the taxonomic and functional diversity of the Ugt microbiome. We collected U. gibba plants from their natural habitat and directly sequenced the metagenome of the Ugt microbiome and its surrounding water. The total predicted number of species in the Ugt was more than 1,100. Using pan-genome fragment recruitment analysis, we were able to identify to the species level of some key Ugt players, such as Pseudomonas monteilii. Functional analysis of the Ugt metagenome suggests that the trap microbiome plays an important role in nutrient scavenging and assimilation while complementing the hydrolytic functions of the plant. PMID:26859489

  3. Novel resistance functions uncovered using functional metagenomic investigations of resistance reservoirs

    PubMed Central

    Pehrsson, Erica C.; Forsberg, Kevin J.; Gibson, Molly K.; Ahmadi, Sara; Dantas, Gautam

    2013-01-01

    Rates of infection with antibiotic-resistant bacteria have increased precipitously over the past several decades, with far-reaching healthcare and societal costs. Recent evidence has established a link between antibiotic resistance genes in human pathogens and those found in non-pathogenic, commensal, and environmental organisms, prompting deeper investigation of natural and human-associated reservoirs of antibiotic resistance. Functional metagenomic selections, in which shotgun-cloned DNA fragments are selected for their ability to confer survival to an indicator host, have been increasingly applied to the characterization of many antibiotic resistance reservoirs. These experiments have demonstrated that antibiotic resistance genes are highly diverse and widely distributed, many times bearing little to no similarity to known sequences. Through unbiased selections for survival to antibiotic exposure, functional metagenomics can improve annotations by reducing the discovery of false-positive resistance and by allowing for the identification of previously unrecognizable resistance genes. In this review, we summarize the novel resistance functions uncovered using functional metagenomic investigations of natural and human-impacted resistance reservoirs. Examples of novel antibiotic resistance genes include those highly divergent from known sequences, those for which sequence is entirely unable to predict resistance function, bifunctional resistance genes, and those with unconventional, atypical resistance mechanisms. Overcoming antibiotic resistance in the clinic will require a better understanding of existing resistance reservoirs and the dissemination networks that govern horizontal gene exchange, informing best practices to limit the spread of resistance-conferring genes to human pathogens. PMID:23760651

  4. Construction and validation of metagenomic DNA libraries from landfarm soil microorganisms.

    PubMed

    Pessoa, T B A; de Souza, S S; Cerqueira, A F; Rezende, R P; Pirovani, C P; Dias, J C T

    2013-01-01

    Landfarming biodegradation is a strategy used by the petrochemical industry to reduce pollutants in petroleum-contaminated soil. We constructed 2 metagenomic libraries from landfarming soil in order to determine the pathway used for mineralization of benzene and to examine protein expression of the bacteria in these soils. The DNA of landfarm soil, collected from Ilhéus, BA, Brazil, was extracted and a metagenomic library was constructed with the Copy Control(TM) Fosmid Library Production Kit, which clones 25-45-kb DNA fragments. The clones were selected for their ability to express enzymes capable of cleaving aromatic compounds. These clones were grown in Luria-Bertani broth plus L-arabinose, benzene, and chloramphenicol as induction substances; they were tested for activity in the catechol cleavage pathway, an intermediate step in benzene degradation. Nine clones were positive for ortho-cleavage and one was positive for meta-cleavage. Protein band patterns determined by SDS-polyacrylamide gel electrophoresis differed in bacteria grown on induced versus non-induced media (Luria-Bertani broth). We concluded that the DNA of landfarm soil is an important source of genes involved in mineralization of xenobiotic compounds, which are common in gasoline and oil spills. Metagenomic libraries allow identification of non-culturable microorganisms that have potential in the bioremediation of contaminated sites. PMID:23913392

  5. The Metagenome of Utricularia gibba's Traps: Into the Microbial Input to a Carnivorous Plant

    PubMed Central

    Alcaraz, Luis David; Martínez-Sánchez, Shamayim; Torres, Ignacio; Ibarra-Laclette, Enrique; Herrera-Estrella, Luis

    2016-01-01

    The genome and transcriptome sequences of the aquatic, rootless, and carnivorous plant Utricularia gibba L. (Lentibulariaceae), were recently determined. Traps are necessary for U. gibba because they help the plant to survive in nutrient-deprived environments. The U. gibba's traps (Ugt) are specialized structures that have been proposed to selectively filter microbial inhabitants. To determine whether the traps indeed have a microbiome that differs, in composition or abundance, from the microbiome in the surrounding environment, we used whole-genome shotgun (WGS) metagenomics to describe both the taxonomic and functional diversity of the Ugt microbiome. We collected U. gibba plants from their natural habitat and directly sequenced the metagenome of the Ugt microbiome and its surrounding water. The total predicted number of species in the Ugt was more than 1,100. Using pan-genome fragment recruitment analysis, we were able to identify to the species level of some key Ugt players, such as Pseudomonas monteilii. Functional analysis of the Ugt metagenome suggests that the trap microbiome plays an important role in nutrient scavenging and assimilation while complementing the hydrolytic functions of the plant. PMID:26859489

  6. Towards a more complete metagenomics toolkit

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The emerging scientific discipline of metagenomics has not only created a myriad of opportunities for biologists to reveal new insights into the microbial underpinnings of our environment, but has also presented a number of interesting challenges for bioinformatics algorithms and software developers...

  7. [Metagenomic studies and infectious diseases diagnostics].

    PubMed

    Alekseeva, A E; Brusnigina, N F

    2015-01-01

    Principles of mass parallel sequencing, otherwise called next generation sequencing (NGS), appeared at the beginning of the 2000s and were realized in dozens of NGS platforms. High performance and sequencing speed of NGS platforms opened wide horizons for scientists in the field of genomic studies, including metagenomic studies, first of all those related to the structure of various microbiocenoses. Dozens of studies of the microbiome and virome of various human biotopes in the normal state and in pathology using NGS platforms have appeared, forming novel conceptions of the pathogenesis and epidemiology of various infectious diseases. Significant cost reduction of the analysis facilitates expansion of the sphere of application for NGS technologies not only in the field of fundamental, but also applied microbiologic studies, including etiologic diagnostics of infectious diseases. Due to the increase in the number of cases of infectious diseases that do not have a typical clinical presentation, use of the metagenomic approach is of particular importance, allowing detection of a wide spectrum of causative agents of bacterial, viral and parasitic infections. Technologic features of mass parallel sequencing platforms, main methods of metagenomic studies and bioinformatics approaches used for the analysis of the data obtained are presented in the review. Studies on the human microbiome in health and in pathology are described; possibilities and perspectives of metagenomic approach application in diagnostics and in the system of epidemiologic control of infectious diseases are examined. PMID:26016350

  8. Assembly of viral genomes from metagenomes

    PubMed Central

    Smits, Saskia L.; Bodewes, Rogier; Ruiz-Gonzalez, Aritz; Baumgärtner, Wolfgang; Koopmans, Marion P.; Osterhaus, Albert D. M. E.; Schürch, Anita C.

    2014-01-01

    Viral infections remain a serious global health issue. Metagenomic approaches are increasingly used in the detection of novel viral pathogens but also to generate complete genomes of uncultivated viruses. In silico identification of complete viral genomes from sequence data would allow rapid phylogenetic characterization of these new viruses. Often, however, complete viral genomes are not recovered, but rather several distinct contigs derived from a single entity are, some of which have no sequence homology to any known proteins. De novo assembly of single viruses from a metagenome is challenging, not only because of the lack of a reference genome, but also because of intrapopulation variation and uneven or insufficient coverage. Here we explored different assembly algorithms, remote homology searches, genome-specific sequence motifs, k-mer frequency ranking, and coverage profile binning to detect and obtain viral target genomes from metagenomes. All methods were tested on 454-generated sequencing datasets containing three recently described RNA viruses with a relatively large genome which were divergent to previously known viruses from the viral families Rhabdoviridae and Coronaviridae. Depending on specific characteristics of the target virus and the metagenomic community, different assembly and in silico gap closure strategies were successful in obtaining near complete viral genomes. PMID:25566226
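
    Among the strategies listed in the abstract above is coverage profile binning, which relies on contigs from the same genome tending to have similar read depth. The sketch below groups contigs whose mean coverage lies within a fixed ratio of the lowest-coverage member of a bin. It is a toy version for illustration: the function name bin_by_coverage and the ratio threshold of 2.0 are assumptions, not parameters from the paper.

```python
def bin_by_coverage(coverage, max_ratio=2.0):
    """Group contigs by similar mean coverage.

    coverage: dict contig name -> mean read depth (assumed precomputed).
    Contigs are sorted by depth; a contig joins the current bin if its depth
    is at most max_ratio times the depth of the bin's first (lowest) member,
    otherwise it starts a new bin.
    """
    bins = []
    for name, depth in sorted(coverage.items(), key=lambda item: item[1]):
        if bins and depth <= bins[-1][0][1] * max_ratio:
            bins[-1].append((name, depth))
        else:
            bins.append([(name, depth)])
    return [[name for name, _ in members] for members in bins]


if __name__ == "__main__":
    depths = {"c1": 10, "c2": 12, "c3": 15, "c4": 100, "c5": 130}
    print(bin_by_coverage(depths))
```

    As the abstract notes, uneven or insufficient coverage makes this signal unreliable on its own, so in practice it would be combined with sequence-based evidence such as k-mer frequencies.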

  9. The metagenomic approach and causality in virology

    PubMed Central

    Castrignano, Silvana Beres; Nagasse-Sugahara, Teresa Keico

    2015-01-01

    Nowadays, the metagenomic approach has been a very important tool in the discovery of new viruses in environmental and biological samples. Here we discuss how these discoveries may help to elucidate the etiology of diseases and the criteria necessary to establish a causal association between a virus and a disease. PMID:25902566

  10. Applications of metagenomics for industrial bioproducts

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Recent progress in mining the rich genetic resource of non-culturable microbes has led to the discovery of new genes, enzymes, and natural products. The impact of metagenomics is witnessed in the development of commodity and fine chemicals, agrochemicals and pharmaceuticals where the benefit of enz...

  11. Biomolecular and metagenomic analyses of biofouling communities

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Despite the decades of research that have focused on understanding the formation of biofouling communities, relatively little is known about the soft fouling consortia that are responsible for their formation and function. In this study, we used PhyloChip microbial profiling, metagenomic DNA sequenc...

  12. Microbial Diversity and Biochemical Potential Encoded by Thermal Spring Metagenomes Derived from the Kamchatka Peninsula

    PubMed Central

    Wemheuer, Bernd; Taube, Robert; Akyol, Pinar; Wemheuer, Franziska; Daniel, Rolf

    2013-01-01

    Volcanic regions contain a variety of environments suitable for extremophiles. This study was focused on assessing and exploiting the prokaryotic diversity of two microbial communities derived from different Kamchatkian thermal springs by metagenomic approaches. Samples were taken from a thermoacidophilic spring near the Mutnovsky Volcano and from a thermophilic spring in the Uzon Caldera. Environmental DNA for metagenomic analysis was isolated from collected sediment samples by direct cell lysis. The prokaryotic community composition was examined by analysis of archaeal and bacterial 16S rRNA genes. A total number of 1235 16S rRNA gene sequences were obtained and used for taxonomic classification. Most abundant in the samples were members of Thaumarchaeota, Thermotogae, and Proteobacteria. The Mutnovsky hot spring was dominated by the Terrestrial Hot Spring Group, Kosmotoga, and Acidithiobacillus. The Uzon Caldera was dominated by uncultured members of the Miscellaneous Crenarchaeotic Group and Enterobacteriaceae. The remaining 16S rRNA gene sequences belonged to the Aquificae, Dictyoglomi, Euryarchaeota, Korarchaeota, Thermodesulfobacteria, Firmicutes, and some potential new phyla. In addition, the recovered DNA was used for generation of metagenomic libraries, which were subsequently mined for genes encoding lipolytic and proteolytic enzymes. Three novel genes conferring lipolytic and one gene conferring proteolytic activity were identified. PMID:23533327

  13. Identification of a new lipase family in the Brazilian Atlantic Forest soil metagenome.

    PubMed

    Faoro, Helisson; Glogauer, Arnaldo; Souza, Emanuel M; Rigo, Liu U; Cruz, Leonardo M; Monteiro, Rose A; Pedrosa, Fábio O

    2011-12-01

    Lipases are the most investigated class of enzymes in metagenomics. Phylogenetic classification of bacterial lipases comprises eight families. Here we describe the construction and screening of three metagenomic libraries from Brazilian Atlantic Forest soil and identification of a new lipase family. The metagenomic libraries, MAF1, MAF2 and MAF3, contained 34 560, 29 280 and 36 288 clones respectively. Lipase screening on triolein-rhodamine B plates resulted in one positive clone, Lip018. The DNA insert of Lip018 was fully sequenced and 20 ORFs were identified by comparison against the GenBank. Transposon mutagenesis revealed that ORF15, similar to serine peptidases, and ORF16, a hypothetical protein, were both required for lipase activity. ORF16 has a typical lipase conserved pentapeptide G-X-S-X-G and the comparison against the Pfam database showed that ORF16 belongs to family 5 of αβ-hydrolase. Phylogenetic analyses indicated that ORF16, together with other related proteins, may be a member of a new lipase family, named LipAP, activated by a putative serine protease. Partial characterization of ORF16 lipase showed that the enzyme has activity against a broad range of p-nitrophenyl esters, but only after activation by the predicted peptidase ORF15. PMID:23761366

  14. Comparative metagenomics of microbial communities inhabiting deep-sea hydrothermal vent chimneys with contrasting chemistries

    PubMed Central

    Xie, Wei; Wang, Fengping; Guo, Lei; Chen, Zeling; Sievert, Stefan M; Meng, Jun; Huang, Guangrui; Li, Yuxin; Yan, Qingyu; Wu, Shan; Wang, Xin; Chen, Shangwu; He, Guangyuan; Xiao, Xiang; Xu, Anlong

    2011-01-01

    Deep-sea hydrothermal vent chimneys harbor a high diversity of largely unknown microorganisms. Although the phylogenetic diversity of these microorganisms has been described previously, the adaptation and metabolic potential of the microbial communities is only beginning to be revealed. A pyrosequencing approach was used to directly obtain sequences from a fosmid library constructed from a black smoker chimney 4143-1 in the Mothra hydrothermal vent field at the Juan de Fuca Ridge. A total of 308 034 reads with an average sequence length of 227 bp were generated. Comparative genomic analyses of metagenomes from a variety of environments by two-way clustering of samples and functional gene categories demonstrated that the 4143-1 metagenome clustered most closely with that from a carbonate chimney from Lost City. Both are highly enriched in genes for mismatch repair and homologous recombination, suggesting that the microbial communities have evolved extensive DNA repair systems to cope with the extreme conditions that have potential deleterious effects on the genomes. As previously reported for the Lost City microbiome, the metagenome of chimney 4143-1 exhibited a high proportion of transposases, implying that horizontal gene transfer may be a common occurrence in the deep-sea vent chimney biosphere. In addition, genes for chemotaxis and flagellar assembly were highly enriched in the chimney metagenomes, reflecting the adaptation of the organisms to the highly dynamic conditions present within the chimney walls. Reconstruction of the metabolic pathways revealed that the microbial community in the wall of chimney 4143-1 was mainly fueled by sulfur oxidation, putatively coupled to nitrate reduction to perform inorganic carbon fixation through the Calvin–Benson–Bassham cycle. On the basis of the genomic organization of the key genes of the carbon fixation and sulfur oxidation pathways contained in the large genomic fragments, both obligate and facultative

  15. Metagenomic Analysis Suggests Modern Freshwater Microbialites Harbor a Distinct Core Microbial Community.

    PubMed

    White, Richard Allen; Chan, Amy M; Gavelis, Gregory S; Leander, Brian S; Brady, Allyson L; Slater, Gregory F; Lim, Darlene S S; Suttle, Curtis A

    2015-01-01

    Modern microbialites are complex microbial communities that interface with abiotic factors to form carbonate-rich organosedimentary structures whose ancestors provide the earliest evidence of life. Past studies primarily on marine microbialites have inventoried diverse taxa and metabolic pathways, but it is unclear which of these are members of the microbialite community and which are introduced from adjacent environments. Here we control for these factors by sampling the surrounding water and nearby sediment, in addition to the microbialites, and use a metagenomics approach to interrogate the microbial community. Our findings suggest that the Pavilion Lake microbialite community profile, metabolic potential and pathway distributions are distinct from those in the neighboring sediments and water. Based on RefSeq classification, members of the Proteobacteria (e.g., alpha and delta classes) were the dominant taxa in the microbialites, and possessed novel functional guilds associated with the metabolism of heavy metals, antibiotic resistance, primary alcohol biosynthesis and urea metabolism; the latter may help drive biomineralization. Urea metabolism within Pavilion Lake microbialites is a feature not previously associated with other microbialites. The microbialite communities were also significantly enriched for cyanobacteria and acidobacteria, which likely play an important role in biomineralization. Additional findings suggest that Pavilion Lake microbialites are under viral selection as genes associated with viral infection (e.g., CRISPR-Cas, phage shock and phage excision) are abundant within the microbialite metagenomes. The morphology of Pavilion Lake microbialites changes dramatically with depth; yet, metagenomic data did not vary significantly by morphology or depth, indicating that microbialite morphology is altered by other factors, perhaps transcriptional differences or abiotic conditions. This work provides a comprehensive metagenomic perspective of the

  16. Metagenomic Analysis Suggests Modern Freshwater Microbialites Harbor a Distinct Core Microbial Community

    PubMed Central

    White, Richard Allen; Chan, Amy M.; Gavelis, Gregory S.; Leander, Brian S.; Brady, Allyson L.; Slater, Gregory F.; Lim, Darlene S. S.; Suttle, Curtis A.

    2016-01-01

    Modern microbialites are complex microbial communities that interface with abiotic factors to form carbonate-rich organosedimentary structures whose ancestors provide the earliest evidence of life. Past studies primarily on marine microbialites have inventoried diverse taxa and metabolic pathways, but it is unclear which of these are members of the microbialite community and which are introduced from adjacent environments. Here we control for these factors by sampling the surrounding water and nearby sediment, in addition to the microbialites, and use a metagenomics approach to interrogate the microbial community. Our findings suggest that the Pavilion Lake microbialite community profile, metabolic potential and pathway distributions are distinct from those in the neighboring sediments and water. Based on RefSeq classification, members of the Proteobacteria (e.g., alpha and delta classes) were the dominant taxa in the microbialites, and possessed novel functional guilds associated with the metabolism of heavy metals, antibiotic resistance, primary alcohol biosynthesis and urea metabolism; the latter may help drive biomineralization. Urea metabolism within Pavilion Lake microbialites is a feature not previously associated with other microbialites. The microbialite communities were also significantly enriched for cyanobacteria and acidobacteria, which likely play an important role in biomineralization. Additional findings suggest that Pavilion Lake microbialites are under viral selection as genes associated with viral infection (e.g., CRISPR-Cas, phage shock and phage excision) are abundant within the microbialite metagenomes. The morphology of Pavilion Lake microbialites changes dramatically with depth; yet, metagenomic data did not vary significantly by morphology or depth, indicating that microbialite morphology is altered by other factors, perhaps transcriptional differences or abiotic conditions. This work provides a comprehensive metagenomic perspective of the

  17. ProViDE: A software tool for accurate estimation of viral diversity in metagenomic samples

    PubMed Central

    Ghosh, Tarini Shankar; Mohammed, Monzoorul Haque; Komanduri, Dinakar; Mande, Sharmila Shekhar

    2011-01-01

    Given the absence of universal marker genes in the viral kingdom, researchers typically use BLAST (with stringent E-values) for taxonomic classification of viral metagenomic sequences. Since the majority of metagenomic sequences originate from hitherto unknown viral groups, using stringent E-values results in most sequences remaining unclassified. Furthermore, using less stringent E-values results in a high number of incorrect taxonomic assignments. The SOrt-ITEMS algorithm provides an approach to address the above issues. Based on alignment parameters, SOrt-ITEMS follows an elaborate work-flow for assigning reads originating from hitherto unknown archaeal/bacterial genomes. In SOrt-ITEMS, alignment parameter thresholds were generated by observing patterns of sequence divergence within and across various taxonomic groups belonging to bacterial and archaeal kingdoms. However, many taxonomic groups within the viral kingdom lack a typical Linnean-like taxonomic hierarchy. In this paper, we present ProViDE (Program for Viral Diversity Estimation), an algorithm that uses a customized set of alignment parameter thresholds, specifically suited for viral metagenomic sequences. These thresholds capture the pattern of sequence divergence and the non-uniform taxonomic hierarchy observed within/across various taxonomic groups of the viral kingdom. Validation results indicate that the percentage of ‘correct’ assignments by ProViDE is around 1.7 to 3 times higher than that by the widely used similarity based method MEGAN. The misclassification rate of ProViDE is around 3 to 19% (as compared to 5 to 42% by MEGAN) indicating significantly better assignment accuracy. ProViDE software and a supplementary file (containing supplementary figures and tables referred to in this article) are available for download from http://metagenomics.atc.tcs.com/binning/ProViDE/ PMID:21544173

  18. A function-based screen for seeking RubisCO active clones from metagenomes: novel enzymes influencing RubisCO activity

    PubMed Central

    Böhnke, Stefanie; Perner, Mirjam

    2015-01-01

    Ribulose-1,5-bisphosphate carboxylase/oxygenase (RubisCO) is a key enzyme of the Calvin cycle, which is responsible for most of Earth's primary production. Although research on RubisCO genes and enzymes in plants, cyanobacteria and bacteria has been ongoing for years, still little is understood about its regulation and activation in bacteria. Even more so, hardly any information exists about the function of metagenomic RubisCOs and the role of the enzymes encoded on the flanking DNA owing to the lack of available function-based screens for seeking active RubisCOs from the environment. Here we present the first solely activity-based approach for identifying RubisCO active fosmid clones from a metagenomic library. We constructed a metagenomic library from hydrothermal vent fluids and screened 1056 fosmid clones. Twelve clones exhibited RubisCO activity and the metagenomic fragments resembled genes from Thiomicrospira crunogena. One of these clones was further analyzed. It contained a 35.2 kb metagenomic insert carrying the RubisCO gene cluster and flanking DNA regions. Knockouts of twelve genes and two intergenic regions on this metagenomic fragment demonstrated that the RubisCO activity was significantly impaired and was attributed to deletions in genes encoding putative transcriptional regulators and those believed to be vital for RubisCO activation. Our new technique revealed a novel link between a poorly characterized gene and RubisCO activity. This screen opens the door to directly investigating RubisCO genes and respective enzymes from environmental samples. PMID:25203835

  19. Metagenomic studies of the Red Sea.

    PubMed

    Behzad, Hayedeh; Ibarra, Martin Augusto; Mineta, Katsuhiko; Gojobori, Takashi

    2016-02-01

    Metagenomics has significantly advanced the field of marine microbial ecology, revealing the vast diversity of previously unknown microbial life forms in different marine niches. The tremendous amount of data generated has enabled identification of a large number of microbial genes (metagenomes), their community interactions, adaptation mechanisms, and their potential applications in pharmaceutical and biotechnology-based industries. Comparative metagenomics reveals that microbial diversity is a function of the local environment, meaning that unique or unusual environments typically harbor novel microbial species with unique genes and metabolic pathways. The Red Sea has an abundance of unique characteristics; however, its microbiota is one of the least studied among marine environments. The Red Sea harbors approximately 25 hot anoxic brine pools, plus a vibrant coral reef ecosystem. Physiochemical studies describe the Red Sea as an oligotrophic environment that contains one of the warmest and saltiest waters in the world with year-round high UV radiations. These characteristics are believed to have shaped the evolution of microbial communities in the Red Sea. Over-representation of genes involved in DNA repair, high-intensity light responses, and osmoregulation were found in the Red Sea metagenomic databases suggesting acquisition of specific environmental adaptation by the Red Sea microbiota. The Red Sea brine pools harbor a diverse range of halophilic and thermophilic bacterial and archaeal communities, which are potential sources of enzymes for pharmaceutical and biotechnology-based application. Understanding the mechanisms of these adaptations and their function within the larger ecosystem could also prove useful in light of predicted global warming scenarios where global ocean temperatures are expected to rise by 1-3°C in the next few decades. In this review, we provide an overview of the published metagenomic studies that were conducted in the Red Sea, and

  20. Current and future trends in metagenomics : Development of knowledge bases

    NASA Astrophysics Data System (ADS)

    Mori, Hiroshi; Yamada, Takuji; Kurokawa, Ken

    Microbes are essential for every part of life on Earth. Numerous microbes inhabit the biosphere, many of which are uncharacterized or uncultivable. They form complex microbial communities that deeply affect their surrounding environments. Metagenome analysis provides a radically new way of examining such complex microbial communities without isolation or cultivation of individual bacterial community members. In this article, we present a brief discussion of metagenomics and the development of knowledge bases, and also discuss future trends in metagenomics.

  1. An Experimental Metagenome Data Management and Analysis System

    SciTech Connect

    Markowitz, Victor M.; Korzeniewski, Frank; Palaniappan, Krishna; Szeto, Ernest; Ivanova, Natalia N.; Kyrpides, Nikos C.; Hugenholtz, Philip

    2006-03-01

    The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity of a microbial community, needs to be conducted in the context of a comprehensive data management and analysis system. We present in this paper IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.

  2. Interplay of metagenomics and in vitro compartmentalization

    PubMed Central

    Ferrer, Manuel; Beloqui, Ana; Vieites, José María; Guazzaroni, María Eugenia; Berger, Ilana; Aharoni, Amir

    2009-01-01

    Summary In recent years, the application of approaches for harvesting DNA from the environment, the so‐called, ‘metagenomic approaches’ has proven to be highly successful for the identification, isolation and generation of novel enzymes. Functional screening for the desired catalytic activity is one of the key steps in mining metagenomic libraries, as it does not rely on sequence homology. In this mini‐review, we survey high‐throughput screening tools, originally developed for directed evolution experiments, which can be readily adapted for the screening of large libraries. In particular, we focus on the use of in vitro compartmentalization (IVC) approaches to address potential advantages and problems the merger of culture‐independent and IVC techniques might bring on the mining of enzyme activities in microbial communities. PMID:21261880

  3. TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

    PubMed Central

    2010-01-01

    Background Sequencing metagenomes that were pre-amplified with primer-based methods requires the removal of the additional tag sequences from the datasets. The sequenced reads can contain deletions or insertions due to sequencing limitations, and the primer sequence may contain ambiguous bases. Furthermore, the tag sequence may be unavailable or incorrectly reported. Because of the potential for downstream inaccuracies introduced by unwanted sequence contaminations, it is important to use reliable tools for pre-processing sequence data. Results TagCleaner is a web application developed to automatically identify and remove known or unknown tag sequences allowing insertions and deletions in the dataset. TagCleaner is designed to filter the trimmed reads for duplicates, short reads, and reads with high rates of ambiguous sequences. An additional screening for and splitting of fragment-to-fragment concatenations that gave rise to artificial concatenated sequences can increase the quality of the dataset. Users may modify the different filter parameters according to their own preferences. Conclusions TagCleaner is a publicly available web application that is able to automatically detect and efficiently remove tag sequences from metagenomic datasets. It is easily configurable and provides a user-friendly interface. The interactive web interface facilitates export functionality for subsequent data processing, and is available at http://edwards.sdsu.edu/tagcleaner. PMID:20573248
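
    The filtering steps described above (tag removal, then discarding duplicates, short reads, and reads with many ambiguous bases) can be sketched as follows. This is a simplified illustration and not TagCleaner's code or interface: it assumes the tag is known and matches exactly at the 5' end, whereas the tool itself allows insertions, deletions, and unknown tags. The function names and the thresholds min_len and max_n_frac are hypothetical.

```python
def trim_tag(read, tag):
    """Remove a known tag from the start of a read (exact match only)."""
    if tag and read.startswith(tag):
        return read[len(tag):]
    return read


def filter_reads(reads, tag, min_len=50, max_n_frac=0.05):
    """Trim the tag, then drop short, ambiguous, and duplicate reads.

    min_len and max_n_frac are illustrative defaults, not values
    taken from the tool.
    """
    seen = set()
    kept = []
    for read in reads:
        trimmed = trim_tag(read.upper(), tag.upper())
        if not trimmed or len(trimmed) < min_len:
            continue  # too short after trimming
        if trimmed.count("N") / len(trimmed) > max_n_frac:
            continue  # too many ambiguous bases
        if trimmed in seen:
            continue  # exact duplicate of an earlier read
        seen.add(trimmed)
        kept.append(trimmed)
    return kept
```

    Filtering after trimming, rather than before, means that length and duplicate checks apply to the biological sequence alone and not to the shared tag.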

  4. Metagenomic Assembly Reveals Hosts of Antibiotic Resistance Genes and the Shared Resistome in Pig, Chicken, and Human Feces.

    PubMed

    Ma, Liping; Xia, Yu; Li, Bing; Yang, Ying; Li, Li-Guan; Tiedje, James M; Zhang, Tong

    2016-01-01

    The risk associated with antibiotic resistance disseminating from animal and human feces is an urgent public issue. In the present study, we sought to establish a pipeline for annotating antibiotic resistance genes (ARGs) based on metagenomic assembly to investigate ARGs and their co-occurrence with associated genetic elements. Genetic elements found on the assembled genomic fragments include mobile genetic elements (MGEs) and metal resistance genes (MRGs). We then explored the hosts of these resistance genes and the shared resistome of pig, chicken and human fecal samples. High levels of tetracycline, multidrug, erythromycin, and aminoglycoside resistance genes were discovered in these fecal samples. In particular, a significantly high level of ARGs (7762 ×/Gb) was detected in adult chicken feces, indicating a higher ARG contamination level than in other fecal samples. Many ARG arrangements (e.g., macA-macB and tetA-tetR) were found to be shared by chicken, pig and human feces. In addition, MGEs such as the aadA5-dfrA17-carrying class 1 integron were identified on an assembled scaffold of chicken feces, and are carried by human pathogens. Differential coverage binning analysis revealed significant ARG enrichment in adult chicken feces. A draft genome, annotated as multidrug resistant Escherichia coli, was retrieved from chicken feces metagenomes and was determined to carry diverse ARGs (multidrug, acriflavine, and macrolide). The present study demonstrates the determination of ARG hosts and the shared resistome from metagenomic data sets and successfully establishes the relationship between ARGs, hosts, and environments. This ARG annotation pipeline based on metagenomic assembly will help to bridge the knowledge gaps regarding ARG-associated genes and ARG hosts with metagenomic data sets. Moreover, this pipeline will facilitate the evaluation of environmental risks in the genetic context of ARGs. PMID:26650334

  5. Generating viral metagenomes from the coral holobiont.

    PubMed

    Weynberg, Karen D; Wood-Charlson, Elisha M; Suttle, Curtis A; van Oppen, Madeleine J H

    2014-01-01

    Reef-building corals comprise multipartite symbioses where the cnidarian animal is host to an array of eukaryotic and prokaryotic organisms, and the viruses that infect them. These viruses are critical elements of the coral holobiont, serving not only as agents of mortality, but also as potential vectors for lateral gene flow, and as elements encoding a variety of auxiliary metabolic functions. Consequently, understanding the functioning and health of the coral holobiont requires detailed knowledge of the associated viral assemblage and its function. Currently, the most tractable way of uncovering viral diversity and function is through metagenomic approaches, which is inherently difficult in corals because of the complex holobiont community, an extracellular mucus layer that all corals secrete, and the variety of sizes and structures of nucleic acids found in viruses. Here we present the first protocol for isolating, purifying and amplifying viral nucleic acids from corals based on mechanical disruption of cells. This method produces at least 50% higher yields of viral nucleic acids, has very low levels of cellular sequence contamination and captures wider viral diversity than previously used chemical-based extraction methods. We demonstrate that our mechanical-based method profiles a greater diversity of DNA and RNA genomes, including virus groups such as Retro-transcribing and ssRNA viruses, which are absent from metagenomes generated via chemical-based methods. In addition, we briefly present (and make publicly available) the first paired DNA and RNA viral metagenomes from the coral Acropora tenuis. PMID:24847321

  6. Bayesian mixture analysis for metagenomic community profiling

    PubMed Central

    Morfopoulou, Sofia; Plagnol, Vincent

    2015-01-01

    Motivation: Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated produces an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, in particular for viral pathogens. Here we present metaMix, a Bayesian mixture model framework for resolving complex metagenomic mixtures. We show that the use of parallel Monte Carlo Markov chains for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. Results: We demonstrate the greater accuracy of metaMix compared with relevant methods, particularly for profiling complex communities consisting of several related species. We designed metaMix specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection; however, the principles are generally applicable to all types of metagenomic mixtures. Availability and implementation: metaMix is implemented as a user-friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix Contact: sofia.morfopoulou.10@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26002885

  7. New Extremophilic Lipases and Esterases from Metagenomics

    PubMed Central

    López-López, Olalla; Cerdán, Maria E; González Siso, Maria I

    2014-01-01

    Lipolytic enzymes catalyze the hydrolysis of ester bonds in the presence of water. In media with low water content or in organic solvents, they can catalyze synthetic reactions such as esterification and transesterification. Lipases and esterases, in particular those from extremophilic origin, are robust enzymes, functional under the harsh conditions of industrial processes owing to their inherent thermostability and resistance towards organic solvents, which combined with their high chemo-, regio- and enantioselectivity make them very attractive biocatalysts for a variety of industrial applications. Likewise, enzymes from extremophile sources can provide additional features such as activity at extreme temperatures, extreme pH values or high salinity levels, which could be interesting for certain purposes. New lipases and esterases have traditionally been discovered by the isolation of microbial strains producing lipolytic activity. The Genome Projects Era allowed genome mining, exploiting homology with known lipases and esterases, to be used in the search for new enzymes. The Metagenomic Era meant a step forward in this field with the study of the metagenome, the pool of genomes in an environmental microbial community. Current molecular biology techniques make it possible to construct total environmental DNA libraries, including the genomes of unculturable organisms, opening a new window to a vast field of unknown enzymes with new and unique properties. Here, we review the latest advances and findings from research into new extremophilic lipases and esterases, using metagenomic approaches, and their potential industrial and biotechnological applications. PMID:24588890

  8. Generating viral metagenomes from the coral holobiont

    PubMed Central

    Wood-Charlson, Elisha M.; Suttle, Curtis A.; van Oppen, Madeleine J. H.

    2014-01-01

    Reef-building corals comprise multipartite symbioses where the cnidarian animal is host to an array of eukaryotic and prokaryotic organisms, and the viruses that infect them. These viruses are critical elements of the coral holobiont, serving not only as agents of mortality, but also as potential vectors for lateral gene flow, and as elements encoding a variety of auxiliary metabolic functions. Consequently, understanding the functioning and health of the coral holobiont requires detailed knowledge of the associated viral assemblage and its function. Currently, the most tractable way of uncovering viral diversity and function is through metagenomic approaches, which is inherently difficult in corals because of the complex holobiont community, an extracellular mucus layer that all corals secrete, and the variety of sizes and structures of nucleic acids found in viruses. Here we present the first protocol for isolating, purifying and amplifying viral nucleic acids from corals based on mechanical disruption of cells. This method produces at least 50% higher yields of viral nucleic acids, has very low levels of cellular sequence contamination and captures wider viral diversity than previously used chemical-based extraction methods. We demonstrate that our mechanical-based method profiles a greater diversity of DNA and RNA genomes, including virus groups such as Retro-transcribing and ssRNA viruses, which are absent from metagenomes generated via chemical-based methods. In addition, we briefly present (and make publicly available) the first paired DNA and RNA viral metagenomes from the coral Acropora tenuis. PMID:24847321

  9. Antibiotic Resistome: Improving Detection and Quantification Accuracy for Comparative Metagenomics.

    PubMed

    Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania

    2016-04-01

    The unprecedented rise of life-threatening antibiotic resistance (AR), combined with the unparalleled advances in DNA sequencing of genomes and metagenomes, has pushed the need for in silico detection of the resistance potential of clinical and environmental metagenomic samples through the quantification of AR genes (i.e., genes conferring antibiotic resistance). Therefore, determining an optimal methodology to quantitatively and accurately assess AR genes in a given environment is pivotal. Here, we optimized and improved existing AR detection methodologies from metagenomic datasets to properly consider AR-generating mutations in antibiotic target genes. Through comparative metagenomic analysis of previously published AR gene abundance in three publicly available metagenomes, we illustrate how mutation-generated resistance genes are either falsely assigned or neglected, which alters the detection and quantitation of the antibiotic resistome. In addition, we inspected factors influencing the outcome of AR gene quantification using metagenome simulation experiments, and identified that genome size, AR gene length, total number of metagenomics reads and selected sequencing platforms had pronounced effects on the level of detected AR. In conclusion, our proposed improvements in the current methodologies for accurate AR detection and resistome assessment show reliable results when tested on real and simulated metagenomic datasets. PMID:27031878
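
    The abstract notes that AR gene length and the total number of reads strongly affect the detected AR level. One common way to make counts comparable across genes and samples is to normalize hits per kilobase of gene per million reads. The sketch below shows that generic normalization; it is an assumption chosen for illustration, not the method proposed by the authors, and the function and parameter names are hypothetical.

```python
def normalized_abundance(hit_counts, gene_lengths_bp, total_reads):
    """Return hits per kilobase of gene per million reads, per gene.

    hit_counts:       {gene: number of reads assigned to the gene}
    gene_lengths_bp:  {gene: gene length in base pairs}
    total_reads:      total number of reads in the metagenome
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    per_million = total_reads / 1e6
    result = {}
    for gene, hits in hit_counts.items():
        length_kb = gene_lengths_bp[gene] / 1e3
        if length_kb <= 0:
            raise ValueError(f"non-positive length for {gene}")
        result[gene] = hits / length_kb / per_million
    return result
```

    With this scaling, a 500 bp gene with 5 hits and a 1000 bp gene with 10 hits in the same sample receive the same value, so longer genes are not favored simply because they offer more positions for reads to match.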

  10. Which Microbial Communities Are Present? Sequence-Based Metagenomics

    NASA Astrophysics Data System (ADS)

    Caffrey, Sean M.

    The use of metagenomic methods that directly sequence environmental samples has revealed the extraordinary microbial diversity missed by traditional culture-based methodologies. Therefore, to develop a complete and representative model of an environment's microbial community and activities, metagenomic analysis is an essential tool.

  11. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim of improving the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913

  12. Metagenomics: Retrospect and Prospects in High Throughput Age

    PubMed Central

    Kumar, Satish; Krishnani, Kishore Kumar; Bhushan, Bharat; Brahmane, Manoj Pandit

    2015-01-01

    In recent years, metagenomics has emerged as a powerful tool for mining of hidden microbial treasure in a culture independent manner. In the last two decades, metagenomics has been applied extensively to exploit concealed potential of microbial communities from almost all sorts of habitats. A brief historic progress made over the period is discussed in terms of origin of metagenomics to its current state and also the discovery of novel biological functions of commercial importance from metagenomes of diverse habitats. The present review also highlights the paradigm shift of metagenomics from basic study of community composition to insight into the microbial community dynamics for harnessing the full potential of uncultured microbes with more emphasis on the implication of breakthrough developments, namely, Next Generation Sequencing, advanced bioinformatics tools, and systems biology. PMID:26664751

  13. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond

    PubMed Central

    Hiraoka, Satoshi; Yang, Ching-chia; Iwasaki, Wataru

    2016-01-01

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives. PMID:27383682

  14. Classification Options

    ERIC Educational Resources Information Center

    Exceptional Children, 1978

    1978-01-01

    The interview presents opinions of Nicholas Hobbs on the classification of exceptional children, including topics such as ecologically oriented classification systems, the role of parents, and need for revision of teacher preparation programs. (IM)

  15. Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Canon, Shane [LBNL]

    2013-01-22

    DOE JGI's Zhong Wang, chair of the High-performance Computing session, gives a brief introduction before Berkeley Lab's Shane Canon talks about "Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  16. Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Canon, Shane

    2011-10-12

    DOE JGI's Zhong Wang, chair of the High-performance Computing session, gives a brief introduction before Berkeley Lab's Shane Canon talks about "Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  17. Joint Analysis of Multiple Metagenomic Samples

    PubMed Central

    Baran, Yael; Halperin, Eran

    2012-01-01

    The availability of metagenomic sequencing data, generated by sequencing DNA pooled from multiple microbes living jointly, has increased sharply in the last few years with developments in sequencing technology. Characterizing the contents of metagenomic samples is a challenging task, which has been extensively attempted by both supervised and unsupervised techniques, each with its own limitations. Common to practically all the methods is the processing of single samples only; when multiple samples are sequenced, each is analyzed separately and the results are combined. In this paper we propose to perform a combined analysis of a set of samples in order to obtain a better characterization of each of the samples, and provide two applications of this principle. First, we use an unsupervised probabilistic mixture model to infer hidden components shared across metagenomic samples. We incorporate the model in a novel framework for studying association of microbial sequence elements with phenotypes, analogous to the genome-wide association studies performed on human genomes: We demonstrate that stratification may result in false discoveries of such associations, and that the components inferred by the model can be used to correct for this stratification. Second, we propose a novel read clustering (also termed “binning”) algorithm which operates on multiple samples simultaneously, leveraging on the assumption that the different samples contain the same microbial species, possibly in different proportions. We show that integrating information across multiple samples yields more precise binning on each of the samples. Moreover, for both applications we demonstrate that given a fixed depth of coverage, the average per-sample performance generally increases with the number of sequenced samples as long as the per-sample coverage is high enough. PMID:22359490

  18. Short prokaryotic DNA fragment binning using a hierarchical classifier based on linear discriminant analysis and principal component analysis.

    PubMed

    Zheng, Hao; Wu, Hongwei

    2010-12-01

    Metagenomics is an emerging field in which the power of genomic analysis is applied to an entire microbial community, bypassing the need to isolate and culture individual microbial species. Assembling of metagenomic DNA fragments is very much like the overlap-layout-consensus procedure for assembling isolated genomes, but is augmented by an additional binning step to differentiate scaffolds, contigs and unassembled reads into various taxonomic groups. In this paper, we employed n-mer oligonucleotide frequencies as the features and developed a hierarchical classifier (PCAHIER) for binning short (≤ 1,000 bps) metagenomic fragments. The principal component analysis was used to reduce the high dimensionality of the feature space. The hierarchical classifier consists of four layers of local classifiers that are implemented based on the linear discriminant analysis. These local classifiers are responsible for binning prokaryotic DNA fragments into superkingdoms, of the same superkingdom into phyla, of the same phylum into genera, and of the same genus into species, respectively. We evaluated the performance of the PCAHIER by using our own simulated data sets as well as the widely used simHC synthetic metagenome data set from the IMG/M system. The effectiveness of the PCAHIER was demonstrated through comparisons against a non-hierarchical classifier, and two existing binning algorithms (TETRA and Phylopythia). PMID:21121023
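
    The n-mer oligonucleotide frequencies that this abstract names as features can be sketched as follows. This is a minimal illustration of the feature-extraction step only, not the PCAHIER implementation; the function name `kmer_frequencies`, the default `k=4`, and the choice to ignore windows with ambiguous bases are assumptions made for the example. The PCA and linear discriminant layers described in the abstract are not shown.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=4):
    """Return the relative frequency of every DNA k-mer in `sequence`.

    The result is a fixed-length list ordered lexicographically over
    A, C, G, T, so vectors from different fragments are comparable.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing ambiguous bases (e.g. N) are ignored (an assumption).
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Hypothetical toy fragment; real fragments would be up to ~1,000 bp.
features = kmer_frequencies("ACGTACGTAC", k=2)
```

    Each fragment becomes a vector of length 4^k (256 for tetranucleotides), which is the kind of high-dimensional feature space the abstract says is reduced before classification.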

  19. Metagenomic approaches to identifying infectious agents.

    PubMed

    Höper, D; Mettenleiter, T C; Beer, M

    2016-04-01

    Since the advent of next-generation sequencing (NGS) technologies, the untargeted screening of samples from outbreaks for pathogen identification using metagenomics has become technically and economically feasible. However, various aspects need to be considered in order to exploit the full potential of NGS for virus discovery. Here, the authors summarise those aspects of the main steps that have a significant impact, from sample selection through sample handling and processing, as well as sequencing and finally data analysis, with a special emphasis on existing pitfalls. PMID:27217170

  20. Streaming fragment assignment for real-time analysis of sequencing experiments

    PubMed Central

    Roberts, Adam; Pachter, Lior

    2013-01-01

    We present eXpress, a software package for highly efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time, and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data, showing greater efficiency than other quantification methods. PMID:23160280

  1. Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights.

    PubMed

    Pasolli, Edoardo; Truong, Duy Tin; Malik, Faizan; Waldron, Levi; Segata, Nicola

    2016-07-01

    Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis. The software framework, microbiome profiles, and metadata for thousands of samples are publicly

  2. Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

    PubMed Central

    Pasolli, Edoardo; Truong, Duy Tin; Malik, Faizan; Waldron, Levi

    2016-01-01

    Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis. The software framework, microbiome profiles, and metadata for thousands of samples are publicly

  3. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Sakakibara, Yasumbumi

    2011-10-13

    Keio University's Yasumbumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  4. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Sakakibara, Yasumbumi [Keio University]

    2013-01-22

    Keio University's Yasumbumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  5. Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Li, Weizhong [San Diego Supercomputer Center]

    2013-01-22

    San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  6. Evaluation of the Cow Rumen Metagenome; Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Sczyrba, Alex

    2011-10-13

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  7. Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Li, Weizhong

    2011-10-12

    San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  8. Evaluation of the Cow Rumen Metagenome; Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Sczyrba, Alex [DOE JGI]

    2013-01-22

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  9. Size Does Matter: Application-driven Approaches for Soil Metagenomics

    PubMed Central

    Kakirde, Kavita S.; Parsley, Larissa C.; Liles, Mark R.

    2010-01-01

    Metagenomic analyses can provide extensive information on the structure, composition, and predicted gene functions of diverse environmental microbial assemblages. Each environment presents its own unique challenges to metagenomic investigation and requires a specifically designed approach to accommodate physicochemical and biotic factors unique to each environment that can pose technical hurdles and/or bias the metagenomic analyses. In particular, soils harbor an exceptional diversity of prokaryotes that are largely undescribed beyond the level of ribotype and are a potentially vast resource for natural product discovery. The successful application of a soil metagenomic approach depends on selecting the appropriate DNA extraction, purification, and if necessary, cloning methods for the intended downstream analyses. The most important technical considerations in a metagenomic study include obtaining a sufficient yield of high-purity DNA representing the targeted microorganisms within an environmental sample or enrichment and (if required) constructing a metagenomic library in a suitable vector and host. Size does matter in the context of the average insert size within a clone library or the sequence read length for a high-throughput sequencing approach. It is also imperative to select the appropriate metagenomic screening strategy to address the specific question(s) of interest, which should drive the selection of methods used in the earlier stages of a metagenomic project (e.g., DNA size, to clone or not to clone). Here, we present both the promising and problematic nature of soil metagenomics and discuss the factors that should be considered when selecting soil sampling, DNA extraction, purification, and cloning methods to implement based on the ultimate study objectives. PMID:21076656

  10. Metagenomics of an Alkaline Hot Spring in Galicia (Spain): Microbial Diversity Analysis and Screening for Novel Lipolytic Enzymes

    PubMed Central

    López-López, Olalla; Knapik, Kamila; Cerdán, Maria-Esperanza; González-Siso, María-Isabel

    2015-01-01

    A fosmid library was constructed with the metagenomic DNA from the water of the Lobios hot spring (76°C, pH = 8.2) located in Ourense (Spain). Metagenomic sequencing of the fosmid library allowed the assembly of 9722 contigs ranging in size from 500 to 56,677 bp and spanning ~18 Mbp. 23,207 ORFs (Open Reading Frames) were predicted from the assembly. Biodiversity was explored by taxonomic classification and it revealed that bacteria were predominant, while the archaea were less abundant. The six most abundant bacterial phyla were Deinococcus-Thermus, Proteobacteria, Firmicutes, Acidobacteria, Aquificae, and Chloroflexi. Within the archaeal superkingdom, the phylum Thaumarchaeota was predominant with the dominant species “Candidatus Caldiarchaeum subterraneum.” Functional classification revealed the genes associated with one-carbon metabolism as the most abundant. Both taxonomic and functional classifications showed a mixture of different microbial metabolic patterns: aerobic and anaerobic, chemoorganotrophic and chemolithotrophic, autotrophic and heterotrophic. Remarkably, the presence of genes encoding enzymes with potential biotechnological interest, such as xylanases, galactosidases, proteases, and lipases, was also revealed in the metagenomic library. Functional screening of this library was subsequently done looking for genes encoding lipolytic enzymes. Six genes conferring lipolytic activity were identified and one was cloned and characterized. This gene was named LOB4Est and it was expressed in a yeast mesophilic host. LOB4Est codes for a novel esterase of family VIII, with sequence similarity to β-lactamases, but with unusually wide substrate specificity. When the enzyme was purified from the mesophilic host it showed a half-life of 1 h and 43 min at 50°C, and maximal activity at 40°C and pH 7.5 with p-nitrophenyl-laurate as substrate. Interestingly, the enzyme retained more than 80% of maximal activity in a broad range of pH from 6.5 to 8. PMID:26635759

  11. Fizzy. Feature subset selection for metagenomics

    SciTech Connect

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin; Rosen, Gail L.

    2015-11-04

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α– & β–diversity. Feature subset selection – a sub-field of machine learning – can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
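
    As a rough illustration of what information-theoretic feature selection means here, the sketch below ranks discrete features by their mutual information with a class label. It is the simplest member of that family and is not Fizzy's command line interface or its algorithms; the function names, the presence/absence table, and the "young"/"old" labels are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(feature, labels):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_x = Counter(feature)
    p_y = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


def rank_features(table, labels):
    """Sort feature names by decreasing mutual information with the labels."""
    scores = {name: mutual_information(values, labels) for name, values in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical presence/absence table: one entry per feature, one value per sample.
table = {
    "otu_a": [1, 1, 0, 0],
    "otu_b": [1, 0, 1, 0],
}
labels = ["young", "young", "old", "old"]
ranking = rank_features(table, labels)
```

    In this toy table `otu_a` tracks the label exactly and `otu_b` is independent of it, so `otu_a` ranks first. Practical subset selection methods typically also penalize redundancy among the chosen features, which this ranking does not.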

  12. Identifying personal microbiomes using metagenomic codes

    PubMed Central

    Franzosa, Eric A.; Huang, Katherine; Meadow, James F.; Gevers, Dirk; Lemon, Katherine P.; Bohannan, Brendan J. M.; Huttenhower, Curtis

    2015-01-01

    Community composition within the human microbiome varies across individuals, but it remains unknown if this variation is sufficient to uniquely identify individuals within large populations or stable enough to identify them over time. We investigated this by developing a hitting set-based coding algorithm and applying it to the Human Microbiome Project population. Our approach defined body site-specific metagenomic codes: sets of microbial taxa or genes prioritized to uniquely and stably identify individuals. Codes capturing strain variation in clade-specific marker genes were able to distinguish among 100s of individuals at an initial sampling time point. In comparisons with follow-up samples collected 30–300 d later, ∼30% of individuals could still be uniquely pinpointed using metagenomic codes from a typical body site; coincidental (false positive) matches were rare. Codes based on the gut microbiome were exceptionally stable and pinpointed >80% of individuals. The failure of a code to match its owner at a later time point was largely explained by the loss of specific microbial strains (at current limits of detection) and was only weakly associated with the length of the sampling interval. In addition to highlighting patterns of temporal variation in the ecology of the human microbiome, this work demonstrates the feasibility of microbiome-based identifiability—a result with important ethical implications for microbiome study design. The datasets and code used in this work are available for download from huttenhower.sph.harvard.edu/idability. PMID:25964341
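
    The idea of a code as a hitting set can be sketched as follows: for every other individual, the code must contain at least one feature that the target has and that individual lacks. The greedy routine below is a textbook heuristic written for illustration, not the authors' algorithm (which also prioritizes features for stability over time); `greedy_code` and the marker names are hypothetical.

```python
def greedy_code(target, others):
    """Pick features of `target` so that no profile in `others` contains them all.

    `target` and each element of `others` are sets of feature names.
    Returns the chosen set, or None when some other profile contains every
    feature of `target` (no distinguishing code exists).
    """
    # For each other individual, the features that would tell them apart.
    differences = [target - other for other in others]
    if any(not diff for diff in differences):
        return None
    code = set()
    unhit = differences
    while unhit:
        # Greedy step: take the feature that separates the most remaining individuals.
        candidates = set().union(*unhit)
        best = max(sorted(candidates), key=lambda f: sum(f in diff for diff in unhit))
        code.add(best)
        unhit = [diff for diff in unhit if best not in diff]
    return code


# Hypothetical strain-marker profiles.
alice = {"m1", "m2", "m3"}
population = [{"m1", "m2"}, {"m2", "m3"}, {"m1", "m4"}]
code = greedy_code(alice, population)
```

    Here the code `{"m1", "m3"}` is carried in full by no one else in the toy population. A code of this kind stops matching its owner at a later time point if one of its features is lost, which is consistent with the abstract's explanation of failed matches by strain loss.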

  13. Picoeukaryotic sequences in the Sargasso Sea metagenome

    PubMed Central

    Piganeau, Gwenael; Desdevises, Yves; Derelle, Evelyne; Moreau, Herve

    2008-01-01

    Background With genome sequencing becoming more and more affordable, environmental shotgun sequencing of the microorganisms present in an environment generates a challenging amount of sequence data for the scientific community. These sequence data enable the diversity of the microbial world and the metabolic pathways within an environment to be investigated, a previously unthinkable achievement when using traditional approaches. DNA sequence data assembled from extracts of 0.8 μm filtered Sargasso seawater unveiled an unprecedented glimpse of marine prokaryotic diversity and gene content. Serendipitously, many sequences representing picoeukaryotes (cell size <2 μm) were also present within this dataset. We investigated the picoeukaryotic diversity of this database by searching sequences containing homologs of eight nuclear anchor genes that are well conserved throughout the eukaryotic lineage, as well as one chloroplastic and one mitochondrial gene. Results We found up to 41 distinct eukaryotic scaffolds, with a broad phylogenetic spread on the eukaryotic tree of life. The average eukaryotic scaffold size is 2,909 bp, with one gap every 1,253 bp. Strikingly, the AT frequency of the eukaryotic sequences (51.4%) is significantly lower than the average AT frequency of the metagenome (61.4%). This represents 4% to 18% of the estimated prokaryotic diversity, depending on the average prokaryotic versus eukaryotic genome size ratio. Conclusion Despite similar cell size, eukaryotic sequences of the Sargasso Sea metagenome have higher GC content, suggesting that different environmental pressures affect the evolution of their base composition. PMID:18179699
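
    The base-composition comparison in this abstract rests on a simple statistic. Since GC content is one minus AT frequency, the reported AT frequencies of 51.4% and 61.4% correspond to GC contents of 48.6% and 38.6%. A minimal sketch of the calculation, assuming ambiguous bases are excluded from the denominator (the function name and that convention are choices made for this example):

```python
def at_frequency(sequence):
    """Fraction of A and T among the unambiguous bases (A, C, G, T) of a sequence."""
    sequence = sequence.upper()
    at = sequence.count("A") + sequence.count("T")
    gc = sequence.count("G") + sequence.count("C")
    total = at + gc
    return at / total if total else 0.0


# Hypothetical toy sequence with two ambiguous bases.
example = at_frequency("ATGCATNNAT")
```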

  14. Identifying personal microbiomes using metagenomic codes.

    PubMed

    Franzosa, Eric A; Huang, Katherine; Meadow, James F; Gevers, Dirk; Lemon, Katherine P; Bohannan, Brendan J M; Huttenhower, Curtis

    2015-06-01

    Community composition within the human microbiome varies across individuals, but it remains unknown if this variation is sufficient to uniquely identify individuals within large populations or stable enough to identify them over time. We investigated this by developing a hitting set-based coding algorithm and applying it to the Human Microbiome Project population. Our approach defined body site-specific metagenomic codes: sets of microbial taxa or genes prioritized to uniquely and stably identify individuals. Codes capturing strain variation in clade-specific marker genes were able to distinguish among 100s of individuals at an initial sampling time point. In comparisons with follow-up samples collected 30-300 d later, ∼30% of individuals could still be uniquely pinpointed using metagenomic codes from a typical body site; coincidental (false positive) matches were rare. Codes based on the gut microbiome were exceptionally stable and pinpointed >80% of individuals. The failure of a code to match its owner at a later time point was largely explained by the loss of specific microbial strains (at current limits of detection) and was only weakly associated with the length of the sampling interval. In addition to highlighting patterns of temporal variation in the ecology of the human microbiome, this work demonstrates the feasibility of microbiome-based identifiability-a result with important ethical implications for microbiome study design. The datasets and code used in this work are available for download from huttenhower.sph.harvard.edu/idability. PMID:25964341

  15. Genovo: De Novo Assembly for Metagenomes

    NASA Astrophysics Data System (ADS)

    Laserson, Jonathan; Jojic, Vladimir; Koller, Daphne

    Next-generation sequencing technologies produce a large number of noisy reads from the DNA in a sample. Metagenomics and population sequencing aim to recover the genomic sequences of the species in the sample, which could be of high diversity. Methods geared towards single sequence reconstruction are not sensitive enough when applied in this setting. We introduce a generative probabilistic model of read generation from environmental samples and present Genovo, a novel de novo sequence assembler that discovers likely sequence reconstructions under the model. A Chinese restaurant process prior accounts for the unknown number of genomes in the sample. Inference is made by applying a series of hill-climbing steps iteratively until convergence. We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo's reconstructions cover more bases and recover more genes than the other methods, and yield a higher assembly score.
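
    The Chinese restaurant process mentioned above can be illustrated with a short sampler. This sketch shows only the prior, in which each new item joins an existing cluster with probability proportional to its size or opens a new cluster with probability proportional to a concentration parameter; it is not Genovo's model or inference procedure. Reading items as reads and clusters as genomes follows the abstract; `sample_crp`, `alpha`, and the seed are assumptions of the example.

```python
import random


def sample_crp(n_items, alpha, seed=0):
    """Draw one partition from a Chinese restaurant process.

    Item i joins existing cluster c with probability size(c) / (i + alpha)
    and opens a new cluster with probability alpha / (i + alpha).
    Returns a list of cluster indices, one per item.
    """
    rng = random.Random(seed)
    sizes = []          # sizes[c] = number of items already in cluster c
    assignments = []
    for _ in range(n_items):
        # Weights: existing clusters by size, plus one slot for a new cluster.
        weights = sizes + [alpha]
        cluster = rng.choices(range(len(weights)), weights=weights)[0]
        if cluster == len(sizes):
            sizes.append(1)
        else:
            sizes[cluster] += 1
        assignments.append(cluster)
    return assignments


assignments = sample_crp(n_items=20, alpha=1.0)
```

    The number of clusters is not fixed in advance and grows slowly with the number of items, which is why a prior of this kind suits a sample with an unknown number of genomes.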

  16. Fizzy. Feature subset selection for metagenomics

    DOE PAGESBeta

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin; Rosen, Gail L.

    2015-11-04

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α– & β–diversity. Feature subset selection – a sub-field of machine learning – can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.

  17. Tales from the crypt and coral reef: the successes and challenges of identifying new herpesviruses using metagenomics

    PubMed Central

    Houldcroft, Charlotte J.; Breuer, Judith

    2015-01-01

    Herpesviruses are ubiquitous double-stranded DNA viruses infecting many animals, with the capacity to cause disease in both immunocompetent and immunocompromised hosts. Different herpesviruses have different cell tropisms, and have been detected in a diverse range of tissues and sample types. Metagenomics—encompassing viromics—analyses the nucleic acid of a tissue or other sample in an unbiased manner, making few or no prior assumptions about which viruses may be present in a sample. This approach has successfully discovered a number of novel herpesviruses. Furthermore, metagenomic analysis can identify herpesviruses with high degrees of sequence divergence from known herpesviruses and does not rely upon culturing large quantities of viral material. Metagenomics has had success in two areas of herpesvirus sequencing: firstly, the discovery of novel exogenous and endogenous herpesviruses in primates, bats and cnidarians; and secondly, in characterizing large areas of the genomes of herpesviruses previously only known from small fragments, revealing unexpected diversity. This review will discuss the successes and challenges of using metagenomics to identify novel herpesviruses, and future directions within the field. PMID:25821447

  18. Selectable fragmentation warhead

    DOEpatents

    Bryan, Courtney S.; Paisley, Dennis L.; Montoya, Nelson I.; Stahl, David B.

    1993-01-01

    A selectable fragmentation warhead capable of producing a predetermined number of fragments from a metal plate, and accelerating the fragments toward a target. A first explosive located adjacent to the plate is detonated at a selected number of points by laser-driven slapper detonators. In one embodiment, a smoother-disk and a second explosive, located adjacent to the first explosive, serve to increase acceleration of the fragments toward a target. The ability to produce a selected number of fragments allows for effective destruction of a chosen target.

  19. Selectable fragmentation warhead

    SciTech Connect

    Bryan, C.S.; Paisley, D.L.; Montoya, N.I.; Stahl, D.B.

    1992-12-31

    This report discusses a selectable fragmentation warhead which is capable of producing a predetermined number of fragments from a metal plate, and accelerating the fragments toward a target. A first explosive located adjacent to the plate is detonated at a selected number of points by laser-driven slapper detonators. In one embodiment, a smoother-disk and a second explosive, located adjacent to the first explosive, serve to increase acceleration of the fragments toward a target. The ability to produce a selected number of fragments allows for effective destruction of a chosen target.

  20. Enhancing Metagenomics Investigations of Microbial Interactions with Biofilm Technology

    PubMed Central

    McLean, Robert J. C.; Kakirde, Kavita S.

    2013-01-01

    Investigations of microbial ecology and diversity have been greatly enhanced by the application of culture-independent techniques. One such approach, metagenomics, involves sample collections from soil, water, and other environments. Extracted nucleic acids from bulk environmental samples are sequenced and analyzed, which allows microbial interactions to be inferred on the basis of bioinformatics calculations. In most environments, microbial interactions occur predominately in surface-adherent, biofilm communities. In this review, we address metagenomics sampling and biofilm biology, and propose an experimental strategy whereby the resolving power of metagenomics can be enhanced by incorporating a biofilm-enrichment step during sample acquisition. PMID:24284397

  1. Metagenomic search strategies for interactions among plants and multiple microbes

    PubMed Central

    Melcher, Ulrich; Verma, Ruchi; Schneider, William L.

    2014-01-01

    Plants harbor multiple microbes. Metagenomics can facilitate understanding of the significance, for the plant, of the microbes, and of the interactions among them. However, current approaches to metagenomic analysis of plants are computationally time consuming. Efforts to speed the discovery process include improvement of computational speed, condensing the sequencing reads into smaller datasets before BLAST searches, simplifying the target database of BLAST searches, and flipping the roles of metagenomic and reference datasets. The latter is exemplified by the e-probe diagnostic nucleic acid analysis approach originally devised for improving analysis during plant quarantine. PMID:24966863

  2. Competitive Metagenomic DNA Hybridization Identifies Host-Specific Microbial Genetic Markers in Cow Fecal Samples†

    PubMed Central

    Shanks, Orin C.; Santo Domingo, Jorge W.; Lamendella, Regina; Kelty, Catherine A.; Graham, James E.

    2006-01-01

    Several PCR methods have recently been developed to identify fecal contamination in surface waters. In all cases, researchers have relied on one gene or one microorganism for selection of host-specific markers. Here we describe the application of a genome fragment enrichment (GFE) method to identify host-specific genetic markers from fecal microbial community DNA. As a proof of concept, bovine fecal DNA was challenged against a porcine fecal DNA background to select for bovine-specific DNA sequences. Bioinformatic analyses of 380 bovine enriched metagenomic sequences indicated a preponderance of Bacteroidales-like regions predicted to encode membrane-associated and secreted proteins. Oligonucleotide primers capable of annealing to select Bacteroidales-like bovine GFE sequences exhibited extremely high specificity (>99%) in PCR assays with total fecal DNAs from 279 different animal sources. These primers also demonstrated a broad distribution of corresponding genetic markers (81% positive) among 148 different bovine sources. These data demonstrate that direct metagenomic DNA analysis by the competitive solution hybridization approach described is an efficient method for identifying potentially useful fecal genetic markers and for characterizing differences between environmental microbial communities. PMID:16751515

  3. Metagenomic discovery of novel enzymes and biosurfactants in a slaughterhouse biofilm microbial community

    PubMed Central

    Thies, Stephan; Rausch, Sonja Christina; Kovacic, Filip; Schmidt-Thaler, Alexandra; Wilhelm, Susanne; Rosenau, Frank; Daniel, Rolf; Streit, Wolfgang; Pietruszka, Jörg; Jaeger, Karl-Erich

    2016-01-01

    DNA derived from environmental samples is a rich source of novel bioactive molecules. The choice of the habitat to be sampled predefines the properties of the biomolecules to be discovered due to the physiological adaptation of the microbial community to the prevailing environmental conditions. We have constructed a metagenomic library in Escherichia coli DH10b with environmental DNA (eDNA) isolated from the microbial community of a slaughterhouse drain biofilm consisting mainly of species from the family Flavobacteriaceae. By functional screening of this library we have identified several lipases, proteases and two clones (SA343 and SA354) with biosurfactant and hemolytic activities. Sequence analysis of the respective eDNA fragments and subsequent structure homology modelling identified genes encoding putative N-acyl amino acid synthases with a unique two-domain organisation. The produced biosurfactants were identified by NMR spectroscopy as N-acyltyrosines with N-myristoyltyrosine as the predominant species. Critical micelle concentration and reduction of surface tension were similar to those of chemically synthesised N-myristoyltyrosine. Furthermore, we showed that the newly isolated N-acyltyrosines exhibit antibiotic activity against various bacteria. This is the first report describing the successful application of functional high-throughput screening assays for the identification of biosurfactant producing clones within a metagenomic library. PMID:27271534

  4. Metagenomic discovery of novel enzymes and biosurfactants in a slaughterhouse biofilm microbial community.

    PubMed

    Thies, Stephan; Rausch, Sonja Christina; Kovacic, Filip; Schmidt-Thaler, Alexandra; Wilhelm, Susanne; Rosenau, Frank; Daniel, Rolf; Streit, Wolfgang; Pietruszka, Jörg; Jaeger, Karl-Erich

    2016-01-01

    DNA derived from environmental samples is a rich source of novel bioactive molecules. The choice of the habitat to be sampled predefines the properties of the biomolecules to be discovered due to the physiological adaptation of the microbial community to the prevailing environmental conditions. We have constructed a metagenomic library in Escherichia coli DH10b with environmental DNA (eDNA) isolated from the microbial community of a slaughterhouse drain biofilm consisting mainly of species from the family Flavobacteriaceae. By functional screening of this library we have identified several lipases, proteases and two clones (SA343 and SA354) with biosurfactant and hemolytic activities. Sequence analysis of the respective eDNA fragments and subsequent structure homology modelling identified genes encoding putative N-acyl amino acid synthases with a unique two-domain organisation. The produced biosurfactants were identified by NMR spectroscopy as N-acyltyrosines with N-myristoyltyrosine as the predominant species. Critical micelle concentration and reduction of surface tension were similar to those of chemically synthesised N-myristoyltyrosine. Furthermore, we showed that the newly isolated N-acyltyrosines exhibit antibiotic activity against various bacteria. This is the first report describing the successful application of functional high-throughput screening assays for the identification of biosurfactant producing clones within a metagenomic library. PMID:27271534

  5. Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Stepanauskas, Ramunas

    2011-10-13

    DOE JGI's Tanja Woyke, chair of the Single Cells and Metagenomes session, delivers an introduction, followed by Bigelow Laboratory's Ramunas Stepanauskas on "Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  6. Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Stepanauskas, Ramunas [Bigelow Laboratory]

    2013-01-22

    DOE JGI's Tanja Woyke, chair of the Single Cells and Metagenomes session, delivers an introduction, followed by Bigelow Laboratory's Ramunas Stepanauskas on "Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  7. Metagenome and Metatranscriptome Analyses Using Protein Family Profiles.

    PubMed

    Zhong, Cuncong; Edlund, Anna; Yang, Youngik; McLean, Jeffrey S; Yooseph, Shibu

    2016-07-01

    Analyses of metagenome data (MG) and metatranscriptome data (MT) are often challenged by a paucity of complete reference genome sequences and the uneven/low sequencing depth of the constituent organisms in the microbial community, which respectively limit the power of reference-based alignment and de novo sequence assembly. These limitations make accurate protein family classification and abundance estimation challenging, which in turn hamper downstream analyses such as abundance profiling of metabolic pathways, identification of differentially encoded/expressed genes, and de novo reconstruction of complete gene and protein sequences from the protein family of interest. The profile hidden Markov model (HMM) framework enables the construction of very useful probabilistic models for protein families that allow for accurate modeling of position specific matches, insertions, and deletions. We present a novel homology detection algorithm that integrates banded Viterbi algorithm for profile HMM parsing with an iterative simultaneous alignment and assembly computational framework. The algorithm searches a given profile HMM of a protein family against a database of fragmentary MG/MT sequencing data and simultaneously assembles complete or near-complete gene and protein sequences of the protein family. The resulting program, HMM-GRASPx, demonstrates superior performance in aligning and assembling homologs when benchmarked on both simulated marine MG and real human saliva MG datasets. On real supragingival plaque and stool MG datasets that were generated from healthy individuals, HMM-GRASPx accurately estimates the abundances of the antimicrobial resistance (AMR) gene families and enables accurate characterization of the resistome profiles of these microbial communities. For real human oral microbiome MT datasets, using the HMM-GRASPx estimated transcript abundances significantly improves detection of differentially expressed (DE) genes. Finally, HMM-GRASPx was used to

  8. Metagenome and Metatranscriptome Analyses Using Protein Family Profiles

    PubMed Central

    Zhong, Cuncong; Yooseph, Shibu

    2016-01-01

    Analyses of metagenome data (MG) and metatranscriptome data (MT) are often challenged by a paucity of complete reference genome sequences and the uneven/low sequencing depth of the constituent organisms in the microbial community, which respectively limit the power of reference-based alignment and de novo sequence assembly. These limitations make accurate protein family classification and abundance estimation challenging, which in turn hamper downstream analyses such as abundance profiling of metabolic pathways, identification of differentially encoded/expressed genes, and de novo reconstruction of complete gene and protein sequences from the protein family of interest. The profile hidden Markov model (HMM) framework enables the construction of very useful probabilistic models for protein families that allow for accurate modeling of position specific matches, insertions, and deletions. We present a novel homology detection algorithm that integrates banded Viterbi algorithm for profile HMM parsing with an iterative simultaneous alignment and assembly computational framework. The algorithm searches a given profile HMM of a protein family against a database of fragmentary MG/MT sequencing data and simultaneously assembles complete or near-complete gene and protein sequences of the protein family. The resulting program, HMM-GRASPx, demonstrates superior performance in aligning and assembling homologs when benchmarked on both simulated marine MG and real human saliva MG datasets. On real supragingival plaque and stool MG datasets that were generated from healthy individuals, HMM-GRASPx accurately estimates the abundances of the antimicrobial resistance (AMR) gene families and enables accurate characterization of the resistome profiles of these microbial communities. For real human oral microbiome MT datasets, using the HMM-GRASPx estimated transcript abundances significantly improves detection of differentially expressed (DE) genes. Finally, HMM-GRASPx was used to

  9. Characterization of the gut microbiota of Kawasaki disease patients by metagenomic analysis

    PubMed Central

    Kinumaki, Akiko; Sekizuka, Tsuyoshi; Hamada, Hiromichi; Kato, Kengo; Yamashita, Akifumi; Kuroda, Makoto

    2015-01-01

    Kawasaki disease (KD) is an acute febrile illness of early childhood. Previous reports have suggested that genetic disease susceptibility factors, together with a triggering infectious agent, could be involved in KD pathogenesis; however, the precise etiology of this disease remains unknown. Additionally, previous culture-based studies have suggested a possible role of intestinal microbiota in KD pathogenesis. In this study, we performed metagenomic analysis to comprehensively assess the longitudinal variation in the intestinal microbiota of 28 KD patients. Several notable bacterial genera were commonly extracted during the acute phase, whereas a relative increase in the number of Ruminococcus bacteria was observed during the non-acute phase of KD. The metagenomic analysis results based on bacterial species classification suggested that the number of sequencing reads with similarity to five Streptococcus spp. (S. pneumoniae, pseudopneumoniae, oralis, gordonii, and sanguinis), in addition to patient-derived Streptococcus isolates, markedly increased during the acute phase in most patients. Streptococci include a variety of pathogenic bacteria and probiotic bacteria that promote human health; therefore, this further species discrimination could comprehensively illuminate the KD-associated microbiota. The findings of this study suggest that KD-related Streptococci might be involved in the pathogenesis of this disease. PMID:26322033

  10. Metagenomic analysis of the gut microbiota of the Timber Rattlesnake, Crotalus horridus.

    PubMed

    McLaughlin, Richard William; Cochran, Philip A; Dowd, Scot E

    2015-07-01

    Snakes are capable of surviving long periods without food. In this study we characterized the microbiota of a Timber Rattlesnake (Crotalus horridus), devoid of digesta, living in the wild. Pyrosequencing-based metagenomics was used to analyze phylogenetic and metabolic profiles with the aid of the MG-RAST server. Pyrosequencing of samples taken from the stomach, small intestine and colon yielded 691696, 957756 and 700419 high quality sequence reads, respectively. Taxonomic analysis of metagenomic reads indicated Eukarya was the most predominant domain, followed by Bacteria and then viruses, for all three tissues. The most predominant phylum in the domain Bacteria was Proteobacteria for the tissues examined. Functional classifications by the subsystem database showed cluster-based subsystems were most predominant (10-15 %). Almost equally predominant (10-13 %) was carbohydrate metabolism. To identify bacteria in the colon at a finer taxonomic resolution, a 16S rRNA gene clone library was created. Proteobacteria was again found to be the most predominant phylum. The present study provides a baseline for understanding the microbial ecology of snakes living in the wild. PMID:25663091

  11. Classifying the uncultivated microbial majority: A place for metagenomic data in the Candidatus proposal.

    PubMed

    Konstantinidis, Konstantinos T; Rosselló-Móra, Ramon

    2015-06-01

    Microbial taxonomists have generally been reluctant to accept the valid publication of names of uncultured taxa, given that only pure cultures allow for a thorough description of the genealogy, genetics and phenotype of the putative taxa to be classified. The classification of conspicuous uncultured organisms has been accommodated under the Candidatus provisional status, but this is only possible for organisms for which basic data on phylogeny, morphology, ecology and some metabolic traits that unequivocally identify them can be retrieved. Current developments in modern sequencing techniques, and especially metagenomics, allow the recognition of discrete populations of DNA sequences in environmental samples, which can be considered to belong to individual closely related populations that may be identified as members of yet-to-be-described species. The recognition of such populations of (meta)genomes allows the retrieval of valuable taxonomic information, i.e. genealogy, genome, phenotypic coherence with other populations, and ecologically relevant traits. Such traits may be included in the Candidatus proposals of environmentally occurring, yet uncultured species not exhibiting exceptional morphologies, phenotypes or ecological relevancies. PMID:25681255

  12. Metagenomic Profiling of a Microbial Assemblage Associated with the California Mussel: A Node in Networks of Carbon and Nitrogen Cycling

    PubMed Central

    Pfister, Catherine A.; Meyer, Folker; Antonopoulos, Dionysios A.

    2010-01-01

    Mussels are conspicuous and often abundant members of rocky shores and may constitute an important site for the nitrogen cycle due to their feeding and excretion activities. We used shotgun metagenomics of the microbial community associated with the surface of mussels (Mytilus californianus) on Tatoosh Island in Washington state to test whether there is a nitrogen-based microbial assemblage associated with mussels. Analyses of both tidepool mussels and those on emergent benches revealed a diverse community of Bacteria and Archaea with approximately 31 million bp from 6 mussels in each habitat. Using MG-RAST, between 22.5 and 25.6% were identifiable using the SEED non-redundant database for proteins. Of those fragments that were identifiable through MG-RAST, the composition was dominated by Cyanobacteria and Alpha- and Gamma-proteobacteria. Microbial composition was highly similar between the tidepool and emergent bench mussels, suggesting similar functions across these different microhabitats. One percent of the proteins identified in each sample were related to nitrogen cycling. When normalized to protein discovery rate, the diversity and abundance of enzymes related to the nitrogen cycle in mussel-associated microbes are as great as or greater than those described for other marine metagenomes. In some instances, the nitrogen-utilizing profile of this assemblage was more concordant with soil metagenomes in the Midwestern U.S. than with open ocean systems. Carbon fixation and Calvin cycle enzymes further represented 0.65 and 1.26% of all proteins and their abundance was comparable to a number of open ocean marine metagenomes. In sum, the diversity and abundance of nitrogen and carbon cycle related enzymes in the microbes occupying the shells of Mytilus californianus suggest these mussels provide a node for microbial populations and thus biogeochemical processes. PMID:20463896

  13. Universality of fragment shapes.

    PubMed

    Domokos, Gábor; Kun, Ferenc; Sipos, András Árpád; Szabó, Tímea

    2015-01-01

    The shape of fragments generated by the breakup of solids is central to a wide variety of problems ranging from the geomorphic evolution of boulders to the accumulation of space debris orbiting Earth. Although the statistics of the mass of fragments has been found to show a universal scaling behavior, the comprehensive characterization of fragment shapes has remained a fundamental challenge. We performed a thorough experimental study of the problem, fragmenting various types of materials by slowly proceeding weathering and by rapid breakup due to explosion and hammering. We demonstrate that the shape of fragments obeys an astonishing universality, having the same generic evolution with the fragment size irrespective of material details and loading conditions. There exists a cutoff size below which fragments have an isotropic shape; however, as the size increases, an exponential convergence to a unique elongated form is obtained. We show that a discrete stochastic model of fragmentation reproduces both the size and shape of fragments tuning only a single parameter, which strengthens the general validity of the scaling laws. The dependence of the probability of the crack plane orientation on the linear extension of fragments proved to be essential for the shape selection mechanism. PMID:25772300

  14. Universality of fragment shapes

    PubMed Central

    Domokos, Gábor; Kun, Ferenc; Sipos, András Árpád; Szabó, Tímea

    2015-01-01

    The shape of fragments generated by the breakup of solids is central to a wide variety of problems ranging from the geomorphic evolution of boulders to the accumulation of space debris orbiting Earth. Although the statistics of the mass of fragments has been found to show a universal scaling behavior, the comprehensive characterization of fragment shapes has remained a fundamental challenge. We performed a thorough experimental study of the problem, fragmenting various types of materials by slowly proceeding weathering and by rapid breakup due to explosion and hammering. We demonstrate that the shape of fragments obeys an astonishing universality, having the same generic evolution with the fragment size irrespective of material details and loading conditions. There exists a cutoff size below which fragments have an isotropic shape; however, as the size increases, an exponential convergence to a unique elongated form is obtained. We show that a discrete stochastic model of fragmentation reproduces both the size and shape of fragments tuning only a single parameter, which strengthens the general validity of the scaling laws. The dependence of the probability of the crack plane orientation on the linear extension of fragments proved to be essential for the shape selection mechanism. PMID:25772300

  15. Universality of fragment shapes

    NASA Astrophysics Data System (ADS)

    Domokos, Gábor; Kun, Ferenc; Sipos, András Árpád; Szabó, Tímea

    2015-03-01

    The shape of fragments generated by the breakup of solids is central to a wide variety of problems ranging from the geomorphic evolution of boulders to the accumulation of space debris orbiting Earth. Although the statistics of the mass of fragments has been found to show a universal scaling behavior, the comprehensive characterization of fragment shapes has remained a fundamental challenge. We performed a thorough experimental study of the problem, fragmenting various types of materials by slowly proceeding weathering and by rapid breakup due to explosion and hammering. We demonstrate that the shape of fragments obeys an astonishing universality, having the same generic evolution with the fragment size irrespective of material details and loading conditions. There exists a cutoff size below which fragments have an isotropic shape; however, as the size increases, an exponential convergence to a unique elongated form is obtained. We show that a discrete stochastic model of fragmentation reproduces both the size and shape of fragments tuning only a single parameter, which strengthens the general validity of the scaling laws. The dependence of the probability of the crack plane orientation on the linear extension of fragments proved to be essential for the shape selection mechanism.

  16. Metagenomics of Glassy-winged Sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Three new insect-infecting viruses, three endosymbiotic bacteria, a fungus, and a bacterial phage were discovered using a metagenomics approach to identify unknown organisms that live in association with the sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae). The genetic composition of ...

  17. Analysis of Metagenomic Sequences: From Megabases to Terabases

    SciTech Connect

    Kyrpides, Nikos

    2010-06-04

    Nikos Kyrpides of the DOE Joint Genome Institute discusses metagenomics and the challenge of dealing with terabases of data on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  18. Exploring Metagenomics in the Laboratory of an Introductory Biology Course†

    PubMed Central

    Gibbens, Brian B.; Scott, Cheryl L.; Hoff, Courtney D.; Schottel, Janet L.

    2015-01-01

    Four laboratory modules were designed for introductory biology students to explore the field of metagenomics. Students collected microbes from environmental samples, extracted the DNA, and amplified 16S rRNA gene sequences using polymerase chain reaction (PCR). Students designed functional metagenomics screens to determine and compare antibiotic resistance profiles among the samples. Bioinformatics tools were used to generate and interpret phylogenetic trees and identify homologous genes. A pretest and posttest were used to assess learning gains, and the results indicated that these modules increased student performance by an average of 22%. Here we describe ways to engage students in metagenomics-related research and provide readers with ideas for how they can start developing metagenomics exercises for their own classrooms. PMID:25949755

  19. Application of DNA microarray for screening metagenome library clones.

    PubMed

    Park, Soo-Je; Chae, Jong-Chan; Rhee, Sung-Keun

    2010-01-01

    Sequence-based screening tools for metagenome libraries can expedite metagenome research, given the tremendous diversity of metagenomes. Several critical disadvantages of activity-based screening of metagenome libraries can be overcome by sequence-based screening approaches. DNA microarray technology, widely used for monitoring environmental genes, can be employed for screening environmental fosmid and BAC clones harboring target genes owing to its high-throughput nature. DNAs of fosmid clones are extracted and spotted on a glass slide, and fluorescence-labeled probes are hybridized to the microarray. Specific hybridization signals can be obtained only for the fosmid clones that contain the target gene, with high sensitivity (10 ng/μL of fosmid clone DNA) and quantitativeness. PMID:20830574

  20. Metagenomics - a guide from sampling to data analysis

    PubMed Central

    2012-01-01

    Metagenomics applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of entire communities of organisms. The field of metagenomics has been responsible for substantial advances in microbial ecology, evolution, and diversity over the past 5 to 10 years, and many research laboratories are actively engaged in it now. With the growing numbers of activities also comes a plethora of methodological knowledge and expertise that should guide future developments in the field. This review summarizes the current opinions in metagenomics, and provides practical guidance and advice on sample processing, sequencing technology, assembly, binning, annotation, experimental design, statistical analysis, data storage, and data sharing. As more metagenomic datasets are generated, the availability of standardized procedures and shared data storage and analysis becomes increasingly important to ensure that output of individual projects can be assessed and compared. PMID:22587947

  1. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need for culturing individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins; and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.

  2. Comparative Metagenomics of Freshwater Microbial Communities

    SciTech Connect

    Hemme, Chris; Deng, Ye; Tu, Qichao; Fields, Matthew; Gentry, Terry; Wu, Liyou; Tringe, Susannah; Watson, David; He, Zhili; Hazen, Terry; Tiedje, James; Rubin, Eddy; Zhou, Jizhong

    2010-05-17

    Previous analyses of a microbial metagenome from uranium and nitric-acid contaminated groundwater (FW106) showed significant environmental effects resulting from the rapid introduction of multiple contaminants. Effects include a massive loss of species and strain biodiversity, accumulation of toxin resistance genes in the metagenome and lateral transfer of toxin resistance genes between community members. To better understand these results in an ecological context, a second metagenome from a pristine groundwater system located along the same geological strike was sequenced and analyzed (FW301). It is hypothesized that FW301 approximates the ancestral FW106 community based on phylogenetic profiles and common geological parameters; however, even if this is not the case, the datasets still permit comparisons between healthy and stressed groundwater ecosystems. Complex carbohydrate metabolism has been almost entirely lost in the stressed ecosystem. In contrast, the pristine system encodes a wide diversity of complex carbohydrate metabolism systems, suggesting that carbon turnover is very rapid and less leaky in the healthy groundwater system. FW301 encodes many (~160+) carbon monoxide dehydrogenase genes while FW106 encodes none. This result suggests that the community is frequently exposed to oxygen from aerated rainwater percolating into the subsurface, with a resulting high rate of carbon metabolism and CO production. When oxygen levels fall, the CO then serves as a major carbon source for the community. FW301 appears to be capable of CO2 fixation via the reductive carboxylase (reverse TCA) cycle and possibly acetogenesis; these activities are lacking in the heterotrophic FW106 system which relies exclusively on respiration of nitrate and/or oxygen for energy production. FW301 encodes a complete set of B12 biosynthesis pathway at high abundance suggesting the use of sodium gradients for energy production in the healthy groundwater community. Overall

  3. Metagenomic Sequencing of an In Vitro-Simulated Microbial Community

    SciTech Connect

    Morgan, Jenna L.; Darling, Aaron E.; Eisen, Jonathan A.

    2009-12-01

    Background: Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool for cataloging microbial life. This culture-independent approach involves collecting samples that include microbes in them, extracting DNA from the samples, and sequencing the DNA. A sample may contain many different microorganisms, macroorganisms, and even free-floating environmental DNA. A fundamental challenge in metagenomics has been estimating the abundance of organisms in a sample based on the frequency with which the organism's DNA was observed in reads generated via DNA sequencing. Methodology/Principal Findings: We created mixtures of ten microbial species for which genome sequences are known. Each mixture contained an equal number of cells of each species. We then extracted DNA from the mixtures, sequenced the DNA, and measured the frequency with which genomic regions from each organism were observed in the sequenced DNA. We found that the observed frequency of reads mapping to each organism did not reflect the equal numbers of cells that were known to be included in each mixture. The relative organism abundances varied significantly depending on the DNA extraction and sequencing protocol utilized. Conclusions/Significance: We describe a new data resource for measuring the accuracy of metagenomic binning methods, created by in vitro-simulation of a metagenomic community. Our in vitro simulation can be used to complement previous in silico benchmark studies. In constructing a synthetic community and sequencing its metagenome, we encountered several sources of observation bias that likely affect most metagenomic experiments to date and present challenges for comparative metagenomic studies. DNA preparation methods have a particularly profound effect in our study, implying that samples prepared with different

  4. Biocatalysts and their small molecule products from metagenomic studies

    PubMed Central

    Iqbal, Hala A.; Feng, Zhiyang; Brady, Sean F.

    2012-01-01

    The vast majority of bacteria present in environmental samples have never been cultured and therefore they have not been available to exploit their ability to produce useful biocatalysts or collections of biocatalysts that can biosynthesize interesting small molecules. Metagenomic libraries constructed using DNA extracted directly from natural bacterial communities offer access to the genetic information present in the genomes of these as yet uncultured bacteria. This review highlights recent efforts to recover both discrete enzymes and small molecules from metagenomic libraries. PMID:22455793

  5. Exploration of Metagenome Assemblies with an Interactive Visualization Tool

    SciTech Connect

    Cantor, Michael; Nordberg, Henrik; Smirnova, Tatyana; Andersen, Evan; Tringe, Susannah; Hess, Matthias; Dubchak, Inna

    2014-07-09

    Metagenomics, one of the fastest growing areas of modern genomic science, is the genetic profiling of the entire community of microbial organisms present in an environmental sample. Elviz is a web-based tool for the interactive exploration of metagenome assemblies. Elviz can be used with publicly available data sets from the Joint Genome Institute or with custom user-loaded assemblies. Elviz is available at genome.jgi.doe.gov/viz

  6. Metagenomics for studying unculturable microorganisms: cutting the Gordian knot

    PubMed Central

    Schloss, Patrick D; Handelsman, Jo

    2005-01-01

    More than 99% of prokaryotes in the environment cannot be cultured in the laboratory, a phenomenon that limits our understanding of microbial physiology, genetics, and community ecology. One way around this problem is metagenomics, the culture-independent cloning and analysis of microbial DNA extracted directly from an environmental sample. Recent advances in shotgun sequencing and computational methods for genome assembly have advanced the field of metagenomics to provide glimpses into the life of uncultured microorganisms. PMID:16086859

  7. Fragmentation properties of metals

    SciTech Connect

    Grady, D.E.; Kipp, M.E.

    1996-06-01

    In the present study we are developing an experimental fracture material property test method specific to dynamic fragmentation. Spherical test samples of the metals of interest are subjected to controlled impulsive stress loads by acceleration to high velocities with a light-gas launcher facility and subsequent normal impact on thin plates. Motion, deformation and fragmentation of the test samples are diagnosed with multiple flash radiography methods. The impact plate materials are selected to be transparent to the x-ray method so that only test metal material is imaged. Through a systematic series of such tests, both strain-to-failure and fragmentation resistance properties are determined through this experimental method. Fragmentation property data for several steels, copper, aluminum, tantalum and titanium have been obtained to date. Aspects of the dynamic data have been analyzed with computational methods to achieve a better understanding of the processes leading to failure and fragmentation, and to test an existing computational fragmentation model.

  8. Endodontic classification.

    PubMed

    Morse, D R; Seltzer, S; Sinai, I; Biron, G

    1977-04-01

    Clinical and histopathologic findings are mixed in current endodontic classifications. A new system, based on symptomatology, may be more useful in clinical practice. The classifications are vital asymptomatic, hypersensitive dentin, inflamed-reversible, inflamed/degenerating without area-irreversible, inflamed/degenerating with area-irreversible, necrotic without area, and necrotic with area. PMID:265327

  9. THE ROLE OF WATERSHED CLASSIFICATION IN DIAGNOSING CAUSES OF BIOLOGICAL IMPAIRMENT

    EPA Science Inventory

    We compared classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme for two case studies involving 1) Lake Superior tributaries and 2) watersheds of riverine coastal wetlands ...

  10. Beyond the bounds of orthology: functional inference from metagenomic context.

    PubMed

    Vey, Gregory; Moreno-Hagelsieb, Gabriel

    2010-07-01

    The effectiveness of the computational inference of function by genomic context is bounded by the diversity of known microbial genomes. Although metagenomes offer access to previously inaccessible organisms, their fragmentary nature prevents the conventional establishment of orthologous relationships required for reliably predicting functional interactions. We introduce a protocol for the prediction of functional interactions using data sources without information about orthologous relationships. To illustrate this process, we use the Sargasso Sea metagenome to construct a functional interaction network for the Escherichia coli K12 genome. We identify two reliability metrics, target intergenic distance and source interaction count, and apply them to selectively filter the predictions retained to construct the network of functional interactions. The resulting network contains 2297 nodes with 10 072 edges with a positive predictive value of 0.80. The metagenome yielded 8423 functional interactions beyond those found using only the genomic orthologs as a data source. This amounted to a 134% increase in the total number of functional interactions that are predicted by combining the metagenome and the genomic orthologs versus the genomic orthologs alone. In the absence of detectable orthologous relationships it remains feasible to derive a reliable set of predicted functional interactions. This offers a strategy for harnessing other metagenomes and homologs in general. Because metagenomes allow access to previously unreachable microorganisms, this will result in expanding the universe of known functional interactions thus furthering our understanding of functional organization. PMID:20419183

  11. Uncovering oral Neisseria tropism and persistence using metagenomic sequencing.

    PubMed

    Donati, Claudio; Zolfo, Moreno; Albanese, Davide; Tin Truong, Duy; Asnicar, Francesco; Iebba, Valerio; Cavalieri, Duccio; Jousson, Olivier; De Filippo, Carlotta; Huttenhower, Curtis; Segata, Nicola

    2016-01-01

    Microbial epidemiology and population genomics have previously been carried out near-exclusively for organisms grown in vitro. Metagenomics helps to overcome this limitation, but it is still challenging to achieve strain-level characterization of microorganisms from culture-independent data with sufficient resolution for epidemiological modelling. Here, we have developed multiple complementary approaches that can be combined to profile and track individual microbial strains. To specifically profile highly recombinant neisseriae from oral metagenomes, we integrated four metagenomic analysis techniques: single nucleotide polymorphisms in the clade's core genome, DNA uptake sequence signatures, metagenomic multilocus sequence typing and strain-specific marker genes. We applied these tools to 520 oral metagenomes from the Human Microbiome Project, finding evidence of site tropism and temporal intra-subject strain retention. Although the opportunistic pathogen Neisseria meningitidis is enriched for colonization in the throat, N. flavescens and N. subflava populate the tongue dorsum, and N. sicca, N. mucosa and N. elongata the gingival plaque. The buccal mucosa appeared as an intermediate ecological niche between the plaque and the tongue. The resulting approaches to metagenomic strain profiling are generalizable and can be extended to other organisms and microbiomes across environments. PMID:27572971

  12. Assessment of diversity indices for the characterization of the soil prokaryotic community by metagenomic analysis

    NASA Astrophysics Data System (ADS)

    Chernov, T. I.; Tkhakakhova, A. K.; Kutovaya, O. V.

    2015-04-01

    The diversity indices used in ecology for assessing the metagenomes of soil prokaryotic communities at different phylogenetic levels were compared. The following indices were considered: the number of detected taxa and the Shannon, Menhinick, Margalef, Simpson, Chao1, and ACE indices. The diversity analysis of the prokaryotic communities in the upper horizons of a typical chernozem (Haplic Chernozem (Pachic)), a dark chestnut soil (Haplic Kastanozem (Chromic)), and an extremely arid desert soil (Endosalic Calcisol (Yermic)) was based on the analysis of 16S rRNA genes. The Menhinick, Margalef, Chao1, and ACE indices gave similar results for the classification of the communities according to their diversity levels; the Simpson index gave good results only for the high-level taxa (phyla); the best results were obtained with the Shannon index. In general, all the indices used showed a decrease in the diversity of the soil prokaryotes in the following sequence: chernozem > dark chestnut soil > extremely arid desert soil.
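
    As an illustration of the indices named in the abstract above, the following is a minimal sketch of how several of them are commonly computed from a vector of per-taxon counts. It is not the authors' code. The function name `diversity_indices`, the dictionary keys, and the example counts are hypothetical; the Simpson index is given in its Gini-Simpson form (1 minus the sum of squared proportions), Chao1 uses the bias-corrected formula, and ACE is omitted for brevity.

```python
import math


def diversity_indices(counts):
    """Compute common diversity indices from per-taxon counts.

    `counts` is a list of non-negative integers (e.g. 16S rRNA reads
    assigned to each taxon). Taxa with zero counts are ignored.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)   # total number of reads (individuals)
    s = len(counts)   # observed richness (number of detected taxa)
    if n < 2:
        raise ValueError("need at least two reads to compute the indices")

    props = [c / n for c in counts]

    shannon = -sum(p * math.log(p) for p in props)   # H' = -sum(p ln p)
    simpson = 1.0 - sum(p * p for p in props)        # Gini-Simpson form
    menhinick = s / math.sqrt(n)                     # S / sqrt(N)
    margalef = (s - 1) / math.log(n)                 # (S - 1) / ln N

    # Bias-corrected Chao1: S_obs + F1 (F1 - 1) / (2 (F2 + 1)),
    # where F1 and F2 are the numbers of singleton and doubleton taxa.
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    chao1 = s + f1 * (f1 - 1) / (2 * (f2 + 1))

    return {
        "richness": s,
        "shannon": shannon,
        "simpson": simpson,
        "menhinick": menhinick,
        "margalef": margalef,
        "chao1": chao1,
    }


if __name__ == "__main__":
    # Hypothetical counts for one sample, for illustration only.
    example = [120, 45, 30, 8, 2, 1, 1]
    for name, value in diversity_indices(example).items():
        print(f"{name}: {value:.4f}")
```

    Because Menhinick, Margalef, Chao1, and ACE all depend mainly on richness, it is unsurprising that they rank communities similarly, whereas Shannon and Simpson also weight the evenness of the count distribution.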

  13. Developing a metagenomic view of xenobiotic metabolism

    PubMed Central

    Haiser, Henry J.; Turnbaugh, Peter J.

    2012-01-01

    The microbes residing in and on the human body influence human physiology in many ways, particularly through their impact on the metabolism of xenobiotic compounds, including therapeutic drugs, antibiotics, and diet-derived bioactive compounds. Despite the importance of these interactions and the many possibilities for intervention, microbial xenobiotic metabolism remains a largely underexplored component of pharmacology. Here, we discuss the emerging evidence for both direct and indirect effects of the human gut microbiota on xenobiotic metabolism, and the initial links that have been made between specific compounds, diverse members of this complex community, and the microbial genes responsible. Furthermore, we highlight the many parallels to the now well-established field of environmental bioremediation, and the vast potential to leverage emerging metagenomic tools to shed new light on these important microbial biotransformations. PMID:22902524

  14. Introduction to Metagenomics at DOE JGI (Opening Remarks for the Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Kyrpides, Nikos [DOE JGI]

    2013-01-22

    After a quick introduction by DOE JGI Director Eddy Rubin, DOE JGI's Nikos Kyrpides delivers the opening remarks at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011

  15. EBI metagenomics in 2016--an expanding and evolving resource for the analysis and archiving of metagenomic data.

    PubMed

    Mitchell, Alex; Bucchini, Francois; Cochrane, Guy; Denise, Hubert; ten Hoopen, Petra; Fraser, Matthew; Pesseat, Sebastien; Potter, Simon; Scheremetjew, Maxim; Sterk, Peter; Finn, Robert D

    2016-01-01

    EBI metagenomics (https://www.ebi.ac.uk/metagenomics/) is a freely available hub for the analysis and archiving of metagenomic and metatranscriptomic data. Over the last 2 years, the resource has undergone rapid growth, with an increase of over five-fold in the number of processed samples and consequently represents one of the largest resources of analysed shotgun metagenomes. Here, we report the status of the resource in 2016 and give an overview of new developments. In particular, we describe updates to data content, a complete overhaul of the analysis pipeline, streamlining of data presentation via the website and the development of a new web based tool to compare functional analyses of sequence runs within a study. We also highlight two of the higher profile projects that have been analysed using the resource in the last year: the oceanographic projects Ocean Sampling Day and Tara Oceans. PMID:26582919

  16. Introduction to Metagenomics at DOE JGI (Opening Remarks for the Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Kyrpides, Nikos

    2011-10-12

    After a quick introduction by DOE JGI Director Eddy Rubin, DOE JGI's Nikos Kyrpides delivers the opening remarks at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011

  17. Metagenomics, metaMicrobesOnline and Kbase Data Integration (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Dehal, Paramvir [LBNL]

    2013-01-22

    Berkeley Lab's Paramvir Dehal on "Managing and Storing large Datasets in MicrobesOnline, metaMicrobesOnline and the DOE Knowledgebase" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  18. EBI metagenomics in 2016 - an expanding and evolving resource for the analysis and archiving of metagenomic data

    PubMed Central

    Mitchell, Alex; Bucchini, Francois; Cochrane, Guy; Denise, Hubert; Hoopen, Petra ten; Fraser, Matthew; Pesseat, Sebastien; Potter, Simon; Scheremetjew, Maxim; Sterk, Peter; Finn, Robert D.

    2016-01-01

    EBI metagenomics (https://www.ebi.ac.uk/metagenomics/) is a freely available hub for the analysis and archiving of metagenomic and metatranscriptomic data. Over the last 2 years, the resource has undergone rapid growth, with an increase of over five-fold in the number of processed samples and consequently represents one of the largest resources of analysed shotgun metagenomes. Here, we report the status of the resource in 2016 and give an overview of new developments. In particular, we describe updates to data content, a complete overhaul of the analysis pipeline, streamlining of data presentation via the website and the development of a new web based tool to compare functional analyses of sequence runs within a study. We also highlight two of the higher profile projects that have been analysed using the resource in the last year: the oceanographic projects Ocean Sampling Day and Tara Oceans. PMID:26582919

  19. Metagenomics, metaMicrobesOnline and Kbase Data Integration (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Dehal, Paramvir

    2011-10-12

    Berkeley Lab's Paramvir Dehal on "Managing and Storing large Datasets in MicrobesOnline, metaMicrobesOnline and the DOE Knowledgebase" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  20. Phylogenetic Analysis of a Spontaneous Cocoa Bean Fermentation Metagenome Reveals New Insights into Its Bacterial and Fungal Community Diversity

    PubMed Central

    Illeghems, Koen; De Vuyst, Luc; Papalexandratou, Zoi; Weckx, Stefan

    2012-01-01

    This is the first report on the phylogenetic analysis of the community diversity of a single spontaneous cocoa bean box fermentation sample through a metagenomic approach involving 454 pyrosequencing. Several sequence-based and composition-based taxonomic profiling tools were used and evaluated to avoid software-dependent results and their outcome was validated by comparison with previously obtained culture-dependent and culture-independent data. Overall, this approach revealed a wider bacterial (mainly γ-Proteobacteria) and fungal diversity than previously found. Further, the use of a combination of different classification methods, in a software-independent way, helped to understand the actual composition of the microbial ecosystem under study. In addition, bacteriophage-related sequences were found. The bacterial diversity depended partially on the methods used, as composition-based methods predicted a wider diversity than sequence-based methods, and as classification methods based solely on phylogenetic marker genes predicted a more restricted diversity compared with methods that took all reads into account. The metagenomic sequencing analysis identified Hanseniaspora uvarum, Hanseniaspora opuntiae, Saccharomyces cerevisiae, Lactobacillus fermentum, and Acetobacter pasteurianus as the prevailing species. Also, the presence of occasional members of the cocoa bean fermentation process was revealed (such as Erwinia tasmaniensis, Lactobacillus brevis, Lactobacillus casei, Lactobacillus rhamnosus, Lactococcus lactis, Leuconostoc mesenteroides, and Oenococcus oeni). Furthermore, the sequence reads associated with viral communities were of a restricted diversity, dominated by Myoviridae and Siphoviridae, and reflecting Lactobacillus as the dominant host. To conclude, an accurate overview of all members of a cocoa bean fermentation process sample was revealed, indicating the superiority of metagenomic sequencing over previously used techniques. PMID:22666442

  1. Fragments and Coherence

    ERIC Educational Resources Information Center

    Watson, Anne

    2008-01-01

    Can teachers contact the inner coherence of mathematics while working in a context fragmented by always-new objectives, criteria, and initiatives? How, more importantly, can learners experience the inner coherence of mathematics while working in a context fragmented by testing, modular curricula, short-term learning objectives, and lessons that…

  2. Fluctuations in nuclear fragmentation

    SciTech Connect

    Aranda, A.; Dorso, C.O.; Furci, V.; Lopez, J.A.

    1995-12-01

    Heavy ion collisions can be used to study the thermodynamics of hot and dense nuclear matter only if the initial mass and energy fluctuations that lead to fragmentation are of thermal origin and survive the disassembly process. If this is the case, the observed fragment multiplicity should be directly related to those initial fluctuations and to the conditions of temperature and density causing them. The feasibility of this scenario is demonstrated with a molecular dynamics study of the evolution of mass and energy fluctuations, and fluctuations of the phase-space density. First, it is verified that the fluctuations leading to fragmentation are indeed early ones. Second, it is determined that different initial conditions of density and temperature can indeed produce varying final fragment multiplicities. The ρ-T plane is mapped to the fragment multiplicity with good precision. This mapping should be easily reproducible with existing experimental data.

  3. Going deeper: metagenome of a hadopelagic microbial community.

    PubMed

    Eloe, Emiley A; Fadrosh, Douglas W; Novotny, Mark; Zeigler Allen, Lisa; Kim, Maria; Lombardo, Mary-Jane; Yee-Greenbaum, Joyclyn; Yooseph, Shibu; Allen, Eric E; Lasken, Roger; Williamson, Shannon J; Bartlett, Douglas H

    2011-01-01

    The paucity of sequence data from pelagic deep-ocean microbial assemblages has severely restricted molecular exploration of the largest biome on Earth. In this study, an analysis is presented of a large-scale 454-pyrosequencing metagenomic dataset from a hadopelagic environment from 6,000 m depth within the Puerto Rico Trench (PRT). A total of 145 Mbp of assembled sequence data was generated and compared to two pelagic deep ocean metagenomes and two representative surface seawater datasets from the Sargasso Sea. In a number of instances, all three deep metagenomes displayed similar trends, but were most magnified in the PRT, including enrichment in functions for two-component signal transduction mechanisms and transcriptional regulation. Overrepresented transporters in the PRT metagenome included outer membrane porins, diverse cation transporters, and di- and tri-carboxylate transporters that matched well with the prevailing catabolic processes such as butanoate, glyoxylate and dicarboxylate metabolism. A surprisingly high abundance of sulfatases for the degradation of sulfated polysaccharides were also present in the PRT. The most dramatic adaptational feature of the PRT microbes appears to be heavy metal resistance, as reflected in the large numbers of transporters present for their removal. As a complement to the metagenome approach, single-cell genomic techniques were utilized to generate partial whole-genome sequence data from four uncultivated cells from members of the dominant phyla within the PRT, Alphaproteobacteria, Gammaproteobacteria, Bacteroidetes and Planctomycetes. The single-cell sequence data provided genomic context for many of the highly abundant functional attributes identified from the PRT metagenome, as well as recruiting heavily the PRT metagenomic sequence data compared to 172 available reference marine genomes. Through these multifaceted sequence approaches, new insights have been provided into the unique functional attributes present in

  4. Going Deeper: Metagenome of a Hadopelagic Microbial Community

    PubMed Central

    Eloe, Emiley A.; Fadrosh, Douglas W.; Novotny, Mark; Zeigler Allen, Lisa; Kim, Maria; Lombardo, Mary-Jane; Yee-Greenbaum, Joyclyn; Yooseph, Shibu; Allen, Eric E.; Lasken, Roger; Williamson, Shannon J.; Bartlett, Douglas H.

    2011-01-01

    The paucity of sequence data from pelagic deep-ocean microbial assemblages has severely restricted molecular exploration of the largest biome on Earth. In this study, an analysis is presented of a large-scale 454-pyrosequencing metagenomic dataset from a hadopelagic environment from 6,000 m depth within the Puerto Rico Trench (PRT). A total of 145 Mbp of assembled sequence data was generated and compared to two pelagic deep ocean metagenomes and two representative surface seawater datasets from the Sargasso Sea. In a number of instances, all three deep metagenomes displayed similar trends, but were most magnified in the PRT, including enrichment in functions for two-component signal transduction mechanisms and transcriptional regulation. Overrepresented transporters in the PRT metagenome included outer membrane porins, diverse cation transporters, and di- and tri-carboxylate transporters that matched well with the prevailing catabolic processes such as butanoate, glyoxylate and dicarboxylate metabolism. A surprisingly high abundance of sulfatases for the degradation of sulfated polysaccharides were also present in the PRT. The most dramatic adaptational feature of the PRT microbes appears to be heavy metal resistance, as reflected in the large numbers of transporters present for their removal. As a complement to the metagenome approach, single-cell genomic techniques were utilized to generate partial whole-genome sequence data from four uncultivated cells from members of the dominant phyla within the PRT, Alphaproteobacteria, Gammaproteobacteria, Bacteroidetes and Planctomycetes. The single-cell sequence data provided genomic context for many of the highly abundant functional attributes identified from the PRT metagenome, as well as recruiting heavily the PRT metagenomic sequence data compared to 172 available reference marine genomes. Through these multifaceted sequence approaches, new insights have been provided into the unique functional attributes present in

  5. Environmental Metagenomics: The Data Assembly and Data Analysis Perspectives

    NASA Astrophysics Data System (ADS)

    Kumar, Vinay; Maitra, S. S.; Shukla, Rohit Nandan

    2015-01-01

    Novel gene finding is one of the emerging fields in environmental research. In past decades, research focused mainly on the discovery of microorganisms capable of degrading a particular compound. Many methods are available in the literature for the cultivation and screening of these novel microorganisms. All of these methods are efficient for screening microbes that can be cultivated in the laboratory. Microorganisms that live in extreme conditions such as hot springs, frozen glaciers, and acid mine drainage cannot be cultivated in the laboratory because of incomplete knowledge about their growth requirements, such as temperature, nutrients, and their mutual dependence on each other. The microbes that can be cultivated correspond to less than 1% of the total microbes present on Earth. The remaining 99%, the uncultivated majority, remains inaccessible. Metagenomics transcends the culture requirements of microbes. In metagenomics, DNA is extracted directly from environmental samples such as soil, seawater, and acid mine drainage, followed by construction and screening of a metagenomic library. With ongoing research, a huge amount of metagenomic data is accumulating. Understanding these data is an essential step in extracting novel genes of industrial importance. Various bioinformatics tools have been designed to analyze and annotate the data produced from the metagenome. The bioinformatic requirements of metagenomics data analysis differ in theory and practice. This paper reviews the tools that are available for metagenomic data analysis and the capabilities of such tools: what they can do and their web availability.

  6. Opaque rock fragments

    SciTech Connect

    Abhijit, B.; Molinaroli, E.; Olsen, J.

    1987-05-01

    The authors describe a new, rare, but petrogenetically significant variety of rock fragments from Holocene detrital sediments. Approximately 50% of the opaque heavy mineral concentrates from Holocene siliciclastic sands are polymineralic Fe-Ti oxide particles, i.e., they are opaque rock fragments. About 40% to 70% of these rock fragments show intergrowth of hm + il, mt + il, and mt + hm ± il. Modal analysis of 23,282 opaque particles in 117 polished thin sections of granitic and metamorphic parent rocks and their daughter sands from semi-arid and humid climates show the following relative abundances. The data show that opaque rock fragments are more common in sands from igneous source rocks and that hm + il fragments are more durable. They assume that equilibrium conditions existed in parent rocks during the growth of these paired minerals, and that the Ti/Fe ratio did not change during oxidation of mt to hm. Geothermometric determinations using electron probe microanalysis of opaque rock fragments in sand samples from Lake Erie and the Adriatic Sea suggest that these rock fragments may have equilibrated at approximately 900° and 525°C, respectively.

  7. Selectable fragmentation warhead

    SciTech Connect

    Bryan, C.S.; Paisley, D.L.; Montoya, N.I.; Stahl, D.B.

    1993-07-20

    A selectable fragmentation warhead is described comprising: a case having proximal and distal ends; a fragmenting plate mounted in said distal end of said casing; first explosive means cast adjacent to said fragmenting plate for creating a predetermined number of fragments from said fragmenting plate; three or more first laser-driven slapper detonators located adjacent to said first explosive means for detonating said first explosive means in a predetermined pattern; smoother-disk means located adjacent to said first means for accelerating said fragments; second explosive means cast adjacent to said smoother-disk means for further accelerating said fragments; at least one laser-driven slapper detonators located in said second explosive means; a laser located in said proximal end of said casing; optical fibers connecting said laser to said first and second laser-driven slapper detonators; and optical switch means located in series with said optical fibers connected to said plurality of first laser-driven slapper detonators for blocking or passing light from said laser to said plurality of first laser-driven slapper detonators.

  8. Stratification of gallstone fragments: the key to more effective fragmentation.

    PubMed

    Alderfer, J T; Laufer, I; Wisniewski, F; Malet, P F

    1992-04-01

    During previous experiments with in vitro fragmentation in a simulated gallbladder, we noticed that stone fragments tended to stratify with the dust and smaller fragments settled to the dependent portion, while the larger fragments settled on top. We reviewed the oral cholecystogram (OCG) of 10 patients examined 6 months following gallstone lithotripsy. In all cases with adequate visualization of stone fragments, the stratification phenomenon was observed. We hypothesized that adjusting the shock wave focus to target on these large fragments would improve the efficiency of fragmentation. To test this hypothesis, we fragmented three matched pairs of gallstones in vitro. For each pair, the stones were removed from the same gallbladder and the stone weights of the two stones were within 10%. The smaller member of each pair was fragmented using the "old method" with the focus on the fragment line. The larger stone was fragmented with the "new method" with the focus in the acoustic shadow deep to the echogenic line caused by the dust and small fragments in the dependent portion. The distribution of fragments was analyzed by passing the fragments through a series of filters. With the new method of targeting, the proportion of fragments less than 1.5 mm was doubled while the fragments greater than 5 mm were eliminated. The new method of targeting, taking into account the stratification of stone fragments, produces more effective fragmentation and should lead to more rapid clearance of fragments from the gallbladder. PMID:10149180

  9. COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets

    PubMed Central

    Bose, Tungadri; Haque, Mohammed Monzoorul; Reddy, CVSK; Mande, Sharmila S.

    2015-01-01

    Background Recent advances in sequencing technologies have resulted in an unprecedented increase in the number of metagenomes that are being sequenced world-wide. Given their volume, functional annotation of metagenomic sequence datasets requires specialized computational tools/techniques. In spite of having high accuracy, existing stand-alone functional annotation tools require end-users to perform compute-intensive homology searches of metagenomic datasets against "multiple" databases prior to functional analysis. Although web-based functional annotation servers address to some extent the problem of availability of compute resources, uploading and analyzing huge volumes of sequence data on a shared public web-service has its own set of limitations. In this study, we present COGNIZER, a comprehensive stand-alone annotation framework which enables end-users to functionally annotate sequences constituting metagenomic datasets. The COGNIZER framework provides multiple workflow options. A subset of these options employs a novel directed-search strategy which helps in reducing the overall compute requirements for end-users. The COGNIZER framework includes a cross-mapping database that enables end-users to simultaneously derive/infer KEGG, Pfam, GO, and SEED subsystem information from the COG annotations. Results Validation experiments performed with real-world metagenomes and metatranscriptomes, generated using diverse sequencing technologies, indicate that the novel directed-search strategy employed in COGNIZER helps in reducing the compute requirements without significant loss in annotation accuracy. A comparison of COGNIZER's results with pre-computed benchmark values indicates the reliability of the cross-mapping database employed in COGNIZER. Conclusion The COGNIZER framework is capable of comprehensively annotating any metagenomic or metatranscriptomic dataset from varied sequencing platforms in functional terms. Multiple search options in COGNIZER provide

  10. [Comparative Metagenomics of BIOLAK and A2O Activated Sludge Based on Next-generation Sequencing Technology].

    PubMed

    Tian, Mei; Liu, Han-hu; Shen, Xin

    2016-02-15

    This is the first report of comparative metagenomic analyses of BIOLAK sludge and anaerobic/anoxic/oxic (A2O) sludge. In the BIOLAK and A2O sludge metagenomes, 47 and 51 phyla were identified respectively, more than the numbers of phyla identified in Australia EBPR (enhanced biological phosphorus removal), USA EBPR and Bibby sludge. All phyla found in the BIOLAK sludge were detected in the A2O sludge, but four phyla were exclusively found in the A2O sludge. The proportion of the phylum Ignavibacteriae in the A2O sludge was 2.0440%, which was 3.2 times as much as that in the BIOLAK sludge (0.6376%). Meanwhile, the proportion of the bacterial phylum Gemmatimonadetes in the BIOLAK sludge was 2.4673%, which was >17 times as much as that in the A2O sludge (0.1404%). The proportion of the bacterial phylum Chlamydiae in the BIOLAK metagenome (0.2192%) was >6 times higher than that in the A2O (0.0360%). Furthermore, 167 genera found in the A2O sludge were not detected in the BIOLAK sludge, and 50 genera found in the BIOLAK sludge were not detected in the A2O sludge. From the analyses of both the phylum and genus levels, there were huge differences between the two biological communities of A2O and BIOLAK sludge. However, the proportions of each group of functional genes associated with metabolism of nitrogen, phosphorus, sulfur and aromatic compounds in BIOLAK were very similar to those in A2O sludge. Moreover, the rankings of all six KEGG (Kyoto Encyclopedia of Genes and Genomes) categories were identical in the two sludges. In addition, the analyses of functional classification and pathways related to nitrogen metabolism showed that the abundant enzymes had identical ranking in the BIOLAK and A2O metagenomes. Therefore, comparative metagenomics of BIOLAK and A2O activated sludge indicated similar function assignments from the two different biological communities. PMID:27363155
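
    The fold differences quoted in this abstract can be rechecked from the stated percentages. The snippet below is only an arithmetic sketch: the dictionary layout and the helper name fold_difference are illustrative assumptions, and the numbers are the ones given in the abstract.

```python
# Phylum-level proportions (%) as quoted in the abstract above.
proportions = {
    "Ignavibacteriae": {"A2O": 2.0440, "BIOLAK": 0.6376},
    "Gemmatimonadetes": {"A2O": 0.1404, "BIOLAK": 2.4673},
    "Chlamydiae": {"A2O": 0.0360, "BIOLAK": 0.2192},
}


def fold_difference(values):
    """Return (sludge with the larger proportion, larger / smaller ratio)."""
    high = max(values, key=values.get)
    low = min(values, key=values.get)
    return high, values[high] / values[low]


fold_changes = {phylum: fold_difference(v) for phylum, v in proportions.items()}

for phylum, (sludge, ratio) in fold_changes.items():
    print(f"{phylum}: {ratio:.1f}x the proportion, enriched in {sludge}")
```

    Read as simple ratios, the three values come out at roughly 3.2, 17.6 and 6.1, consistent with the "3.2 times", ">17 times" and ">6 times" figures in the text.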

  11. Reconstruction of Bacterial and Viral Genomes from Multiple Metagenomes.

    PubMed

    Gupta, Ankit; Kumar, Sanjiv; Prasoodanan, Vishnu P K; Harish, K; Sharma, Ashok K; Sharma, Vineet K

    2016-01-01

    Several metagenomic projects have been accomplished or are in progress. However, in most cases, it is not feasible to generate complete genomic assemblies of species from the metagenomic sequencing of a complex environment. Only a few studies have reported the reconstruction of bacterial genomes from complex metagenomes. In this work, a Binning-Assembly approach has been proposed and demonstrated for the reconstruction of bacterial and viral genomes from 72 human gut metagenomic datasets. A total of 1156 bacterial genomes belonging to 219 bacterial families and 279 viral genomes belonging to 84 viral families could be identified. More than 80% complete draft genome sequences could be reconstructed for a total of 126 bacterial and 11 viral genomes. Selected draft assembled genomes could be validated with 99.8% accuracy using their ORFs. The study provides useful information on the assembly expected for a species given its number of reads and abundance. This approach along with spiking was also demonstrated to be useful in improving the draft assembly of a bacterial genome. The Binning-Assembly approach can be successfully used to reconstruct bacterial and viral genomes from multiple metagenomic datasets obtained from similar environments. PMID:27148174

  12. Exploratory experimentation and scientific practice: metagenomics and the proteorhodopsin case.

    PubMed

    O'Malley, Maureen A

    2007-01-01

    Exploratory experimentation and high-throughput molecular biology appear to have considerable affinity for each other. Included in the latter category is metagenomics, which is the DNA-based study of diverse microbial communities from a vast range of non-laboratory environments. Metagenomics has already made numerous discoveries and these have led to reinterpretations of fundamental concepts of microbial organization, evolution, and ecology. The most outstanding success story of metagenomics to date involves the discovery of a rhodopsin gene, named proteorhodopsin, in marine bacteria that were never suspected to have any photobiological capacities. A discussion of this finding and its detailed investigation illuminates the relationship between exploratory experimentation and metagenomics. Specifically, the proteorhodopsin story indicates that a dichotomous interpretation of theory-driven and exploratory experimentation is insufficient and that an interactive understanding of these two types of experimentation can be usefully supplemented by another category, "natural history experimentation". Further reflection on the context of metagenomics suggests the necessity of thinking more historically about exploratory and other forms of experimentation. PMID:18822661

  13. Reconstruction of Bacterial and Viral Genomes from Multiple Metagenomes

    PubMed Central

    Gupta, Ankit; Kumar, Sanjiv; Prasoodanan, Vishnu P. K.; Harish, K.; Sharma, Ashok K.; Sharma, Vineet K.

    2016-01-01

    Several metagenomic projects have been accomplished or are in progress. However, in most cases, it is not feasible to generate complete genomic assemblies of species from the metagenomic sequencing of a complex environment. Only a few studies have reported the reconstruction of bacterial genomes from complex metagenomes. In this work, a Binning-Assembly approach has been proposed and demonstrated for the reconstruction of bacterial and viral genomes from 72 human gut metagenomic datasets. A total of 1156 bacterial genomes belonging to 219 bacterial families and 279 viral genomes belonging to 84 viral families could be identified. More than 80% complete draft genome sequences could be reconstructed for a total of 126 bacterial and 11 viral genomes. Selected draft assembled genomes could be validated with 99.8% accuracy using their ORFs. The study provides useful information on the assembly expected for a species given its number of reads and abundance. This approach along with spiking was also demonstrated to be useful in improving the draft assembly of a bacterial genome. The Binning-Assembly approach can be successfully used to reconstruct bacterial and viral genomes from multiple metagenomic datasets obtained from similar environments. PMID:27148174

  14. Metagenomic Insights into Transferable Antibiotic Resistance in Oral Bacteria.

    PubMed

    Sukumar, S; Roberts, A P; Martin, F E; Adler, C J

    2016-08-01

    Antibiotic resistance is considered one of the greatest threats to global public health. Resistance is often conferred by the presence of antibiotic resistance genes (ARGs), which are readily found in the oral microbiome. In-depth genetic analyses of the oral microbiome through metagenomic techniques reveal a broad distribution of ARGs (including novel ARGs) in individuals not recently exposed to antibiotics, including humans in isolated indigenous populations. This has resulted in a paradigm shift from focusing on the carriage of antibiotic resistance in pathogenic bacteria to a broader concept of an oral resistome, which includes all resistance genes in the microbiome. Metagenomics is beginning to demonstrate the role of the oral resistome and horizontal gene transfer within and between commensals in the absence of selective pressure, such as an antibiotic. At the chairside, metagenomic data reinforce our need to adhere to current antibiotic guidelines to minimize the spread of resistance, as such data reveal the extent of ARGs without exposure to antimicrobials and the ecologic changes created in the oral microbiome by even a single dose of antibiotics. The aim of this review is to discuss the role of metagenomics in the investigation of the oral resistome, including the transmission of antibiotic resistance in the oral microbiome. Future perspectives, including clinical implications of the findings from metagenomic investigations of oral ARGs, are also considered. PMID:27183895

  15. Exploration and retrieval of whole-metagenome sequencing samples

    PubMed Central

    Seth, Sohan; Välimäki, Niko; Kaski, Samuel; Honkela, Antti

    2014-01-01

    Motivation: Over the recent years, the field of whole-metagenome shotgun sequencing has witnessed significant growth owing to the high-throughput sequencing technologies that allow sequencing genomic samples cheaper, faster and with better coverage than before. This technical advancement has initiated the trend of sequencing multiple samples in different conditions or environments to explore the similarities and dissimilarities of the microbial communities. Examples include the human microbiome project and various studies of the human intestinal tract. With the availability of ever larger databases of such measurements, finding samples similar to a given query sample is becoming a central operation. Results: In this article, we develop a content-based exploration and retrieval method for whole-metagenome sequencing samples. We apply a distributed string mining framework to efficiently extract all informative sequence k-mers from a pool of metagenomic samples and use them to measure the dissimilarity between two samples. We evaluate the performance of the proposed approach on two human gut metagenome datasets as well as human microbiome project metagenomic samples. We observe significant enrichment for diseased gut samples in results of queries with another diseased sample and high accuracy in discriminating between different body sites even though the method is unsupervised. Availability and implementation: A software implementation of the DSM framework is available at https://github.com/HIITMetagenomics/dsm-framework. Contact: sohan.seth@hiit.fi or antti.honkela@hiit.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24845653
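
    The idea of measuring dissimilarity between two samples from their k-mer content can be illustrated with a small sketch. This is not the DSM framework's distributed string mining or its selection of informative k-mers; it assumes, purely for illustration, a fixed k, plain relative k-mer frequencies, and the Bray-Curtis dissimilarity. The function names kmer_profile and bray_curtis are hypothetical.

```python
from collections import Counter


def kmer_profile(reads, k=4):
    """Relative frequency of every k-mer (A/C/G/T only) across a set of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for disjoint ones."""
    total = sum(p.values()) + sum(q.values())
    if total == 0:
        return 0.0
    shared = sum(min(p[kmer], q[kmer]) for kmer in p.keys() & q.keys())
    return 1.0 - 2.0 * shared / total


# Toy "samples": lists of reads (made-up sequences, not real data).
sample_a = ["ACGTACGTAC", "GTACGTACGT"]
sample_b = ["ACGTACGTAC", "TTTTGGGGCC"]
print(bray_curtis(kmer_profile(sample_a), kmer_profile(sample_b)))
```

    A retrieval step would then rank stored samples by this dissimilarity to a query sample; at realistic data sizes the k-mer extraction is the expensive part, which is where a distributed approach becomes relevant.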

  16. Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes

    PubMed Central

    King, Paula; Pham, Long K.; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T.; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca

    2016-01-01

    We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens and determine whether they are atypical to the hospital airborne metagenome. Air samples were collected from eight locations which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the most abundant 50 microbial orders generated three major nodes which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic for an otherwise consistent hospital air microbial metagenomic profile. PMID:27482891

  17. An introduction to the analysis of shotgun metagenomic data

    PubMed Central

    Sharpton, Thomas J.

    2014-01-01

    Environmental DNA sequencing has revealed the expansive biodiversity of microorganisms and clarified the relationship between host-associated microbial communities and host phenotype. Shotgun metagenomic DNA sequencing is a relatively new and powerful environmental sequencing approach that provides insight into community biodiversity and function. However, the analysis of metagenomic sequences is complicated by the complex structure of the data. Fortunately, new tools and data resources have been developed to circumvent these complexities and allow researchers to determine which microbes are present in the community and what they might be doing. This review describes the analytical strategies and specific tools that can be applied to metagenomic data and the considerations and caveats associated with their use. Specifically, it documents how metagenomes can be analyzed to quantify community structure and diversity, assemble novel genomes, identify new taxa and genes, and determine which metabolic pathways are encoded in the community. It also discusses several methods that can be used to compare metagenomes to identify taxa and functions that differentiate communities. PMID:24982662

  18. Application of metagenomics in the human gut microbiome.

    PubMed

    Wang, Wei-Lin; Xu, Shao-Yan; Ren, Zhi-Gang; Tao, Liang; Jiang, Jian-Wen; Zheng, Shu-Sen

    2015-01-21

    There are more than 1000 microbial species living in the complex human intestine. The gut microbial community plays an important role in protecting the host against pathogenic microbes, modulating immunity, regulating metabolic processes, and is even regarded as an endocrine organ. However, traditional culture methods are very limited for identifying microbes. With the application of molecular biologic technology in the field of the intestinal microbiome, especially metagenomic sequencing of the next-generation sequencing technology, progress has been made in the study of the human intestinal microbiome. Metagenomics can be used to study intestinal microbiome diversity and dysbiosis, as well as its relationship to health and disease. Moreover, functional metagenomics can identify novel functional genes, microbial pathways, antibiotic resistance genes, functional dysbiosis of the intestinal microbiome, and determine interactions and co-evolution between microbiota and host, though there are still some limitations. Metatranscriptomics, metaproteomics and metabolomics represent enormous complements to the understanding of the human gut microbiome. This review aims to demonstrate that metagenomics can be a powerful tool in studying the human gut microbiome with encouraging prospects. The limitations of metagenomics to be overcome are also discussed. Metatranscriptomics, metaproteomics and metabolomics in relation to the study of the human gut microbiome are also briefly discussed. PMID:25624713

  19. Metagenomic analysis of permafrost microbial community response to thaw

    SciTech Connect

    Mackelprang, R.; Waldrop, M.P.; DeAngelis, K.M.; David, M.M.; Chavarria, K.L.; Blazewicz, S.J.; Rubin, E.M.; Jansson, J.K.

    2011-07-01

    We employed deep metagenomic sequencing to determine the impact of thaw on microbial phylogenetic and functional genes and related this data to measurements of methane emissions. Metagenomics, the direct sequencing of DNA from the environment, allows for the examination of whole biochemical pathways and associated processes, as opposed to individual pieces of the metabolic puzzle. Our metagenome analyses revealed that during transition from a frozen to a thawed state there were rapid shifts in many microbial, phylogenetic and functional gene abundances and pathways. After one week of incubation at 5°C, permafrost metagenomes converged to be more similar to each other than while they were frozen. We found that multiple genes involved in cycling of carbon and nitrogen shifted rapidly during thaw. We also constructed the first draft genome from a complex soil metagenome, which corresponded to a novel methanogen. Methane previously accumulated in permafrost was released during thaw and subsequently consumed by methanotrophic bacteria. Together these data point towards the importance of rapid cycling of methane and nitrogen in thawing permafrost.

  20. Assessment of Metagenomic Assembly Using Simulated Next Generation Sequencing Data

    PubMed Central

    Sunagawa, Shinichi; Järvelin, Aino I.; Chan, Michelle M.; Arumugam, Manimozhiyan; Raes, Jeroen; Bork, Peer

    2012-01-01

    Due to the complexity of the protocols and a limited knowledge of the nature of microbial communities, simulating metagenomic sequences plays an important role in testing the performance of existing tools and data analysis methods with metagenomic data. We developed metagenomic read simulators with platform-specific (Sanger, pyrosequencing, Illumina) base-error models, and simulated metagenomes of differing community complexities. We first evaluated the effect of rigorous quality control on Illumina data. Although quality filtering removed a large proportion of the data, it greatly improved the accuracy and contig lengths of resulting assemblies. We then compared the quality-trimmed Illumina assemblies to those from Sanger and pyrosequencing. For the simple community (10 genomes) all sequencing technologies assembled a similar amount and accurately represented the expected functional composition. For the more complex community (100 genomes) Illumina produced the best assemblies and more correctly resembled the expected functional composition. For the most complex community (400 genomes) there was very little assembly of reads from any sequencing technology. However, due to the longer read length the Sanger reads still represented the overall functional composition reasonably well. We further examined the effect of scaffolding of contigs using paired-end Illumina reads. It dramatically increased contig lengths of the simple community and yielded minor improvements to the more complex communities. Although the increase in contig length was accompanied by increased chimericity, it resulted in more complete genes and a better characterization of the functional repertoire. The metagenomic simulators developed for this research are freely available. PMID:22384016
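
    As a rough illustration of what a metagenomic read simulator does, the sketch below samples fixed-length reads from a reference sequence and introduces substitution errors. It is a deliberate simplification: the simulators described above use platform-specific base-error models, whereas this sketch assumes a single uniform substitution rate and no indels. The function name simulate_reads and all parameters are hypothetical.

```python
import random


def simulate_reads(genome, n_reads, read_len, error_rate, seed=0):
    """Sample reads uniformly from `genome`, substituting bases at `error_rate`."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        bases = list(genome[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Uniform substitution: pick any base other than the true one.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append("".join(bases))
    return reads


# Toy reference (made-up sequence); a community would mix several genomes.
toy_genome = "ACGTTGCATGCCGATAGCTTAGGCTAACGTCCGATTGACA"
for read in simulate_reads(toy_genome, n_reads=3, read_len=12, error_rate=0.02):
    print(read)
```

    A community of differing complexity could be approximated by drawing each read from one of several genomes with probability proportional to an assumed abundance.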

  1. Fragment capture device

    DOEpatents

    Payne, Lloyd R.; Cole, David L.

    2010-03-30

    A fragment capture device for use in explosive containment. The device comprises an assembly of at least two rows of bars positioned to eliminate line-of-sight trajectories between the generation point of fragments and a surrounding containment vessel or asset. The device comprises an array of at least two rows of bars, wherein each row is staggered with respect to the adjacent row, and wherein a lateral dimension of each bar and a relative position of each bar in combination provides blockage of a straight-line passage of a solid fragment through the adjacent rows of bars, wherein a generation point of the solid fragment is located within a cavity at least partially enclosed by the array of bars.

  2. Denoising PCR-amplified metagenome data

    PubMed Central

    2012-01-01

    Background PCR amplification and high-throughput sequencing theoretically enable the characterization of the finest-scale diversity in natural microbial and viral populations, but each of these methods introduces random errors that are difficult to distinguish from genuine biological diversity. Several approaches have been proposed to denoise these data but lack either speed or accuracy. Results We introduce a new denoising algorithm that we call DADA (Divisive Amplicon Denoising Algorithm). Without training data, DADA infers both the sample genotypes and error parameters that produced a metagenome data set. We demonstrate performance on control data sequenced on Roche’s 454 platform, and compare the results to the most accurate denoising software currently available, AmpliconNoise. Conclusions DADA is more accurate and over an order of magnitude faster than AmpliconNoise. It eliminates the need for training data to establish error parameters, fully utilizes sequence-abundance information, and enables inclusion of context-dependent PCR error rates. It should be readily extensible to other sequencing platforms such as Illumina. PMID:23113967

  3. Bioinformatic Insights from Metagenomics through Visualization

    SciTech Connect

    Havre, Susan L.; Webb-Robertson, Bobbie-Jo M.; Shah, Anuj; Posse, Christian; Gopalan, Banu; Brockman, Fred J.

    2005-08-10

    Cutting-edge biological and bioinformatics research seeks a systems perspective through the analysis of multiple types of high-throughput and other experimental data for the same sample. Systems-level analysis requires the integration and fusion of such data, typically through advanced statistics and mathematics. Visualization is a complementary computational approach that supports integration and analysis of complex data or its derivatives. We present a bioinformatics visualization prototype, Juxter, which depicts categorical information derived from or assigned to these diverse data for the purpose of comparing patterns across categorizations. The visualization allows users to easily discern correlated and anomalous patterns in the data. These patterns, which might not be detected automatically by algorithms, may reveal valuable information leading to insight and discovery. We describe the visualization and interaction capabilities and demonstrate its utility in a new field, metagenomics, which combines molecular biology and genetics to identify and characterize genetic material from multi-species microbial samples.

  4. Metagenomic scaffolds enable combinatorial lignin transformation

    PubMed Central

    Strachan, Cameron R.; Singh, Rahul; VanInsberghe, David; Ievdokymenko, Kateryna; Budwill, Karen; Mohn, William W.; Eltis, Lindsay D.; Hallam, Steven J.

    2014-01-01

    Engineering the microbial transformation of lignocellulosic biomass is essential to developing modern biorefining processes that alleviate reliance on petroleum-derived energy and chemicals. Many current bioprocess streams depend on the genetic tractability of Escherichia coli with a primary emphasis on engineering cellulose/hemicellulose catabolism, small molecule production, and resistance to product inhibition. Conversely, bioprocess streams for lignin transformation remain embryonic, with relatively few environmental strains or enzymes implicated. Here we develop a biosensor responsive to monoaromatic lignin transformation products compatible with functional screening in E. coli. We use this biosensor to retrieve metagenomic scaffolds sourced from coal bed bacterial communities conferring an array of lignin transformation phenotypes that synergize in combination. Transposon mutagenesis and comparative sequence analysis of active clones identified genes encoding six functional classes mediating lignin transformation phenotypes that appear to be rearrayed in nature via horizontal gene transfer. Lignin transformation activity was then demonstrated for one of the predicted gene products encoding a multicopper oxidase to validate the screen. These results illuminate cellular and community-wide networks acting on aromatic polymers and expand the toolkit for engineering recombinant lignin transformation based on ecological design principles. PMID:24982175

  5. Unlocking Short Read Sequencing for Metagenomics

    PubMed Central

    Timberlake, Sonia C.; Blackburn, Matthew C.; Malmstrom, Rex R.; Alm, Eric J.; Chisholm, Sallie W.

    2010-01-01

    Background Different high-throughput nucleic acid sequencing platforms are currently available but a trade-off currently exists between the cost and number of reads that can be generated versus the read length that can be achieved. Methodology/Principal Findings We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read. Conclusions/Significance This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing, but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing. PMID:20676378
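
    Joining overlapping paired-end reads into a longer composite read can be sketched as follows. This is not the SHERA algorithm, whose scoring is not described here; it assumes, for illustration only, that the second read is reverse-complemented, that the longest acceptable suffix/prefix overlap is chosen, and that a fixed mismatch fraction is tolerated. The names merge_pair, min_overlap and max_mismatch_frac are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string over the alphabet A, C, G, T, N."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Join a read pair into one composite read, or return None if no overlap.

    read2 is assumed to come from the opposite strand, so it is
    reverse-complemented before the end of read1 is compared with its start.
    """
    mate = reverse_complement(read2)
    # Try the longest candidate overlap first and accept the first that passes.
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            return read1 + mate[overlap:]
    return None


# Toy 40-base insert sequenced as two 25-base reads that overlap by 10 bases.
fragment = "ACGTTGCATGCCGATAGCTTAGGCTAACGTCCGATTGACA"
read1 = fragment[:25]
read2 = reverse_complement(fragment[15:])
print(merge_pair(read1, read2))
```

    A production tool would normally weigh base quality scores when resolving mismatches inside the overlap; this sketch simply keeps the bases of the first read.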

  6. Genomics, metagenomics and proteomics in biomining microorganisms.

    PubMed

    Valenzuela, Lissette; Chi, An; Beard, Simon; Orell, Alvaro; Guiliani, Nicolas; Shabanowitz, Jeff; Hunt, Donald F; Jerez, Carlos A

    2006-01-01

    The use of acidophilic, chemolithotrophic microorganisms capable of oxidizing iron and sulfur in industrial processes to recover metals from minerals containing copper, gold and uranium is a well established biotechnology with distinctive advantages over traditional mining. A consortium of different microorganisms participates in the oxidative reactions resulting in the extraction of dissolved metal values from ores. Considerable effort has been spent in the last years to understand the biochemistry of iron and sulfur compounds oxidation, bacteria-mineral interactions (chemotaxis, quorum sensing, adhesion, biofilm formation) and several adaptive responses allowing the microorganisms to survive in a bioleaching environment. All of these are considered key phenomena for understanding the process of biomining. The use of genomics, metagenomics and high throughput proteomics to study the global regulatory responses that the biomining community uses to adapt to their changing environment is just beginning to emerge in the last years. These powerful approaches are reviewed here since they offer the possibility of exciting new findings that will allow analyzing the community as a microbial system, determining the extent to which each of the individual participants contributes to the process, how they evolve in time to keep the conglomerate healthy and therefore efficient during the entire process of bioleaching. PMID:16288845

  7. Metagenomic analysis of phosphorus removing sludge communities

    SciTech Connect

    Garcia Martin, Hector; Ivanova, Natalia; Kunin, Victor; Warnecke, Falk; Barry, Kerrie; McHardy, Alice C.; Yeates, Christine; He, Shaomei; Salamov, Asaf; Szeto, Ernest; Dalin, Eileen; Putnam, Nik; Shapiro, Harris J.; Pangilinan, Jasmyn L.; Rigoutsos, Isidore; Kyrpides, Nikos C.; Blackall, Linda Louise; McMahon, Katherine D.; Hugenholtz, Philip

    2006-02-01

    Enhanced Biological Phosphorus Removal (EBPR) is not well understood at the metabolic level despite being one of the best-studied microbially-mediated industrial processes due to its ecological and economic relevance. Here we present a metagenomic analysis of two lab-scale EBPR sludges dominated by the uncultured bacterium, "Candidatus Accumulibacter phosphatis." This analysis resolves several controversies in EBPR metabolic models and provides hypotheses explaining the dominance of A. phosphatis in this habitat, its lifestyle outside EBPR and probable cultivation requirements. Comparison of the same species from different EBPR sludges highlights recent evolutionary dynamics in the A. phosphatis genome that could be linked to mechanisms for environmental adaptation. In spite of an apparent lack of phylogenetic overlap in the flanking communities of the two sludges studied, common functional themes were found, at least one of them complementary to the inferred metabolism of the dominant organism. The present study provides a much-needed blueprint for a systems-level understanding of EBPR and illustrates that metagenomics enables detailed, often novel, insights into even well-studied biological systems.

  8. Fracture tooth fragment reattachment

    PubMed Central

    Maitin, Nitin; Maitin, Shipra Nangalia; Rastogi, Khushboo; Bhushan, Rajarshi

    2013-01-01

    Coronal fractures of the anterior teeth are a common form of dental trauma, and their sequelae may impair the establishment and accomplishment of an adequate treatment plan. Among the various treatment options, reattachment of a crown fragment is a conservative treatment that should be considered for crown fractures of anterior teeth. This report describes the management of two coronal tooth fractures that were successfully treated by tooth fragment reattachment with a glass-fibre-reinforced composite post. PMID:23853012

  9. From bacterial genomics to metagenomics: concept, tools and recent advances.

    PubMed

    Sharma, Pooja; Kumari, Hansi; Kumar, Mukesh; Verma, Mansi; Kumari, Kirti; Malhotra, Shweta; Khurana, Jitendra; Lal, Rup

    2008-06-01

    In the last 20 years, the application of genomics tools has completely transformed the field of microbial research. This has happened primarily because of the revolution in sequencing technologies. This review therefore first describes the discovery, upgrading and automation of sequencing techniques in chronological order, followed by a brief discussion of microbial genomics. Some of the recently sequenced bacterial genomes are described to explain how complete genome data are now being used to derive interesting findings. Apart from the genomics of individual microbes, the study of unculturable microbiota from different environments is increasingly gaining importance. The second section is thus dedicated to the concept of metagenomics, describing environmental DNA isolation, metagenomic library construction and screening methods to look for novel and potentially important genes, enzymes and biomolecules. It also deals with the pioneering studies in the area of metagenomics that are offering new insights into the previously unappreciated microbial world. PMID:23100712

  10. Recovery of a Medieval Brucella melitensis Genome Using Shotgun Metagenomics

    PubMed Central

    Kay, Gemma L.; Sergeant, Martin J.; Giuffra, Valentina; Bandiera, Pasquale; Milanese, Marco; Bramanti, Barbara

    2014-01-01

    ABSTRACT Shotgun metagenomics provides a powerful assumption-free approach to the recovery of pathogen genomes from contemporary and historical material. We sequenced the metagenome of a calcified nodule from the skeleton of a 14th-century middle-aged male excavated from the medieval Sardinian settlement of Geridu. We obtained 6.5-fold coverage of a Brucella melitensis genome. Sequence reads from this genome showed signatures typical of ancient or aged DNA. Despite the relatively low coverage, we were able to use information from single-nucleotide polymorphisms to place the medieval pathogen genome within a clade of B. melitensis strains that included the well-studied Ether strain and two other recent Italian isolates. We confirmed this placement using information from deletions and IS711 insertions. We conclude that metagenomics stands ready to document past and present infections, shedding light on the emergence, evolution, and spread of microbial pathogens. PMID:25028426

  11. Natural Product Discovery through Improved Functional Metagenomics in Streptomyces.

    PubMed

    Iqbal, Hala A; Low-Beinart, Lila; Obiajulu, Joseph U; Brady, Sean F

    2016-08-01

    Because the majority of environmental bacteria are not easily culturable, access to many bacterially encoded secondary metabolites will be dependent on the development of improved functional metagenomic screening methods. In this study, we examined a collection of diverse Streptomyces species for the best innate ability to heterologously express biosynthetic gene clusters. We then optimized methods for constructing high quality metagenomic cosmid libraries in the best Streptomyces host. An initial screen of a 1.5 million-membered metagenomic library constructed in Streptomyces albus, the species that exhibited the highest propensity for heterologous expression of gene clusters, led to the identification of the novel natural product metatricycloene (1). Metatricycloene is a tricyclic polyene encoded by a reductive, iterative polyketide-like gene cluster. Related gene clusters found in sequenced genomes appear to encode a largely unexplored collection of structurally diverse, polyene-based metabolites. PMID:27447056

  12. Recovering complete and draft population genomes from metagenome datasets

    DOE PAGESBeta

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.
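
    The idea that contigs from the same genome show covarying coverage across datasets can be sketched as follows. This is a minimal illustration only, not the method of any specific binning tool: the contig names, coverage values, the 0.9 threshold and the greedy grouping rule are all hypothetical assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length coverage vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def greedy_bins(coverage, threshold=0.9):
    """Put each contig in the first bin whose first member it correlates with."""
    bins = []
    for name, profile in coverage.items():
        for members in bins:
            if pearson(profile, coverage[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            bins.append([name])  # no matching bin: start a new one
    return bins

# Hypothetical per-sample coverage of three contigs across four datasets.
coverage = {
    "contig_1": [10.0, 40.0, 5.0, 20.0],
    "contig_2": [11.0, 42.0, 6.0, 19.0],
    "contig_3": [30.0, 5.0, 25.0, 8.0],
}
bins = greedy_bins(coverage)
```

    Here contig_1 and contig_2 rise and fall together across the four datasets and end up in one bin, while contig_3 follows the opposite pattern and is kept separate.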

  13. Ray Meta: scalable de novo metagenome assembly and profiling

    PubMed Central

    2012-01-01

    Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three billion read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net. PMID:23259615
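
    The notion of profiling with uniquely-colored k-mers can be illustrated with a small sketch. It is a simplified toy under stated assumptions, not Ray Communities' implementation: the function names, the toy sequences and k=4 are hypothetical, and a "color" is taken here to mean the single reference genome a k-mer occurs in.

```python
from collections import Counter, defaultdict

def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def uniquely_colored_kmers(references, k):
    """Map each k-mer found in exactly one reference to that reference's name."""
    owners = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    return {kmer: next(iter(names))
            for kmer, names in owners.items() if len(names) == 1}

def profile_reads(reads, unique, k):
    """Count uniquely colored k-mer occurrences per reference over all reads."""
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in unique:
                counts[unique[kmer]] += 1
    return counts

# Toy reference genomes and reads (hypothetical sequences).
references = {"genome_A": "ACGTACGGTA", "genome_B": "TTGACCGGTA"}
unique = uniquely_colored_kmers(references, k=4)
reads = ["ACGTAC", "TTGACC", "CGGTA"]
profile = profile_reads(reads, unique, k=4)
```

    The third read contains only k-mers shared by both references (CGGT, GGTA), so it contributes nothing to the profile; only k-mers that identify a single genome are counted.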

  14. Characterizing forest fragments in boreal, temperate, and tropical ecosystems.

    PubMed

    Meddens, Arjan J H; Hudak, Andrew T; Evans, Jeffrey S; Gould, William A; González, Grizelle

    2008-12-01

    An increased ability to analyze landscapes in a spatial manner through the use of remote sensing leads to improved capabilities for quantifying human-induced forest fragmentation. Developments of spatially explicit methods in landscape analyses are emerging. In this paper, the image delineation software program eCognition and the spatial pattern analysis program FRAGSTATS were used to quantify patterns of forest fragments on six landscapes across three different climatic regions characterized by different moisture regimes and different influences of human pressure. Our results support the idea that landscapes with higher road and population density are more fragmented; however, there are other, equally influential factors contributing to fragmentation, such as moisture regime, historic land use, and fire dynamics. Our method provided an objective means to characterize landscapes and assess patterns of forest fragments across different forested ecosystems by addressing the limitations of pixel-based classification and incorporating image objects. PMID:19205180

  15. Form classification

    NASA Astrophysics Data System (ADS)

    Reddy, K. V. Umamaheswara; Govindaraju, Venu

    2008-01-01

    The problem of form classification is to assign a single-page form image to one of a set of predefined form types or classes. We classify the form images using low-level pixel density information from the binary images of the documents. In this paper, we solve the form classification problem with a classifier based on the k-means algorithm, supported by adaptive boosting. Our classification method is tested on the NIST scanned tax forms databases (special forms databases 2 and 6), which include machine-typed and handwritten documents. Our method improves the performance over published results on the same databases, while still using a simple set of image features.

  16. Genetic and functional properties of uncultivated MCG archaea assessed by metagenome and gene expression analyses

    PubMed Central

    Meng, Jun; Xu, Jun; Qin, Dan; He, Ying; Xiao, Xiang; Wang, Fengping

    2014-01-01

    The Miscellaneous Crenarchaeota group (MCG) Archaea is one of the predominant archaeal groups in anoxic environments and may have significant roles in the global biogeochemical cycles. However, no isolate of MCG has been cultivated or characterized to date. In this study, we investigated the genetic organization, ecophysiological properties and evolutionary relationships of MCG archaea with other archaeal members using metagenome information and the result of gene expression experiments. A comparison of the gene organizations and similarities around the 16S rRNA genes from all available MCG fosmid and cosmid clones revealed no significant synteny among genomic fragments, demonstrating that there are large genetic variations within members of the MCG. Phylogenetic analyses of large-subunit+small-subunit rRNA, concatenated ribosomal protein genes and topoisomerases IB gene (TopoIB) all demonstrate that MCG constituted a sister lineage to the newly proposed archaeal phylum Aigarchaeota and Thaumarchaeota. Genes involved in protocatechuate degradation and chemotaxis were found in a MCG fosmid 75G8 genome fragment, suggesting that this MCG member may have a role in the degradation of aromatic compounds. Moreover, the expression of a putative 4-carboxymuconolactone decarboxylase was observed when the sediment was supplemented with protocatechuate, further supporting the hypothesis that this MCG member degrades aromatic compounds. PMID:24108328

  17. Polyketide Synthases in the Microbiome of the Marine Sponge Plakortis halichondrioides: A Metagenomic Update

    PubMed Central

    Della Sala, Gerardo; Hochmuth, Thomas; Teta, Roberta; Costantino, Valeria; Mangoni, Alfonso

    2014-01-01

    Sponge-associated microorganisms are able to assemble the complex machinery for the production of secondary metabolites such as polyketides, the most important class of marine natural products from a drug discovery perspective. A comprehensive overview of polyketide biosynthetic genes of the sponge Plakortis halichondrioides and its symbionts was obtained in the present study by massively parallel 454 pyrosequencing of complex and heterogeneous PCR (Polymerase Chain Reaction) products amplified from the metagenomic DNA of a specimen of P. halichondrioides collected in the Caribbean Sea. This was accompanied by a survey of the bacterial diversity within the sponge. In line with previous studies, sequences belonging to supA and swfA, two widespread sponge-specific groups of polyketide synthase (PKS) genes were dominant. While they have been previously reported as belonging to Poribacteria (a novel bacterial phylum found exclusively in sponges), re-examination of current genomic sequencing data showed supA and swfA not to be present in the poribacterial genome. Several non-supA, non-swfA type-I PKS fragments were also identified. A significant portion of these fragments resembled type-I PKSs from protists, suggesting that bacteria may not be the only source of polyketides from P. halichondrioides, and that protistan PKSs should receive further investigation as a source of novel polyketides. PMID:25405856

  18. A New Secondary Structure Assignment Algorithm Using Cα Backbone Fragments.

    PubMed

    Cao, Chen; Wang, Guishen; Liu, An; Xu, Shutan; Wang, Lincong; Zou, Shuxue

    2016-01-01

    The assignment of secondary structure elements in proteins is a key step in the analysis of their structures and functions. We have developed an algorithm, SACF (secondary structure assignment based on Cα fragments), for secondary structure element (SSE) assignment based on the alignment of Cα backbone fragments with central poses derived by clustering known SSE fragments. The assignment algorithm consists of three steps: First, the outlier fragments on known SSEs are detected. Next, the remaining fragments are clustered to obtain the central fragments for each cluster. Finally, the central fragments are used as a template to make assignments. Following a large-scale comparison of 11 secondary structure assignment methods, SACF, KAKSI and PROSS are found to have similar agreement with DSSP, while PCASSO agrees with DSSP best. SACF and PCASSO show preference to reducing residues in N and C cap regions, whereas KAKSI, P-SEA and SEGNO tend to add residues to the terminals when DSSP assignment is taken as standard. Moreover, our algorithm is able to assign subtle helices (3₁₀-helix, π-helix and left-handed helix) and make uniform assignments, as well as to detect rare SSEs in β-sheets or long helices as outlier fragments from other programs. The structural uniformity should be useful for protein structure classification and prediction, while outlier fragments underlie the structure-function relationship. PMID:26978354

  19. A New Secondary Structure Assignment Algorithm Using Cα Backbone Fragments

    PubMed Central

    Cao, Chen; Wang, Guishen; Liu, An; Xu, Shutan; Wang, Lincong; Zou, Shuxue

    2016-01-01

    The assignment of secondary structure elements in proteins is a key step in the analysis of their structures and functions. We have developed an algorithm, SACF (secondary structure assignment based on Cα fragments), for secondary structure element (SSE) assignment based on the alignment of Cα backbone fragments with central poses derived by clustering known SSE fragments. The assignment algorithm consists of three steps: First, the outlier fragments on known SSEs are detected. Next, the remaining fragments are clustered to obtain the central fragments for each cluster. Finally, the central fragments are used as a template to make assignments. Following a large-scale comparison of 11 secondary structure assignment methods, SACF, KAKSI and PROSS are found to have similar agreement with DSSP, while PCASSO agrees with DSSP best. SACF and PCASSO show preference to reducing residues in N and C cap regions, whereas KAKSI, P-SEA and SEGNO tend to add residues to the terminals when DSSP assignment is taken as standard. Moreover, our algorithm is able to assign subtle helices (3₁₀-helix, π-helix and left-handed helix) and make uniform assignments, as well as to detect rare SSEs in β-sheets or long helices as outlier fragments from other programs. The structural uniformity should be useful for protein structure classification and prediction, while outlier fragments underlie the structure–function relationship. PMID:26978354

  20. Evaluating the Quantitative Capabilities of Metagenomic Analysis Software.

    PubMed

    Kerepesi, Csaba; Grolmusz, Vince

    2016-05-01

    DNA sequencing technologies are applied widely and frequently today to describe metagenomes, i.e., microbial communities in environmental or clinical samples, without the need for culturing them. These technologies usually return short (100-300 base-pairs long) DNA reads, and these reads are processed by metagenomic analysis software that assigns phylogenetic composition information to the dataset. Here we evaluate three metagenomic analysis software packages (AmphoraNet, a webserver implementation of AMPHORA2; MG-RAST; and MEGAN5) for their capabilities of assigning quantitative phylogenetic information to the data, describing the frequency of appearance of the microorganisms of the same taxa in the sample. The difficulties of the task arise from the fact that longer genomes produce more reads from the same organism than shorter genomes, and some software assigns higher frequencies to species with longer genomes than to those with shorter ones. This phenomenon is called the "genome length bias." Dozens of complex artificial metagenome benchmarks can be found in the literature. Because of the complexity of those benchmarks, it is usually difficult to judge the resistance of a metagenomic software to this "genome length bias." Therefore, we have made a simple benchmark for the evaluation of the "taxon-counting" in a metagenomic sample: we have taken the same number of copies of three full bacterial genomes of different lengths, broken them up randomly into short reads with an average length of 150 bp, and mixed the reads, creating our simple benchmark. Because of its simplicity, the benchmark is not supposed to serve as a mock metagenome, but if a software fails on that simple task, it will surely fail on most real metagenomes. We applied three software packages to the benchmark. The ideal quantitative solution would assign the same proportion to the three bacterial taxa. We have found that AMPHORA2/AmphoraNet gave the most accurate results and the other two software were under
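
    The "genome length bias" described in this record can be made concrete with a short arithmetic sketch. It assumes idealized, evenly cut reads; the taxon names and genome lengths are hypothetical and are not the genomes used in the study, and dividing by genome length is shown only as one plausible correction, not as the approach of any of the evaluated tools.

```python
# Hypothetical genome lengths in bp (chosen to divide evenly by the read length).
GENOME_LENGTHS = {"taxon_A": 1_500_000, "taxon_B": 3_000_000, "taxon_C": 4_500_000}
COPIES = 10        # same number of genome copies for every taxon
READ_LENGTH = 150  # average read length quoted in the abstract

def expected_read_counts(lengths, copies, read_length):
    """Expected reads per taxon when each genome copy is cut into reads."""
    return {taxon: copies * length // read_length
            for taxon, length in lengths.items()}

def naive_proportions(read_counts):
    """Taxon proportions taken directly from read counts (length-biased)."""
    total = sum(read_counts.values())
    return {taxon: count / total for taxon, count in read_counts.items()}

def length_normalized_proportions(read_counts, lengths):
    """Taxon proportions after dividing each read count by genome length."""
    per_base = {taxon: read_counts[taxon] / lengths[taxon] for taxon in read_counts}
    total = sum(per_base.values())
    return {taxon: value / total for taxon, value in per_base.items()}

counts = expected_read_counts(GENOME_LENGTHS, COPIES, READ_LENGTH)
naive = naive_proportions(counts)
normalized = length_normalized_proportions(counts, GENOME_LENGTHS)
```

    With equal copy numbers, the naive read-count proportions follow genome length (1/6, 1/3 and 1/2 here), whereas the length-normalized proportions come out equal, which is the ideal answer the abstract describes.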

  1. Whither or wither geomicrobiology in the era of 'community metagenomics'

    USGS Publications Warehouse

    Oremland, R.S.; Capone, D.G.; Stolz, J.F.; Fuhrman, J.

    2005-01-01

    Molecular techniques are valuable tools that can improve our understanding of the structure of microbial communities. They provide the ability to probe for life in all niches of the biosphere, perhaps even supplanting the need to cultivate microorganisms or to conduct ecophysiological investigations. However, an overemphasis and strict dependence on such large information-driven endeavours as environmental metagenomics could overwhelm the field, to the detriment of microbial ecology. We now call for more balanced, hypothesis-driven research efforts that couple metagenomics with classic approaches.

  2. FIELD TESTS OF GEOGRAPHICALLY-DEPENDENT VS. THRESHOLD-BASED WATERSHED CLASSIFICATION SCHEMES IN THE GREAT LAKES BASIN

    EPA Science Inventory

    We compared classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme for two case studies involving 1) Lake Superior tributaries and 2) watersheds of riverine coastal wetlands ...

  3. FIELD TESTS OF GEOGRAPHICALLY-DEPENDENT VS. THRESHOLD-BASED WATERSHED CLASSIFICATION SCHEMES IN THE GREAT LAKES BASIN

    EPA Science Inventory

    We compared classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme for two case studies involving 1) Lake Superior tributaries and 2) watersheds of riverine coastal wetlands...

  4. Characterization of a novel thermostable patatin-like protein from a Guaymas basin metagenomic library.

    PubMed

    Fu, Ling; He, Ying; Xu, Fangdi; Ma, Qun; Wang, Fengping; Xu, Jun

    2015-07-01

    Deep-sea hydrothermal vents are a natural habitat for thermophiles, which contain plenty of enzymes that can function at high temperatures. In this work, we constructed a fosmid library in Escherichia coli using metagenomic DNA isolated from a chimney sample collected in the hydrothermal vents in Guaymas Basin. The library was screened for lipolytic activity and positive clones were subjected to subcloning. A novel patatin-like protein (PLP) that exhibited less than 45% identity in amino acid sequence to known enzymes was obtained. Common features of the patatin-like proteins, such as four conserved blocks, were detected. Interestingly, there was an Ala at site 42 in PLP instead of the first Gly residue in the consensus sequence Gly-X-Ser-X-Gly found in other PLP homologs. The active sites of PLP were Ser44 and Asp160. Spectrophotometric assays with different p-nitrophenyl esters demonstrated a preference for p-nitrophenyl butyrate (C4) and p-nitrophenyl decanoate (C10). Moreover, PLP demonstrated optimal activity at 70 °C and at pH 9.0 (Tris-HCl). The activation energy from the linear Arrhenius plot was found to be 38.3 ± 0.9 kJ/mol. The K_m and V_max of PLP for C4 were 304 ± 38 μM and 14 ± 0.38 μmol min⁻¹ mg⁻¹, respectively. Gene-mining of the metagenome dataset that was generated by pyrosequencing the same chimney sample resulted in identification of 20 PLP homolog gene fragments, which could represent promising examples of this category of thermostable proteins. PMID:26016814
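
    To show what the reported kinetic constants imply, the sketch below evaluates the standard Michaelis-Menten rate law with the K_m and V_max quoted above for the C4 substrate. It assumes the enzyme follows simple Michaelis-Menten kinetics; the function name and the example substrate concentrations are illustrative choices, not taken from the paper.

```python
# Central values reported in the abstract for p-nitrophenyl butyrate (C4).
KM_UM = 304.0  # K_m in micromolar
VMAX = 14.0    # V_max in umol min^-1 mg^-1

def michaelis_menten_rate(substrate_um, km_um=KM_UM, vmax=VMAX):
    """Initial rate v = V_max * [S] / (K_m + [S]), with [S] in micromolar."""
    if substrate_um < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_um / (km_um + substrate_um)

# At [S] = K_m the rate is half of V_max by definition of K_m.
rate_at_km = michaelis_menten_rate(KM_UM)
# At ten times K_m the enzyme is close to, but still below, saturation.
rate_at_10km = michaelis_menten_rate(10 * KM_UM)
```

    Under these assumptions the rate at 304 μM substrate is 7 μmol min⁻¹ mg⁻¹, and raising the substrate tenfold brings it to roughly 91% of V_max.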

  5. Metagenomic analysis of microbial community of an Amazonian geothermal spring in Peru.

    PubMed

    Paul, Sujay; Cortez, Yolanda; Vera, Nadia; Villena, Gretty K; Gutiérrez-Correa, Marcel

    2016-09-01

    Aguas Calientes (AC) is an isolated geothermal spring located deep in the Amazon rainforest (7°21'12″ S, 75°00'54″ W) of Peru. This geothermal spring is slightly acidic (pH 5.0-7.0), with temperatures varying from 45 to 90 °C, and is continually fed by plant litter, resulting in a relatively high total organic content (TOC). A pooled water sample was analyzed at the 16S rRNA V3-V4 hypervariable region by amplicon metagenome sequencing on the Illumina HiSeq platform. A total of 2,976,534 paired-end reads were generated, which were assigned to 5434 OTUs. All the resulting 16S rRNA fragments were then classified into 58 bacterial phyla and 2 archaeal phyla. Proteobacteria (88.06%) was the most highly represented phylum, followed by Thermi (6.43%), Firmicutes (3.41%) and Aquificae (1.10%). Crenarchaeota and Euryarchaeota were the only 2 archaeal phyla detected in this study, both at low abundance. Metagenomic sequences were deposited in the NCBI SRA database under accession number SRX1809286. Functional categorization of the assigned OTUs was performed using the PICRUSt tool. In COG analysis, "Amino acid transport and metabolism" (8.5%) was the most highly represented category, whereas among predicted KEGG pathways "Metabolism" (50.6%) was the most abundant. This is the first report of a high-resolution microbial phylogenetic profile of an Amazonian hot spring. PMID:27408814

  6. Classifying Classification

    ERIC Educational Resources Information Center

    Novakowski, Janice

    2009-01-01

    This article describes the experience of a group of first-grade teachers as they tackled the science process of classification, a targeted learning objective for the first grade. While the two-year process was not easy and required teachers to teach in a new, more investigation-oriented way, the benefits were great. The project helped teachers and…

  7. Utility of Metagenomic Next-Generation Sequencing for Characterization of HIV and Human Pegivirus Diversity

    PubMed Central

    Naccache, Samia N.; Kabre, Beniwende; Federman, Scot; Mbanya, Dora; Kaptué, Lazare; Chiu, Charles Y.; Brennan, Catherine A.; Hackett, John

    2015-01-01

    Given the dynamic changes in HIV-1 complexity and diversity, next-generation sequencing (NGS) has the potential to revolutionize strategies for effective HIV global surveillance. In this study, we explore the utility of metagenomic NGS to characterize divergent strains of HIV-1 and to simultaneously screen for other co-infecting viruses. Thirty-five HIV-1-infected Cameroonian blood donor specimens with viral loads of >4.4 log10 copies/ml were selected to include a diverse representation of group M strains. Random-primed NGS libraries, prepared from plasma specimens, resulted in greater than 90% genome coverage for 88% of specimens. Correct subtype designations based on NGS were concordant with sub-region PCR data in 31 of 35 (89%) cases. Complete genomes were assembled for 25 strains, including circulating recombinant forms with relatively limited data available (7 CRF11_cpx, 2 CRF13_cpx, 1 CRF18_cpx, and 1 CRF37_cpx), as well as 9 unique recombinant forms. HPgV (formerly designated GBV-C) co-infection was detected in 9 of 35 (25%) specimens, of which eight specimens yielded complete genomes. The recovered HPgV genomes formed a diverse cluster with genotype 1 sequences previously reported from Ghana, Uganda, and Japan. The extensive genome coverage obtained by NGS improved accuracy and confidence in phylogenetic classification of the HIV-1 strains present in the study population relative to conventional sub-region PCR. In addition, these data demonstrate the potential for metagenomic analysis to be used for routine characterization of HIV-1 and identification of other viral co-infections. PMID:26599538

  8. Diversity of virophages in metagenomic data sets.

    PubMed

    Zhou, Jinglie; Zhang, Weijia; Yan, Shuling; Xiao, Jinzhou; Zhang, Yuanyuan; Li, Bailin; Pan, Yingjie; Wang, Yongjie

    2013-04-01

    Virophages, e.g., Sputnik, Mavirus, and Organic Lake virophage (OLV), are unusual parasites of giant double-stranded DNA (dsDNA) viruses, yet little is known about their diversity. Here, we describe the global distribution, abundance, and genetic diversity of virophages based on analyzing and mapping comprehensive metagenomic databases. The results reveal a distinct abundance and worldwide distribution of virophages, involving almost all geographical zones and a variety of unique environments. These environments ranged from deep ocean to inland, iced to hydrothermal lakes, and human gut- to animal-associated habitats. Four complete virophage genomic sequences (Yellowstone Lake virophages [YSLVs]) were obtained, as was one nearly complete sequence (Ace Lake Mavirus [ALM]). The genomes obtained were 27,849 bp long with 26 predicted open reading frames (ORFs) (YSLV1), 23,184 bp with 21 ORFs (YSLV2), 27,050 bp with 23 ORFs (YSLV3), 28,306 bp with 34 ORFs (YSLV4), and 17,767 bp with 22 ORFs (ALM). The homologous counterparts of five genes, including putative FtsK-HerA family DNA packaging ATPase and genes encoding DNA helicase/primase, cysteine protease, major capsid protein (MCP), and minor capsid protein (mCP), were present in all virophages studied thus far. They also shared a conserved gene cluster comprising the two core genes of MCP and mCP. Comparative genomic and phylogenetic analyses showed that YSLVs, having a closer relationship to each other than to the other virophages, were more closely related to OLV than to Sputnik but distantly related to Mavirus and ALM. These findings indicate that virophages appear to be widespread and genetically diverse, with at least 3 major lineages. PMID:23408616

  9. Comparative metagenome analysis of an Alaskan glacier.

    PubMed

    Choudhari, Sulbha; Lohia, Ruchi; Grigoriev, Andrey

    2014-04-01

    The temperature in the Arctic region has been increasing in the recent past accompanied by melting of its glaciers. We took a snapshot of the current microbial inhabitation of an Alaskan glacier (which can be considered as one of the simplest possible ecosystems) by using metagenomic sequencing of 16S rRNA recovered from ice/snow samples. Somewhat contrary to our expectations and earlier estimates, a rich and diverse microbial population of more than 2,500 species was revealed, including several species of Archaea that have been identified for the first time in the glaciers of the Northern hemisphere. The most prominent bacterial groups found were Proteobacteria, Bacteroidetes, and Firmicutes. Firmicutes were not reported in large numbers in a previously studied Alpine glacier but were dominant in an Antarctic subglacial lake. Representatives of Cyanobacteria, Actinobacteria and Planctomycetes were among the most numerous, likely reflecting the dependence of the ecosystem on the energy obtained through photosynthesis and close links with the microbial community of the soil. Principal component analysis (PCA) of nucleotide word frequency revealed distinct sequence clusters for different taxonomic groups in the Alaskan glacier community and separate clusters for the glacial communities from other regions of the world. Comparative analysis of the community composition and bacterial diversity present in the Byron glacier in Alaska with other environments showed larger overlap with an Arctic soil than with a high Arctic lake, indicating patterns of community exchange and suggesting that these bacteria may play an important role in soil development during glacial retreat. PMID:24712530

  10. Heavy fragment radioactivities

    SciTech Connect

    Price, P.B.

    1987-12-10

    This recently discovered mode of radioactive decay, like alpha decay and spontaneous fission, is believed to involve tunneling through the deformation-energy barrier between a very heavy nucleus and two separated fragments the sum of whose masses is less than the mass of the parent nucleus. In all known cases the heavier of the two fragments is close to doubly magic ²⁰⁸Pb, and the lighter fragment has even Z. Four isotopes of Ra are known to emit ¹⁴C nuclei; several isotopes of U as well as ²³⁰Th and ²³¹Pa emit Ne nuclei; and ²³⁴U exhibits four hadronic decay modes: alpha decay, spontaneous fission, Ne decay and Mg decay.

  11. Fragment screening: an introduction.

    PubMed

    Leach, Andrew R; Hann, Michael M; Burrows, Jeremy N; Griffen, Ed J

    2006-09-01

    There are clearly many different philosophies associated with adapting fragment screening into mainstream Drug Discovery Lead Generation strategies. Scientists at Astex, for instance, focus entirely on strategies involving use of X-ray crystallography and NMR. However, AstraZeneca uses a number of different fragment screening strategies. One approach is to screen a 2000 compound fragment set (with close to "lead-like" complexity) at 100 microM in parallel with every HTS such that the data are obtained on the entire screening collection at 10 microM plus the extra samples at 100 microM; this provides valuable compound potency data in a concentration range that is usually unexplored. The fragments are then screen-specific "privileged structures" that can be searched for in the rest of the HTS output and other databases as well as having synthesis follow-up. A typical workflow for a fragment screen within AstraZeneca is shown below (Figure 24) and highlights the desirability (particularly when screening >100 microM) for NMR and X-ray information to validate weak hits and give information on how to optimise them. In this chapter, we have provided an introduction to the theoretical and practical issues associated with the use of fragment methods and lead-likeness. Fragment-based approaches are still in an early stage of development and are just one of many interrelated techniques that are now used to identify novel lead compounds for drug development. Fragment based screening has some advantages, but like every other drug hunting strategy will not be universally applicable. There are in particular some practical challenges associated with fragment screening that relate to the generally lower level of potency that such compounds initially possess. Considerable synthetic effort has to be applied for post-fragment screening to build the sort of potency that would be expected to be found from a traditional HTS. However, if there are no low-hanging fruit in a screening

  12. IMPACT fragmentation model developments

    NASA Astrophysics Data System (ADS)

    Sorge, Marlon E.; Mains, Deanna L.

    2016-09-01

    The IMPACT fragmentation model has been used by The Aerospace Corporation for more than 25 years to analyze orbital altitude explosions and hypervelocity collisions. The model is semi-empirical, combining mass, energy and momentum conservation laws with empirically derived relationships for fragment characteristics such as number, mass, area-to-mass ratio, and spreading velocity as well as event energy distribution. Model results are used for several types of analysis including assessment of short-term risks to satellites from orbital altitude fragmentations, prediction of the long-term evolution of the orbital debris environment and forensic assessments of breakup events. A new version of IMPACT, version 6, has been completed and incorporates a number of advancements enabled by a multi-year long effort to characterize more than 11,000 debris fragments from more than three dozen historical on-orbit breakup events. These events involved a wide range of causes, energies, and fragmenting objects. Special focus was placed on the explosion model, as the majority of events examined were explosions. Revisions were made to the mass distribution used for explosion events, increasing the number of smaller fragments generated. The algorithm for modeling upper stage large fragment generation was updated. A momentum conserving asymmetric spreading velocity distribution algorithm was implemented to better represent sub-catastrophic events. An approach was developed for modeling sub-catastrophic explosions, those where the majority of the parent object remains intact, based on estimated event energy. Finally, significant modifications were made to the area-to-mass ratio distribution to incorporate the tendencies of different materials to fragment into different shapes. This ability enabled better matches between the observed area-to-mass ratios and those generated by the model. It also opened up additional possibilities for post-event analysis of breakups. 
The paper will discuss

  13. Evaluating techniques for metagenome annotation using simulated sequence data.

    PubMed

    Randle-Boggis, Richard J; Helgason, Thorunn; Sapp, Melanie; Ashton, Peter D

    2016-07-01

    The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naïve choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies. PMID:27162180
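
    The trade-off between sensitivity and precision mentioned in this record can be illustrated with the standard definitions of both metrics. The counts below are hypothetical and are not results from the study; they only show how a stricter parameter setting can lower sensitivity while raising precision.

```python
def precision(true_pos, false_pos):
    """Fraction of reported annotations that are correct."""
    reported = true_pos + false_pos
    return true_pos / reported if reported else 0.0

def sensitivity(true_pos, false_neg):
    """Fraction of truly present items that were annotated correctly."""
    present = true_pos + false_neg
    return true_pos / present if present else 0.0

# Hypothetical outcomes for 1,000 simulated sequences under two settings.
lenient = {"tp": 900, "fp": 300, "fn": 100}
strict = {"tp": 700, "fp": 50, "fn": 300}

lenient_scores = (precision(lenient["tp"], lenient["fp"]),
                  sensitivity(lenient["tp"], lenient["fn"]))
strict_scores = (precision(strict["tp"], strict["fp"]),
                 sensitivity(strict["tp"], strict["fn"]))
```

    In this made-up example the lenient setting gives precision 0.75 and sensitivity 0.90, while the strict setting raises precision to about 0.93 at the cost of sensitivity falling to 0.70.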

  14. Metagenomes provide valuable comparative information on soil microeukaryotes.

    PubMed

    Jacquiod, Samuel; Stenbæk, Jonas; Santos, Susana S; Winding, Anne; Sørensen, Søren J; Priemé, Anders

    2016-06-01

    Despite the critical ecological roles of microeukaryotes in terrestrial ecosystems, most descriptive studies of soil microbes published so far focused only on specific groups. Meanwhile, the fast development of metagenome sequencing leads to considerable data accumulation in public repositories, providing microbiologists with substantial amounts of accessible information. We took advantage of public metagenomes in order to investigate microeukaryote communities in a well characterized grassland soil. The data gathered allowed the evaluation of several factors impacting the community structure, including the DNA extraction method, the database choice and also the annotation procedure. While most studies on soil microeukaryotes are based on sequencing of PCR-amplified taxonomic markers (18S rRNA genes, ITS regions), this work represents, to our knowledge, the first report based solely on metagenomic microeukaryote DNA. Choosing the correct annotation procedure and reference database has proven to be crucial, as it considerably limits the risk of wrong assignments. In addition, a significant and pronounced effect of the DNA extraction method on the taxonomical structure of soil microeukaryotes has been identified. Our analyses suggest that publicly available metagenome data can provide valuable information on soil microeukaryotes for comparative purposes when handled appropriately, complementing the current view provided by ribosomal amplicon sequencing methods. PMID:27020245

  15. Assessment of quality control approaches for metagenomic data analysis

    PubMed Central

    Zhou, Qian; Ning, Kang

    2014-01-01

    Currently there is an explosive increase of the next-generation sequencing (NGS) projects and related datasets, which have to be processed by Quality Control (QC) procedures before they could be utilized for omics analysis. QC procedure usually includes identification and filtration of sequencing artifacts such as low-quality reads and contaminating reads, which would significantly affect and sometimes mislead downstream analysis. Quality control of NGS data for microbial communities is especially challenging. In this work, we have evaluated and compared the performance and effects of various QC pipelines on different types of metagenomic NGS data and from different angles, based on which general principles of using QC pipelines were proposed. Results based on both simulated and real metagenomic datasets have shown that: firstly, QC-Chain is superior in its ability for contamination identification for metagenomic NGS datasets with different complexities with high sensitivity and specificity. Secondly, the high performance computing engine enabled QC-Chain to achieve a significant reduction in processing time compared to other pipelines based on serial computing. Thirdly, QC-Chain could outperform other tools in benefiting downstream metagenomic data analysis. PMID:25376098

  16. MetaGenomic Assembly by Merging (MeGAMerge)

    Energy Science and Technology Software Center (ESTSC)

    2015-08-03

    "MetaGenomic Assembly by Merging" (MeGAMerge)Is a novel method of merging of multiple genomic assembly or long read data sources for assembly by use of internal trimming/filtering of data, followed by use of two 3rd party tools to merge data by overlap based assembly.

  17. Marine Metagenome as A Resource for Novel Enzymes

    PubMed Central

    Alma’abadi, Amani D.; Gojobori, Takashi; Mineta, Katsuhiko

    2015-01-01

    More than 99% of identified prokaryotes, including many from the marine environment, cannot be cultured in the laboratory. This lack of capability restricts our knowledge of microbial genetics and community ecology. Metagenomics, the culture-independent cloning of environmental DNAs that are isolated directly from an environmental sample, has already provided a wealth of information about the uncultured microbial world. It has also facilitated the discovery of novel biocatalysts by allowing researchers to probe directly into a huge diversity of enzymes within natural microbial communities. Recent advances in these studies have led to a great interest in recruiting microbial enzymes for the development of environmentally-friendly industry. Although the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using high-throughput techniques. In addition, we discuss challenges in the metagenomics as an important part of bioinformatics analysis in big data. PMID:26563467

  18. Metagenomic gene annotation by a homology-independent approach

    SciTech Connect

    Froula, Jeff; Zhang, Tao; Salmeen, Annette; Hess, Matthias; Kerfeld, Cheryl A.; Wang, Zhong; Du, Changbin

    2011-06-02

    Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5% and 99.9% accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology to any known protein families. As approximately 50% of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.

  19. Functional Metagenomic Investigations of the Human Intestinal Microbiota

    PubMed Central

    Moore, Aimee M.; Munck, Christian; Sommer, Morten O. A.; Dantas, Gautam

    2011-01-01

    The human intestinal microbiota encode multiple critical functions impacting human health, including metabolism of dietary substrate, prevention of pathogen invasion, immune system modulation, and provision of a reservoir of antibiotic resistance genes accessible to pathogens. The complexity of this microbial community, its recalcitrance to standard cultivation, and the immense diversity of its encoded genes has necessitated the development of novel molecular, microbiological, and genomic tools. Functional metagenomics is one such culture-independent technique, used for decades to study environmental microorganisms, but relatively recently applied to the study of the human commensal microbiota. Metagenomic functional screens characterize the functional capacity of a microbial community, independent of identity to known genes, by subjecting the metagenome to functional assays in a genetically tractable host. Here we highlight recent work applying this technique to study the functional diversity of the intestinal microbiota, and discuss how an approach combining high-throughput sequencing, cultivation, and metagenomic functional screens can improve our understanding of interactions between this complex community and its human host. PMID:22022321

  20. Scaling metagenome sequence assembly with probabilistic de Bruijn graphs

    PubMed Central

    Pell, Jason; Hintze, Arend; Canino-Koning, Rosangela; Howe, Adina; Tiedje, James M.; Brown, C. Titus

    2012-01-01

    Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly. PMID:22847406
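
    The idea of storing k-mers in a Bloom filter and reading graph connectivity back out of it can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class and function names, the SHA-256 based hashing, and the filter size are all assumptions chosen for readability.

```python
import hashlib


class KmerBloomFilter:
    """Probabilistic k-mer set: no false negatives, occasional false positives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(kmer.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, kmer):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(kmer))


def kmers(sequence, k):
    """All overlapping substrings of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def successors(kmer, bloom):
    """Edges are implicit: test the four possible one-base extensions."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


# Hypothetical read and an arbitrary filter size, for illustration only.
read = "ACGTACGGTCA"
bloom = KmerBloomFilter(num_bits=256, num_hashes=2)
for km in kmers(read, 4):
    bloom.add(km)

print(successors("ACGT", bloom))
```

    Because membership queries can return false positives, the implied graph may contain a few spurious edges, which is the inexactness the abstract refers to.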

  1. Metagenomic Assessment of the Eastern Oyster-Associated Microbiota

    PubMed Central

    Wafula, Denis; Lewis, Dawn E.; Pathak, Ashish

    2014-01-01

    Bacteria associated with the Eastern oysters (Crassostrea virginica) native to Apalachicola Bay, FL, were investigated using 16S rRNA gene amplicon metagenomic sequencing which revealed that the oyster microbiome was predominated by Cyanobacteria and Proteobacteria. We also found that the oyster tissues were predominated by the pathogenic and symbiotic Photobacterium spp. (formerly known as Vibrio damselae). PMID:25342691

  2. Metagenome Sequencing of the Greater Kudu (Tragelaphus strepsiceros) Rumen Microbiome.

    PubMed

    Dube, Anita N; Moyo, Freeman; Dhlamini, Zephaniah

    2015-01-01

    Ruminant herbivores utilize a symbiotic relationship with microorganisms in their rumen to exploit fibrous foods for nutrition. We report the metagenome sequences of the greater kudu (Tragelaphus strepsiceros) rumen digesta, revealing a diverse community of microbes and some novel hydrolytic enzymes. PMID:26272573

  3. Metagenome Sequencing of the Greater Kudu (Tragelaphus strepsiceros) Rumen Microbiome

    PubMed Central

    Dube, Anita N.; Moyo, Freeman

    2015-01-01

    Ruminant herbivores utilize a symbiotic relationship with microorganisms in their rumen to exploit fibrous foods for nutrition. We report the metagenome sequences of the greater kudu (Tragelaphus strepsiceros) rumen digesta, revealing a diverse community of microbes and some novel hydrolytic enzymes. PMID:26272573

  4. MePIC, metagenomic pathogen identification for clinical specimens.

    PubMed

    Takeuchi, Fumihiko; Sekizuka, Tsuyoshi; Yamashita, Akifumi; Ogasawara, Yumiko; Mizuta, Katsumi; Kuroda, Makoto

    2014-01-01

    Next-generation DNA sequencing technologies have led to a new method of identifying the causative agents of infectious diseases. The analysis comprises three steps. First, DNA/RNA is extracted and extensively sequenced from a specimen that includes the pathogen, human tissue and commensal microorganisms. Second, the sequenced reads are matched with a database of known sequences, and the organisms from which the individual reads were derived are inferred. Last, the percentages of the organisms' genomic sequences in the specimen (i.e., the metagenome) are estimated, and the pathogen is identified. The first and last steps have become easy due to the development of benchtop sequencers and metagenomic software. To facilitate the middle step, which requires computational resources and skill, we developed a cloud-computing pipeline, MePIC: "Metagenomic Pathogen Identification for Clinical specimens." In the pipeline, unnecessary bases are trimmed off the reads, and human reads are removed. For the remaining reads, similar sequences are searched in the database of known nucleotide sequences. The search is drastically sped up by using a cloud-computing system. The webpage interface can be used easily by clinicians and epidemiologists. We believe that the use of the MePIC pipeline will promote metagenomic pathogen identification and improve the understanding of infectious diseases. PMID:24451106
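
    The last step described above, estimating the percentage of each organism from per-read assignments, can be sketched in a few lines. The organism labels and the read assignments are hypothetical, and this is not the MePIC code; reads with no database match are represented by None and are assumed to be excluded from the percentages.

```python
from collections import Counter


def organism_percentages(read_assignments):
    """Percentage of assigned reads attributed to each organism."""
    counts = Counter(a for a in read_assignments if a is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {organism: 100.0 * n / total for organism, n in counts.items()}


# Hypothetical assignments after trimming and human-read removal.
assignments = ["virus_A", "virus_A", "bacterium_B", None, "virus_A", None]
for organism, pct in sorted(organism_percentages(assignments).items()):
    print(f"{organism}: {pct:.1f}%")
```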

  5. Marine Metagenome as A Resource for Novel Enzymes.

    PubMed

    Alma'abadi, Amani D; Gojobori, Takashi; Mineta, Katsuhiko

    2015-10-01

    More than 99% of identified prokaryotes, including many from the marine environment, cannot be cultured in the laboratory. This lack of capability restricts our knowledge of microbial genetics and community ecology. Metagenomics, the culture-independent cloning of environmental DNAs that are isolated directly from an environmental sample, has already provided a wealth of information about the uncultured microbial world. It has also facilitated the discovery of novel biocatalysts by allowing researchers to probe directly into a huge diversity of enzymes within natural microbial communities. Recent advances in these studies have led to a great interest in recruiting microbial enzymes for the development of environmentally-friendly industry. Although the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using high-throughput techniques. In addition, we discuss challenges in the metagenomics as an important part of bioinformatics analysis in big data. PMID:26563467

  6. Evaluating techniques for metagenome annotation using simulated sequence data

    PubMed Central

    Randle-Boggis, Richard J.; Helgason, Thorunn; Sapp, Melanie; Ashton, Peter D.

    2016-01-01

    The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naïve choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies. PMID:27162180

  7. Assessment of quality control approaches for metagenomic data analysis

    NASA Astrophysics Data System (ADS)

    Zhou, Qian; Su, Xiaoquan; Ning, Kang

    2014-11-01

    Currently there is an explosive increase of the next-generation sequencing (NGS) projects and related datasets, which have to be processed by Quality Control (QC) procedures before they could be utilized for omics analysis. QC procedure usually includes identification and filtration of sequencing artifacts such as low-quality reads and contaminating reads, which would significantly affect and sometimes mislead downstream analysis. Quality control of NGS data for microbial communities is especially challenging. In this work, we have evaluated and compared the performance and effects of various QC pipelines on different types of metagenomic NGS data and from different angles, based on which general principles of using QC pipelines were proposed. Results based on both simulated and real metagenomic datasets have shown that: firstly, QC-Chain is superior in its ability for contamination identification for metagenomic NGS datasets with different complexities with high sensitivity and specificity. Secondly, the high performance computing engine enabled QC-Chain to achieve a significant reduction in processing time compared to other pipelines based on serial computing. Thirdly, QC-Chain could outperform other tools in benefiting downstream metagenomic data analysis.

  8. Integrated metagenomic and metaproteomic analyses of marine biofilm communities

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Metagenomic and metaproteomic analyses were utilized to begin to understand the role varying environments play on the composition and function of complex air-water interface biofilms sampled from the hulls of two ships that were deployed in different geographic waters. Prokaryotic community analyses...

  9. Substrate Type Determines Metagenomic Profiles from Diverse Chemical Habitats

    PubMed Central

    Jeffries, Thomas C.; Seymour, Justin R.; Gilbert, Jack A.; Dinsdale, Elizabeth A.; Newton, Kelly; Leterme, Sophie S. C.; Roudnew, Ben; Smith, Renee J.; Seuront, Laurent; Mitchell, James G.

    2011-01-01

    Environmental parameters drive phenotypic and genotypic frequency variations in microbial communities and thus control the extent and structure of microbial diversity. We tested the extent to which microbial community composition changes are controlled by shifting physicochemical properties within a hypersaline lagoon. We sequenced four sediment metagenomes from the Coorong, South Australia from samples which varied in salinity by 99 Practical Salinity Units (PSU), an order of magnitude in ammonia concentration and two orders of magnitude in microbial abundance. Despite the marked divergence in environmental parameters observed between samples, hierarchical clustering of taxonomic and metabolic profiles of these metagenomes showed striking similarity between the samples (>89%). Comparison of these profiles to those derived from a wide variety of publicly available datasets demonstrated that the Coorong sediment metagenomes were similar to other sediment, soil, biofilm and microbial mat samples regardless of salinity (>85% similarity). Overall, clustering of solid substrate and water metagenomes into discrete similarity groups based on functional potential indicated that the dichotomy between water and solid matrices is a fundamental determinant of community microbial metabolism that is not masked by salinity, nutrient concentration or microbial abundance. PMID:21966446

  10. Metagenomics and other Methods for Measuring Antibiotic Resistance in Agroecosystems

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: There is broad concern regarding antibiotic resistance on farms and in fields, however there is no standard method for defining or measuring antibiotic resistance in environmental samples. Methods: We used metagenomic, culture-based, and molecular methods to characterize the amount, t...

  11. Metagenomic Analyses of Drinking Water Receiving Different Disinfection Treatments

    EPA Science Inventory

    A metagenome-based approach was utilized for assessing the taxonomic affiliation and function potential of microbial populations in free chlorine (CHL) and monochloramine (CHM) treated drinking water (DW). A total of 1,024,242 (averaging 544 bp) and 849,349 (averaging 554 bp) ...

  12. Metagenomic assessment of the eastern oyster-associated microbiota.

    PubMed

    Chauhan, Ashvini; Wafula, Denis; Lewis, Dawn E; Pathak, Ashish

    2014-01-01

    Bacteria associated with the Eastern oysters (Crassostrea virginica) native to Apalachicola Bay, FL, were investigated using 16S rRNA gene amplicon metagenomic sequencing which revealed that the oyster microbiome was predominated by Cyanobacteria and Proteobacteria. We also found that the oyster tissues were predominated by the pathogenic and symbiotic Photobacterium spp. (formerly known as Vibrio damselae). PMID:25342691

  13. Target fragmentation in radiobiology

    NASA Technical Reports Server (NTRS)

    Wilson, John W.; Cucinotta, Francis A.; Shinn, Judy L.; Townsend, Lawrence W.

    1993-01-01

    Nuclear reactions in biological systems produce low-energy fragments of the target nuclei seen as local high events of linear energy transfer (LET). A nuclear-reaction formalism is used to evaluate the nuclear-induced fields within biosystems and their effects within several biological models. On the basis of direct ionization interaction, one anticipates high-energy protons to have a quality factor and relative biological effectiveness (RBE) of unity. Target fragmentation contributions raise the effective quality factor of 10 GeV protons to 3.3 in reasonable agreement with RBE values for induced micronuclei in bean sprouts. Application of the Katz model indicates that the relative increase in RBE with decreasing exposure observed in cell survival experiments with 160 MeV protons is related solely to target fragmentation events. Target fragment contributions to lens opacity give an RBE of 1.4 for 2 GeV protons in agreement with the work of Lett and Cox. Predictions are made for the effective RBE for Harderian gland tumors induced by high-energy protons. An exposure model for lifetime cancer risk is derived from NCRP 98 risk tables, and protraction effects are examined for proton and helium ion exposures. The implications of dose rate enhancement effects on space radiation protection are considered.

  14. Diagnosis of Bacterial Bloodstream Infections: A 16S Metagenomics Approach

    PubMed Central

    Van Puyvelde, Sandra; De Block, Tessa; Maltha, Jessica; Palpouguini, Lompo; Tahita, Marc; Tinto, Halidou; Jacobs, Jan; Deborggraeve, Stijn

    2016-01-01

    Background Bacterial bloodstream infection (bBSI) is one of the leading causes of death in critically ill patients and accurate diagnosis is therefore crucial. We here report a 16S metagenomics approach for diagnosing and understanding bBSI. Methodology/Principal Findings The proof-of-concept was delivered in 75 children (median age 15 months) with severe febrile illness in Burkina Faso. Standard blood culture and malaria testing were conducted at the time of hospital admission. 16S metagenomics testing was done retrospectively and in duplicate on the blood of all patients. Total DNA was extracted from the blood and the V3–V4 regions of the bacterial 16S rRNA genes were amplified by PCR and deep sequenced on an Illumina MiSeq sequencer. Paired reads were curated, taxonomically labeled, and filtered. Blood culture diagnosed bBSI in 12 patients, but this number increased to 22 patients when combining blood culture and 16S metagenomics results. In addition to superior sensitivity compared to standard blood culture, 16S metagenomics revealed important novel insights into the nature of bBSI. Patients with acute malaria or recovering from malaria had a 7-fold higher risk of presenting polymicrobial bloodstream infections compared to patients with no recent malaria diagnosis (p-value = 0.046). Malaria is known to affect epithelial gut function and may thus facilitate bacterial translocation from the intestinal lumen to the blood. Importantly, patients with such polymicrobial blood infections showed a 9-fold higher risk factor for not surviving their febrile illness (p-value = 0.030). Conclusions/Significance Our data demonstrate that 16S metagenomics is a powerful approach for the diagnosis and understanding of bBSI. This proof-of-concept study also showed that appropriate control samples are crucial to detect background signals due to environmental contamination. PMID:26927306

  15. Metagenome Skimming of Insect Specimen Pools: Potential for Comparative Genomics.

    PubMed

    Linard, Benjamin; Crampton-Platt, Alex; Gillett, Conrad P D T; Timmermans, Martijn J T N; Vogler, Alfried P

    2015-06-01

    Metagenomic analyses are challenging in metazoans, but high-copy number and repeat regions can be assembled from low-coverage sequencing by "genome skimming," which is applied here as a new way of characterizing metagenomes obtained in an ecological or taxonomic context. Illumina shotgun sequencing on two pools of Coleoptera (beetles) of approximately 200 species each were assembled into tens of thousands of scaffolds. Repeated low-coverage sequencing recovered similar scaffold sets consistently, although approximately 70% of scaffolds could not be identified against existing genome databases. Identifiable scaffolds included mitochondrial DNA, conserved sequences with hits to expressed sequence tag and protein databases, and known repeat elements of high and low complexity, including numerous copies of rRNA and histone genes. Assemblies of histones captured a diversity of gene order and primary sequence in Coleoptera. Scaffolds with similarity to multiple sites in available coleopteran genome sequences for Dendroctonus and Tribolium revealed high specificity of scaffolds to either of these genomes, in particular for high-copy number repeats. Numerous "clusters" of scaffolds mapped to the same genomic site revealed intra- and/or intergenomic variation within a metagenome pool. In addition to effect of taxonomic composition of the metagenomes, the number of mapped scaffolds also revealed structural differences between the two reference genomes, although the significance of this striking finding remains unclear. Finally, apparently exogenous sequences were recovered, including potential food plants, fungal pathogens, and bacterial symbionts. The "metagenome skimming" approach is useful for capturing the genomic diversity of poorly studied, species-rich lineages and opens new prospects in environmental genomics. PMID:25979752

  16. Metagenome Skimming of Insect Specimen Pools: Potential for Comparative Genomics

    PubMed Central

    Linard, Benjamin; Crampton-Platt, Alex; Gillett, Conrad P.D.T.; Timmermans, Martijn J.T.N.; Vogler, Alfried P.

    2015-01-01

    Metagenomic analyses are challenging in metazoans, but high-copy number and repeat regions can be assembled from low-coverage sequencing by “genome skimming,” which is applied here as a new way of characterizing metagenomes obtained in an ecological or taxonomic context. Illumina shotgun sequencing on two pools of Coleoptera (beetles) of approximately 200 species each were assembled into tens of thousands of scaffolds. Repeated low-coverage sequencing recovered similar scaffold sets consistently, although approximately 70% of scaffolds could not be identified against existing genome databases. Identifiable scaffolds included mitochondrial DNA, conserved sequences with hits to expressed sequence tag and protein databases, and known repeat elements of high and low complexity, including numerous copies of rRNA and histone genes. Assemblies of histones captured a diversity of gene order and primary sequence in Coleoptera. Scaffolds with similarity to multiple sites in available coleopteran genome sequences for Dendroctonus and Tribolium revealed high specificity of scaffolds to either of these genomes, in particular for high-copy number repeats. Numerous “clusters” of scaffolds mapped to the same genomic site revealed intra- and/or intergenomic variation within a metagenome pool. In addition to effect of taxonomic composition of the metagenomes, the number of mapped scaffolds also revealed structural differences between the two reference genomes, although the significance of this striking finding remains unclear. Finally, apparently exogenous sequences were recovered, including potential food plants, fungal pathogens, and bacterial symbionts. The “metagenome skimming” approach is useful for capturing the genomic diversity of poorly studied, species-rich lineages and opens new prospects in environmental genomics. PMID:25979752

  17. gbtools: Interactive Visualization of Metagenome Bins in R.

    PubMed

    Seah, Brandon K B; Gruber-Vodicka, Harald R

    2015-01-01

    Improvements in DNA sequencing technology have increased the amount and quality of sequences that can be obtained from metagenomic samples, making it practical to extract individual microbial genomes from metagenomic assemblies ("binning"). However, while many tools and methods exist for unsupervised binning with various statistical algorithms, there are few options for visualizing the results, even though visualization is vital to exploratory data analysis. We have developed gbtools, a software package that allows users to visualize metagenomic assemblies by plotting coverage (sequencing depth) and GC values of contigs, and also to annotate the plots with taxonomic information. Different sets of annotations, including taxonomic assignments from conserved marker genes or SSU rRNA genes, can be imported simultaneously; users can choose which annotations to plot. Bins can be manually defined from plots, or be imported from third-party binning tools and overlaid onto plots, such that results from different methods can be compared side-by-side. gbtools reports summary statistics of bins including marker gene completeness, and allows the user to add or subtract bins with each other. We illustrate some of the functions available in gbtools with two examples: the metagenome of Olavius algarvensis, a marine oligochaete worm that has up to five bacterial symbionts, and the metagenome of a synthetic mock community comprising 64 bacterial and archaeal strains. We show how instances of poor automated binning, sequencer GC% bias, and variation between samples can be quickly diagnosed by visualization, and demonstrate how the results from different binning tools can be combined and refined to yield manually curated bins with higher completeness. gbtools is open-source and written in R. The software package, documentation, and example data are available freely online at https://github.com/kbseah/genome-bin-tools. PMID:26732662
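
    The two quantities that the abstract says are plotted for each contig, GC content and coverage, can be computed as in the sketch below. gbtools itself is written in R; this Python fragment does not use or reproduce its API, and the contig names, sequences, and coverage values are hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


# Hypothetical contigs: name -> (sequence, mean sequencing depth).
contigs = {
    "contig_1": ("ATGCGCGC", 42.0),
    "contig_2": ("ATATATGC", 7.5),
}

# One (GC, coverage) point per contig, ready for a scatter plot.
points = {name: (gc_fraction(seq), cov) for name, (seq, cov) in contigs.items()}
for name, (gc, cov) in points.items():
    print(f"{name}: GC={gc:.2f} coverage={cov}")
```

    Contigs from the same genome tend to share similar GC content and coverage, which is why points that cluster together on such a plot are candidates for the same bin.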

  18. Sequencing platform and library preparation choices impact viral metagenomes

    PubMed Central

    2013-01-01

    Background Microbes drive the biogeochemistry that fuels the planet. Microbial viruses modulate their hosts directly through mortality and horizontal gene transfer, and indirectly by re-programming host metabolisms during infection. However, our ability to study these virus-host interactions is limited by methods that are low-throughput and heavily reliant upon the subset of organisms that are in culture. One way forward is culture-independent metagenomic approaches, but these novel methods are rarely rigorously tested, especially for studies of environmental viruses, air microbiomes, extreme environment microbiology and other areas with constrained sample amounts. Here we perform replicated experiments to evaluate Roche 454, Illumina HiSeq, and Ion Torrent PGM sequencing and library preparation protocols on virus metagenomes generated from as little as 10 pg of DNA. Results Using %G + C content to compare metagenomes, we find that (i) metagenomes are highly replicable, (ii) some treatment effects are minimal, e.g., sequencing technology choice has 6-fold less impact than varying input DNA amount, and (iii) when restricted to a limited DNA concentration (<1 μg), changing the amount of amplification produces little variation. These trends were also observed when examining the metagenomes for gene function and assembly performance, although the latter more closely aligned to sequencing effort and read length than preparation steps tested. Among Illumina library preparation options, transposon-based libraries diverged from all others and adaptor ligation was a critical step for optimizing sequencing yields. Conclusions These data guide researchers in generating systematic, comparative datasets to understand complex ecosystems, and suggest that neither varied amplification nor sequencing platforms will deter such efforts. PMID:23663384

  19. gbtools: Interactive Visualization of Metagenome Bins in R

    PubMed Central

    Seah, Brandon K. B.; Gruber-Vodicka, Harald R.

    2015-01-01

    Improvements in DNA sequencing technology have increased the amount and quality of sequences that can be obtained from metagenomic samples, making it practical to extract individual microbial genomes from metagenomic assemblies (“binning”). However, while many tools and methods exist for unsupervised binning with various statistical algorithms, there are few options for visualizing the results, even though visualization is vital to exploratory data analysis. We have developed gbtools, a software package that allows users to visualize metagenomic assemblies by plotting coverage (sequencing depth) and GC values of contigs, and also to annotate the plots with taxonomic information. Different sets of annotations, including taxonomic assignments from conserved marker genes or SSU rRNA genes, can be imported simultaneously; users can choose which annotations to plot. Bins can be manually defined from plots, or be imported from third-party binning tools and overlaid onto plots, such that results from different methods can be compared side-by-side. gbtools reports summary statistics of bins including marker gene completeness, and allows the user to add or subtract bins with each other. We illustrate some of the functions available in gbtools with two examples: the metagenome of Olavius algarvensis, a marine oligochaete worm that has up to five bacterial symbionts, and the metagenome of a synthetic mock community comprising 64 bacterial and archaeal strains. We show how instances of poor automated binning, sequencer GC% bias, and variation between samples can be quickly diagnosed by visualization, and demonstrate how the results from different binning tools can be combined and refined to yield manually curated bins with higher completeness. gbtools is open-source and written in R. The software package, documentation, and example data are available freely online at https://github.com/kbseah/genome-bin-tools. PMID:26732662
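
    The coverage-GC plots described above need two values per contig: its GC percentage and its mean sequencing depth. The following is a minimal Python sketch of how such points could be prepared. It is not gbtools code (gbtools is an R package), and the function name, contig names, sequences, and coverage values are hypothetical, chosen only for illustration.

```python
def gc_percent(seq: str) -> float:
    """Return the percentage of G and C among unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Contigs with no unambiguous bases (empty or all N) get 0.0 by convention.
    return 100.0 * gc / total if total else 0.0


# Hypothetical contigs: (name, sequence, mean coverage). Real input would
# come from an assembly FASTA file and a read-mapping coverage table.
contigs = [
    ("contig_1", "ATGCGCGC", 12.0),
    ("contig_2", "ATATATGC", 85.5),
]

# One (name, GC %, coverage) point per contig, ready for a scatter plot.
points = [(name, gc_percent(seq), cov) for name, seq, cov in contigs]
for name, gc, cov in points:
    print(f"{name}\tGC={gc:.1f}%\tcoverage={cov:.1f}x")
```

    Contigs from the same genome tend to share similar GC content and coverage, which is why clusters in such a plot are candidate bins.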

  20. Metagenomic Sequencing of the Chronic Obstructive Pulmonary Disease Upper Bronchial Tract Microbiome Reveals Functional Changes Associated with Disease Severity.

    PubMed

    Cameron, Simon J S; Lewis, Keir E; Huws, Sharon A; Lin, Wanchang; Hegarty, Matthew J; Lewis, Paul D; Mur, Luis A J; Pachebat, Justin A

    2016-01-01

    Chronic Obstructive Pulmonary Disease (COPD) is a major source of mortality and morbidity worldwide. The microbiome associated with this disease may be an important component of the disease, though studies to date have been based on sequencing of the 16S rRNA gene, and have revealed unequivocal results. Here, we employed metagenomic sequencing of the upper bronchial tract (UBT) microbiome to allow for greater elucidation of its taxonomic composition, and revealing functional changes associated with the disease. The bacterial metagenomes within sputum samples from eight COPD patients and ten 'healthy' smokers (Controls) were sequenced, and suggested significant changes in the abundance of bacterial species, particularly within the Streptococcus genus. The functional capacity of the COPD UBT microbiome indicated an increased capacity for bacterial growth, which could be an important feature in bacterial-associated acute exacerbations. Regression analyses correlated COPD severity (FEV1% of predicted) with differences in the abundance of Streptococcus pneumoniae and functional classifications related to a reduced capacity for bacterial sialic acid metabolism. This study suggests that the COPD UBT microbiome could be used in patient risk stratification and in identifying novel monitoring and treatment methods, but study of a longitudinal cohort will be required to unequivocally relate these features of the microbiome with COPD severity. PMID:26872143

  1. Metagenomic Sequencing of the Chronic Obstructive Pulmonary Disease Upper Bronchial Tract Microbiome Reveals Functional Changes Associated with Disease Severity

    PubMed Central

    Cameron, Simon J. S.; Lewis, Keir E.; Huws, Sharon A.; Lin, Wanchang; Hegarty, Matthew J.; Lewis, Paul D.; Mur, Luis A. J.; Pachebat, Justin A.

    2016-01-01

    Chronic Obstructive Pulmonary Disease (COPD) is a major source of mortality and morbidity worldwide. The microbiome associated with this disease may be an important component of the disease, though studies to date have been based on sequencing of the 16S rRNA gene, and have revealed unequivocal results. Here, we employed metagenomic sequencing of the upper bronchial tract (UBT) microbiome to allow for greater elucidation of its taxonomic composition, and revealing functional changes associated with the disease. The bacterial metagenomes within sputum samples from eight COPD patients and ten ‘healthy’ smokers (Controls) were sequenced, and suggested significant changes in the abundance of bacterial species, particularly within the Streptococcus genus. The functional capacity of the COPD UBT microbiome indicated an increased capacity for bacterial growth, which could be an important feature in bacterial-associated acute exacerbations. Regression analyses correlated COPD severity (FEV1% of predicted) with differences in the abundance of Streptococcus pneumoniae and functional classifications related to a reduced capacity for bacterial sialic acid metabolism. This study suggests that the COPD UBT microbiome could be used in patient risk stratification and in identifying novel monitoring and treatment methods, but study of a longitudinal cohort will be required to unequivocally relate these features of the microbiome with COPD severity. PMID:26872143

  2. Dietary history contributes to enterotype-like clustering and functional metagenomic content in the intestinal microbiome of wild mice

    PubMed Central

    Wang, Jun; Linnenbrink, Miriam; Künzel, Sven; Fernandes, Ricardo; Nadeau, Marie-Josée; Rosenstiel, Philip; Baines, John F.

    2014-01-01

    Understanding the origins of gut microbial community structure is critical for the identification and interpretation of potential fitness-related traits for the host. The presence of community clusters characterized by differences in the abundance of signature taxa, referred to as enterotypes, is a debated concept first reported in humans and later extended to other mammalian hosts. In this study, we provide a thorough assessment of their existence in wild house mice using a panel of evaluation criteria. We identify support for two clusters that are compositionally similar to clusters identified in humans, chimpanzees, and laboratory mice, characterized by differences in Bacteroides, Robinsoniella, and unclassified genera belonging to the family Lachnospiraceae. To further evaluate these clusters, we (i) monitored community changes associated with moving mice from the natural to a laboratory environment, (ii) performed functional metagenomic sequencing, and (iii) subjected wild-caught samples to stable isotope analysis to reconstruct dietary patterns. This process reveals differences in the proportions of genes involved in carbohydrate versus protein metabolism in the functional metagenome, as well as differences in plant- versus meat-derived food sources between clusters. In conjunction with wild-caught mice quickly changing their enterotype classification upon transfer to a standard laboratory chow diet, these results provide strong evidence that dietary history contributes to the presence of enterotype-like clustering in wild mice. PMID:24912178

  3. DOE JGI Quality Metrics; Approaches to Scaling and Improving Metagenome Assembly (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Copeland, Alex; Brown, C Titus

    2011-10-13

    DOE JGI's Alex Copeland on "DOE JGI Quality Metrics" and Michigan State University's C. Titus Brown on "Approaches to Scaling and Improving Metagenome Assembly" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  4. DOE JGI Quality Metrics; Approaches to Scaling and Improving Metagenome Assembly (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Copeland, Alex [DOE JGI]; Brown, C Titus [Michigan State University]

    2013-01-22

    DOE JGI's Alex Copeland on "DOE JGI Quality Metrics" and Michigan State University's C. Titus Brown on "Approaches to Scaling and Improving Metagenome Assembly" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  5. Toward Cloning of the Magnetotactic Metagenome: Identification of Magnetosome Island Gene Clusters in Uncultivated Magnetotactic Bacteria from Different Aquatic Sediments

    PubMed Central

    Jogler, Christian; Lin, Wei; Meyerdierks, Anke; Kube, Michael; Katzmann, Emanuel; Flies, Christine; Pan, Yongxin; Amann, Rudolf; Reinhardt, Richard; Schüler, Dirk

    2009-01-01

    In this report, we describe the selective cloning of large DNA fragments from magnetotactic metagenomes from various aquatic habitats. This was achieved by a two-step magnetic enrichment which allowed the mass collection of environmental magnetotactic bacteria (MTB) virtually free of nonmagnetic contaminants. Four fosmid libraries were constructed and screened by end sequencing and hybridization analysis using heterologous magnetosome gene probes. A total of 14 fosmids were fully sequenced. We identified and characterized two fosmids, most likely originating from two different alphaproteobacterial strains of MTB that contain several putative operons with homology to the magnetosome island (MAI) of cultivated MTB. This is the first evidence that uncultivated MTB exhibit similar yet differing organizations of the MAI, which may account for the diversity in biomineralization and magnetotaxis observed in MTB from various environments. PMID:19395570

  6. Metagenomic sequence of saline desert microbiota from wild ass sanctuary, Little Rann of Kutch, Gujarat, India.

    PubMed

    Patel, Rajesh; Mevada, Vishal; Prajapati, Dhaval; Dudhagara, Pravin; Koringa, Prakash; Joshi, C G

    2015-03-01

    We report a metagenome from a saline desert soil sample of the Little Rann of Kutch, Gujarat State, India. The metagenome consisted of 633,760 sequences totaling 141,307,202 bp with 56% G + C content. The sequence data are available in the EBI Metagenomics database under accession no. ERP005612. Community metagenomics revealed a total of 1802 species belonging to 43 different phyla, with the genera Marinobacter (48.7%) and Halobacterium (4.6%) dominating the bacterial and archaeal domains, respectively. Remarkably, functional metagenome analysis showed that 18.2% of sequences fell in a poorly characterized group and 4% of genes were for various stress responses, along with a versatile presence of commercial enzymes. PMID:26484162

  7. Shielding and fragmentation studies.

    PubMed

    Zeitlin, C; Guetersloh, S; Heilbronn, L; Miller, J

    2005-01-01

    Radiation dosimetry for manned space missions depends on the ability to adequately describe the process of high-energy ion transport through many materials. Since the types of possible nuclear interactions are many and complex, transport models are used which depend upon a reliable source of experimental data. To expand the heavy ion database used in the models we have been measuring charge-changing cross sections and fragment production cross sections from heavy-ion interactions in various elemental targets. These include materials flown on space missions such as carbon and aluminium, as well as those important in radiation dosimetry such as hydrogen, nitrogen and water. Measuring heavy-ion fragmentation through these targets also gives us the ability to determine the effectiveness of new materials proposed for shielding such as graphite composites and polyethylene hybrids. Measurement without a target present gives an indication of the level of contamination of the primary beam, which is also important in radiobiology experiments. PMID:16604611

  8. Metagenome analyses reveal the influence of the inoculant Lactobacillus buchneri CD034 on the microbial community involved in grass ensiling.

    PubMed

    Eikmeyer, Felix G; Köfinger, Petra; Poschenel, Andrea; Jünemann, Sebastian; Zakrzewski, Martha; Heinl, Stefan; Mayrhuber, Elisabeth; Grabherr, Reingard; Pühler, Alfred; Schwab, Helmut; Schlüter, Andreas

    2013-09-10

    Silage is green fodder conserved by lactic acid fermentation performed by epiphytic lactic acid bacteria under anaerobic conditions. To improve the ensiling process and the quality of the resulting silage, starter cultures are added to the fresh forage. A detailed analysis of the microbial community playing a role in grass ensiling has been carried out by high throughput sequencing technologies. Moreover, the influence of the inoculant Lactobacillus buchneri CD034 on the microbial community composition was studied. For this purpose, grass was ensiled untreated or inoculated with L. buchneri CD034. The fresh forage as well as silages after 14 and 58 days of fermentation were characterized physico-chemically. Characteristic silage conditions such as increased titers of lactic acid bacteria and higher concentrations of acetic acid were observed in the inoculated silage in comparison to the untreated samples. Taxonomic community profiles deduced from 16S rDNA amplicon sequences indicated that the relative abundance of Lactococci diminished in the course of fermentations and that the proportion of bacteria belonging to the phyla Proteobacteria and Bacteroidetes increased during the fermentation of untreated silage. In the inoculated silage, members of these phyla were repressed due to an increased abundance of Lactobacilli. In addition, metagenome analyses of silage samples confirmed taxonomic profiles based on 16S rDNA amplicons. Moreover, Lactobacillus plantarum, Lactobacillus brevis and Lactococcus lactis were found to be dominant species within silages as analyzed by means of fragment recruitments of metagenomic sequence reads on complete reference genome sequences. Fragment recruitments also provided clear evidence for the competitiveness of the inoculant strain L. buchneri CD034 during the fermentation of the inoculated silage. The inoculation strain was able to outcompete other community members and also affected physico-chemical characteristics of the silage

  9. Fragmentation of cancer cells

    NASA Astrophysics Data System (ADS)

    Vanapalli, Siva; Kamyabi, Nabiollah

    Tumor cells have to travel through blood capillaries to be able to metastasize and colonize distant organs. Among the numerous cells that are shed by the primary tumor, very few survive in circulation. In vivo studies have shown that tumor cells can undergo breakup at microcapillary junctions, affecting their survival. It is currently unclear what hydrodynamic and biomechanical factors contribute to fragmentation and, moreover, how the breakup dynamics of highly and weakly metastatic cells differ. In this study, we use microfluidics to investigate flow-induced breakup of prostate and breast cancer cells. We observe several different modes of breakup of cancer cells, which have striking similarities with breakup of viscous drops. We quantify the breakup time and find that highly metastatic cancer cells take longer to break up than weakly metastatic cells, suggesting that tumor cells may dynamically modify their deformability to avoid fragmentation. We also identify the role that the cytoskeleton and membrane play in the breakup process. Our study highlights the important role that tumor cell fragmentation plays in cancer metastasis. Cancer Prevention and Research Institute of Texas.

  10. Electroeluting DNA fragments.

    PubMed

    Zarzosa-Alvarez, Ana L; Sandoval-Cabrera, Antonio; Torres-Huerta, Ana L; Bermudez-Cruz, Rosa M

    2010-01-01

    Purified DNA fragments are used for different purposes in molecular biology and can be prepared by several procedures. Most of them require prior electrophoresis of the DNA fragments in order to separate the band of interest. This band is then excised from an agarose or acrylamide gel and purified by one of the following: binding and elution from glass or silica particles, DEAE-cellulose membranes, the "crush and soak" method, electroelution, or, very often, expensive commercial purification kits. Thus, selecting a method will depend mostly on what is available in the laboratory. The electroelution procedure allows one to purify very clean DNA to be used in a large number of applications (sequencing, radiolabeling, enzymatic restriction, enzymatic modification, cloning, etc.). This procedure consists of placing agarose or acrylamide slices containing the DNA band into sample wells of the electroeluter; applying current then makes the DNA fragment leave the gel slice and become trapped in a salt cushion, from which it is later recovered by ethanol precipitation. PMID:20834225

  11. Fracture, failure, and fragmentation

    SciTech Connect

    Dienes, J.K.

    1984-01-01

    Though continuum descriptions of material behavior are useful for many kinds of problems, particularly those involving plastic flow, a more general approach is required when the failure is likely to involve growth and coalescence of a large number of fractures, as in fragmentation. Failures of this kind appear frequently in rapid dynamic processes such as those resulting from impacts and explosions, particularly in the formation of spall fragments. In the first part of this paper an approach to formulating constitutive relations that accounts for the opening, shear and growth of an ensemble of cracks is discussed. The approach also accounts for plastic flow accompanying fragmentation. The resulting constitutive relations have been incorporated into a Lagrangean computer program. In the second part of this paper a theoretical approach to coalescence is described. The simplest formulation makes use of a linear Liouville equation, with crack growth limited by the mean free path of cracks, assumed constant. This approach allows for an anisotropic distribution of cracks. An alternative approach is also described in which the decrease of the mean free path with increasing crack size is accounted for, but the crack distribution is assumed isotropic. A reduction of the governing Liouville equation to an ordinary differential equation of third order is possible, and the result can be used to determine how mean-free-path decreases with increasing crack size.

  12. Metagenomes from two microbial consortia associated with Santa Barbara seep oil.

    PubMed

    Hawley, Erik R; Malfatti, Stephanie A; Pagani, Ioanna; Huntemann, Marcel; Chen, Amy; Foster, Brian; Copeland, Alexander; del Rio, Tijana Glavina; Pati, Amrita; Jansson, Janet R; Gilbert, Jack A; Tringe, Susannah Green; Lorenson, Thomas D; Hess, Matthias

    2014-12-01

    The metagenomes from two microbial consortia associated with natural oils seeping into the Pacific Ocean offshore the coast of Santa Barbara (California, USA) were determined to complement already existing metagenomes generated from microbial communities associated with hydrocarbons that pollute the marine ecosystem. This genomics resource article is the first of two publications reporting a total of four new metagenomes from oils that seep into the Santa Barbara Channel. PMID:24958360

  13. MetLab: An In Silico Experimental Design, Simulation and Analysis Tool for Viral Metagenomics Studies

    PubMed Central

    Gourlé, Hadrien; Bongcam-Rudloff, Erik; Hayer, Juliette

    2016-01-01

    Metagenomics, the sequence characterization of all genomes within a sample, is widely used as a virus discovery tool as well as a tool to study viral diversity of animals. Metagenomics can be considered to have three main steps: sample collection and preparation, sequencing, and finally bioinformatics. Bioinformatic analysis of metagenomic datasets is in itself a complex process, involving few standardized methodologies, thereby hampering comparison of metagenomics studies between research groups. In this publication the new bioinformatics framework MetLab is presented, aimed at providing scientists with an integrated tool for experimental design and analysis of viral metagenomes. MetLab provides support in designing the metagenomics experiment by estimating the sequencing depth needed for the complete coverage of a species. This is achieved by applying a methodology to calculate the probability of coverage using an adaptation of Stevens’ theorem. It also provides scientists with several pipelines aimed at simplifying the analysis of viral metagenomes, including quality control, assembly and taxonomic binning. We also implement a tool for simulating metagenomics datasets from several sequencing platforms. The overall aim is to provide virologists with an easy-to-use tool for designing, simulating and analyzing viral metagenomes. The results presented here include a benchmark against other existing software, with emphasis on detection of viruses as well as speed of applications. MetLab is packaged as comprehensive software, readily available for Linux and OSX users at https://github.com/norling/metlab. PMID:27479078
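
    The abstract does not spell out MetLab's adaptation of Stevens' theorem, so the following Python sketch shows only the classical form of the theorem, under stated assumptions: n arcs, each covering a fraction a of a circle, are placed uniformly at random, and the probability that they cover the whole circle is the sum over k (for k*a < 1) of (-1)^k * C(n, k) * (1 - k*a)^(n-1). Treating the genome as circular and all reads as equal length is an assumption of this sketch, and the function and parameter names are hypothetical rather than taken from MetLab.

```python
from math import comb


def stevens_coverage_probability(n_reads: int, read_fraction: float) -> float:
    """Classical Stevens' theorem: probability that n_reads random arcs,
    each spanning read_fraction of a circle, cover the circle completely."""
    if n_reads < 1 or read_fraction <= 0.0:
        return 0.0
    if read_fraction >= 1.0:
        return 1.0  # a single arc already spans the whole circle
    total = 0.0
    k = 0
    # Inclusion-exclusion over k uncovered gaps; terms with k*a >= 1 vanish.
    while k <= n_reads and k * read_fraction < 1.0:
        term = comb(n_reads, k) * (1.0 - k * read_fraction) ** (n_reads - 1)
        total += term if k % 2 == 0 else -term
        k += 1
    # The alternating sum loses floating-point precision for large n_reads;
    # clamp to [0, 1]. Use exact or log-space arithmetic for real designs.
    return min(1.0, max(0.0, total))


# Three reads, each half the genome long, cover it with probability 1/4.
print(stevens_coverage_probability(3, 0.5))
```

    Scanning n_reads upward until the probability passes a chosen threshold gives a rough estimate of the sequencing depth needed for complete coverage, which is the kind of question the abstract describes.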

  14. MetLab: An In Silico Experimental Design, Simulation and Analysis Tool for Viral Metagenomics Studies.

    PubMed

    Norling, Martin; Karlsson-Lindsjö, Oskar E; Gourlé, Hadrien; Bongcam-Rudloff, Erik; Hayer, Juliette

    2016-01-01

    Metagenomics, the sequence characterization of all genomes within a sample, is widely used as a virus discovery tool as well as a tool to study viral diversity of animals. Metagenomics can be considered to have three main steps: sample collection and preparation, sequencing, and finally bioinformatics. Bioinformatic analysis of metagenomic datasets is in itself a complex process, involving few standardized methodologies, thereby hampering comparison of metagenomics studies between research groups. In this publication the new bioinformatics framework MetLab is presented, aimed at providing scientists with an integrated tool for experimental design and analysis of viral metagenomes. MetLab provides support in designing the metagenomics experiment by estimating the sequencing depth needed for the complete coverage of a species. This is achieved by applying a methodology to calculate the probability of coverage using an adaptation of Stevens' theorem. It also provides scientists with several pipelines aimed at simplifying the analysis of viral metagenomes, including quality control, assembly and taxonomic binning. We also implement a tool for simulating metagenomics datasets from several sequencing platforms. The overall aim is to provide virologists with an easy-to-use tool for designing, simulating and analyzing viral metagenomes. The results presented here include a benchmark against other existing software, with emphasis on detection of viruses as well as speed of applications. MetLab is packaged as comprehensive software, readily available for Linux and OSX users at https://github.com/norling/metlab. PMID:27479078

  15. Fragment-based lead design

    NASA Astrophysics Data System (ADS)

    Filz, O. A.; Poroikov, Vladimir V.

    2012-02-01

    State-of-the-art approaches to the fragment-based design of organic compounds with desired properties are considered. The review covers methods, which are used in different steps of the design, such as computational methods for fragment library design, experimental and computational methods for fragment discovery and methods for the generation of structures of organic compounds. Examples are given of drug candidates, which were constructed using the fragment-based approach. The bibliography includes 156 references.

  16. Classification in Australia.

    ERIC Educational Resources Information Center

    McKinlay, John

    Despite some inroads by the Library of Congress Classification and short-lived experimentation with Universal Decimal Classification and Bliss Classification, Dewey Decimal Classification, with its ability in recent editions to be hospitable to local needs, remains the most widely used classification system in Australia. Although supplemented at…

  17. A Statistical Framework for the Functional Analysis of Metagenomes

    SciTech Connect

    Sharon, Itai; Pati, Amrita; Markowitz, Victor; Pinter, Ron Y.

    2008-10-01

    Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. They present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations which can be used for removing seemingly unreliable measurements. They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.
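
    The framework above builds on Lander-Waterman theory. As a rough illustration of the basic quantities in that theory only (not of the authors' gene-family estimator, which the abstract does not detail), the Python sketch below computes the expected coverage c = N*L/G for N reads of length L from a genome of length G, and the expected fraction of bases covered by at least one read, 1 - exp(-c), under the usual assumption of uniformly random read placement. Function names and the example numbers are hypothetical.

```python
from math import exp


def expected_coverage(n_reads: int, read_len: int, genome_len: int) -> float:
    """Expected depth c = N * L / G under uniformly random read placement."""
    return n_reads * read_len / genome_len


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read: 1 - exp(-c)."""
    return 1.0 - exp(-coverage)


# Hypothetical example: 1,000 reads of 100 bp from a 100 kb genome.
c = expected_coverage(1000, 100, 100_000)
print(f"coverage = {c:.2f}x, fraction covered = {fraction_covered(c):.3f}")
```

    At 1x coverage roughly 63% of bases are expected to be sampled, which gives some intuition for why raw hit counts in low-coverage metagenomes may need statistical correction before gene family frequencies are compared.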

  18. A Microbial Metagenome (Leucobacter sp.) in Caenorhabditis Whole Genome Sequences.

    PubMed

    Percudani, Riccardo

    2013-01-01

    DNA of apparently recent bacterial origin is found in the genomic sequences of Caenorhabditis angaria and Caenorhabditis remanei. Here we present evidence that the DNA belongs to a single species of the genus Leucobacter (high-GC Gram+ Actinobacteria). Metagenomic tools enabled the assembly of the contaminating sequences in a draft genome of 3.2 Mb harboring 2,826 genes. This information provides insight into a microbial organism intimately associated with Caenorhabditis as well as a solid basis for the reassignment of 3,373 metazoan entries of the public database to a novel bacterial species (Leucobacter sp. AEAR). The application of metagenomic techniques can thus prevent annotation errors and reveal unexpected genetic information in data obtained by conventional genomics. PMID:23585714

  19. A metagenomics portal for a democratized sequencing world.

    PubMed

    Wilke, Andreas; Glass, Elizabeth M; Bartels, Daniela; Bischof, Jared; Braithwaite, Daniel; D'Souza, Mark; Gerlach, Wolfgang; Harrison, Travis; Keegan, Kevin; Matthews, Hunter; Kottmann, Renzo; Paczian, Tobias; Tang, Wei; Trimble, William L; Yilmaz, Pelin; Wilkening, Jared; Desai, Narayan; Meyer, Folker

    2013-01-01

    The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGenescan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled the dramatic scaling up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST. PMID:24060134

  20. The MG-RAST metagenomics database and portal in 2015.

    PubMed

    Wilke, Andreas; Bischof, Jared; Gerlach, Wolfgang; Glass, Elizabeth; Harrison, Travis; Keegan, Kevin P; Paczian, Tobias; Trimble, William L; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali; Meyer, Folker

    2016-01-01

    MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. The system currently hosts over 200,000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. To show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools. PMID:26656948

  1. The MG-RAST Metagenomics Database and Portal in 2015

    DOE PAGESBeta

    Wilke, Andreas; Bischof, Jared; Gerlach, Wolfgang; Glass, Elizabeth; Harrison, Travis; Keegan, Kevin; Paczian, Tobias; Trimble, William L.; Bagchi, Saurabh; Grama, Ananth; et al

    2015-12-09

    MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. Currently, the system hosts over 200 000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. Lastly, to show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools.

  2. The MG-RAST Metagenomics Database and Portal in 2015

    SciTech Connect

    Wilke, Andreas; Bischof, Jared; Gerlach, Wolfgang; Glass, Elizabeth; Harrison, Travis; Keegan, Kevin; Paczian, Tobias; Trimble, William L.; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali; Meyer, Folker

    2015-12-09

    MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. Currently, the system hosts over 200 000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. Lastly, to show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools.

  3. Unlocking the potential of metagenomics through replicated experimental design

    PubMed Central

    Knight, Rob; Jansson, Janet; Field, Dawn; Fierer, Noah; Desai, Narayan; Fuhrman, Jed A.; Hugenholtz, Phil; van der Lelie, Daniel; Meyer, Folker; Stevens, Rick; Bailey, Mark J.; Gordon, Jeffrey I.; Kowalchuk, George A.; Gilbert, Jack A.

    2015-01-01

    Metagenomics holds enormous promise for discovering novel enzymes and organisms that are biomarkers or causes of processes relevant to disease, industry and the environment. In the last two years we have seen a paradigm shift in metagenomics to the application of broad cross-sectional and longitudinal studies enabled by advances in DNA sequencing and high-performance computing. These technologies now make it possible to broadly assess microbial diversity and function, allowing systematic investigation of the largely unexplored frontier of microbial life. To achieve this aim, the global scientific community must collaborate and agree upon common objectives and data standards to enable comparative research across the Earth’s microbiome. Improvements in comparability of data will facilitate the study of biotechnologically relevant processes such as bioprospecting for new glycoside hydrolases or identifying novel energy sources. PMID:22678395

  4. Functional metagenomic screen reveals new and diverse microbial rhodopsins

    PubMed Central

    Pushkarev, Alina; Béjà, Oded

    2016-01-01

    Ion-translocating retinylidene rhodopsins are widely distributed among marine and freshwater microbes. The translocation is light-driven, contributing to the production of biochemical energy in diverse microbes. Until today, most microbial rhodopsins had been detected using bioinformatics based on homology to other rhodopsins. In the past decade, there has been increased interest in microbial rhodopsins in the field of optogenetics since microbial rhodopsins were found to be most useful in vertebrate neuronal systems. Here we report on a functional metagenomic assay for detecting microbial rhodopsins. Using an array of narrow pH electrodes and light-emitting diode illumination, we were able to screen a metagenomic fosmid library to detect diverse marine proteorhodopsins and an actinorhodopsin based solely on proton-pumping activity. Our assay therefore provides a rather simple phenotypic means to enrich our understanding of microbial rhodopsins without any prior knowledge of the genomic content of the environmental entities screened. PMID:26894445

  5. Metagenomic Human Repiratory Air in a Hospital Environment

    PubMed Central

    Lang, Jidong; Tong, Xunliang; Zhang, Lina; Fang, Jianhuo; Xing, Jingli; Cai, Meng; Xu, Hongtao; Deng, Yan; Xiao, Fei; Tian, Geng

    2015-01-01

    Hospital-acquired infection (HAI), or nosocomial infection, is a frequent problem in the hospital environment. We believe the conventional regulated Petri dish method is insufficient to evaluate HAI. To address this problem, metagenomic sequencing was applied to screen airborne microbes in four rooms of Beijing Hospital. With the sampler's air intake set to one person's respiration volume, metagenomic sequencing identified large numbers of species in rooms that had already met the widely accepted Petri dish exposure standard, underscoring the urgent need for new technology. Meanwhile, the comparative culture recovered only a small portion of the species and missed even cultivable pathogens, a reminder of the limitations of older technologies. To the best of our knowledge, the method demonstrated in this study could be broadly applied in hospital indoor environments for various monitoring activities as well as HAI studies. It also has potential as a real-time modelling system for transmissible pathogens worldwide.

  6. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones.

    PubMed

    Walsh, David A; Zaikova, Elena; Howes, Charles G; Song, Young C; Wright, Jody J; Tringe, Susannah G; Tortell, Philippe D; Hallam, Steven J

    2009-10-23

    Oxygen minimum zones, also known as oceanic "dead zones," are widespread oceanographic features currently expanding because of global warming. Although inhospitable to metazoan life, they support a cryptic microbiota whose metabolic activities affect nutrient and trace gas cycling within the global ocean. Here, we report metagenomic analyses of a ubiquitous and abundant but uncultivated oxygen minimum zone microbe (SUP05) related to chemoautotrophic gill symbionts of deep-sea clams and mussels. The SUP05 metagenome harbors a versatile repertoire of genes mediating autotrophic carbon assimilation, sulfur oxidation, and nitrate respiration responsive to a wide range of water-column redox states. Our analysis provides a genomic foundation for understanding the ecological and biogeochemical role of pelagic SUP05 in oxygen-deficient oceanic waters and its potential sensitivity to environmental changes. PMID:19900896

  7. A Metagenomic Framework for the Study of Airborne Microbial Communities

    PubMed Central

    Tenney, Aaron; McQuaid, Jeff; Williamson, Shannon; Thiagarajan, Mathangi; Brami, Daniel; Zeigler-Allen, Lisa; Hoffman, Jeff; Goll, Johannes B.; Fadrosh, Douglas; Glass, John; Adams, Mark D.; Friedman, Robert; Venter, J. Craig

    2013-01-01

    Understanding the microbial content of the air has important scientific, health, and economic implications. While studies have primarily characterized the taxonomic content of air samples by sequencing the 16S or 18S ribosomal RNA gene, direct analysis of the genomic content of airborne microorganisms has not been possible due to the extremely low density of biological material in airborne environments. We developed sampling and amplification methods to enable adequate DNA recovery to allow metagenomic profiling of air samples collected from indoor and outdoor environments. Air samples were collected from a large urban building, a medical center, a house, and a pier. Analyses of metagenomic data generated from these samples reveal airborne communities with a high degree of diversity and different genera abundance profiles. The identities of many of the taxonomic groups and protein families also allows for the identification of the likely sources of the sampled airborne bacteria. PMID:24349140

  8. MOCAT2: a metagenomic assembly, annotation and profiling framework

    PubMed Central

    Kultima, Jens Roat; Coelho, Luis Pedro; Forslund, Kristoffer; Huerta-Cepas, Jaime; Li, Simone S.; Driessen, Marja; Voigt, Anita Yvonne; Zeller, Georg; Sunagawa, Shinichi; Bork, Peer

    2016-01-01

    Summary: MOCAT2 is a software pipeline for metagenomic sequence assembly and gene prediction with novel features for taxonomic and functional abundance profiling. The automated generation and efficient annotation of non-redundant reference catalogs by propagating pre-computed assignments from 18 databases covering various functional categories allows for fast and comprehensive functional characterization of metagenomes. Availability and Implementation: MOCAT2 is implemented in Perl 5 and Python 2.7, designed for 64-bit UNIX systems and offers support for high-performance computer usage via LSF, PBS or SGE queuing systems; source code is freely available under the GPL3 license at http://mocat.embl.de. Contact: bork@embl.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153620

  9. Metagenomic approaches to understanding phylogenetic diversity in quorum sensing

    PubMed Central

    Kimura, Nobutada

    2014-01-01

    Quorum sensing, a form of cell–cell communication among bacteria, allows bacteria to synchronize their behaviors at the population level in order to control behaviors such as luminescence, biofilm formation, signal turnover, pigment production, antibiotics production, swarming, and virulence. A better understanding of quorum-sensing systems will provide us with greater insight into the complex interaction mechanisms used widely in the Bacteria and even the Archaea domain in the environment. Metagenomics, the use of culture-independent sequencing to study the genomic material of microorganisms, has the potential to provide direct information about the quorum-sensing systems in uncultured bacteria. This article provides an overview of the current knowledge of quorum sensing focused on phylogenetic diversity, and presents examples of studies that have used metagenomic techniques. Future technologies potentially related to quorum-sensing systems are also discussed. PMID:24429899

  10. The MG-RAST metagenomics database and portal in 2015

    PubMed Central

    Wilke, Andreas; Bischof, Jared; Gerlach, Wolfgang; Glass, Elizabeth; Harrison, Travis; Keegan, Kevin P.; Paczian, Tobias; Trimble, William L.; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali; Meyer, Folker

    2016-01-01

    MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. The system currently hosts over 200 000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. To show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools. PMID:26656948

  11. Biogeography and individuality shape function in the human skin metagenome.

    PubMed

    Oh, Julia; Byrd, Allyson L; Deming, Clay; Conlan, Sean; Kong, Heidi H; Segre, Julia A

    2014-10-01

    The varied topography of human skin offers a unique opportunity to study how the body's microenvironments influence the functional and taxonomic composition of microbial communities. Phylogenetic marker gene-based studies have identified many bacteria and fungi that colonize distinct skin niches. Here metagenomic analyses of diverse body sites in healthy humans demonstrate that local biogeography and strong individuality define the skin microbiome. We developed a relational analysis of bacterial, fungal and viral communities, which showed not only site specificity but also individual signatures. We further identified strain-level variation of dominant species as heterogeneous and multiphyletic. Reference-free analyses captured the uncharacterized metagenome through the development of a multi-kingdom gene catalogue, which was used to uncover genetic signatures of species lacking reference genomes. This work is foundational for human disease studies investigating inter-kingdom interactions, metabolic changes and strain tracking, and defines the dual influence of biogeography and individuality on microbial composition and function. PMID:25279917

  12. Functional metagenomic screen reveals new and diverse microbial rhodopsins.

    PubMed

    Pushkarev, Alina; Béjà, Oded

    2016-09-01

    Ion-translocating retinylidene rhodopsins are widely distributed among marine and freshwater microbes. The translocation is light-driven, contributing to the production of biochemical energy in diverse microbes. Until today, most microbial rhodopsins had been detected using bioinformatics based on homology to other rhodopsins. In the past decade, there has been increased interest in microbial rhodopsins in the field of optogenetics since microbial rhodopsins were found to be most useful in vertebrate neuronal systems. Here we report on a functional metagenomic assay for detecting microbial rhodopsins. Using an array of narrow pH electrodes and light-emitting diode illumination, we were able to screen a metagenomic fosmid library to detect diverse marine proteorhodopsins and an actinorhodopsin based solely on proton-pumping activity. Our assay therefore provides a rather simple phenotypic means to enrich our understanding of microbial rhodopsins without any prior knowledge of the genomic content of the environmental entities screened. PMID:26894445

  13. Gene and translation initiation site prediction in metagenomic sequences

    SciTech Connect

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John; Uberbacher, Edward C

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
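
    The difficulty described above, that reads are often shorter than a gene, can be seen with a naive baseline. The sketch below is not MetaProdigal's algorithm; it is a minimal six-frame scan for complete ATG-to-stop open reading frames. The function names, the ATG-only start rule, the stop-codon set (standard bacterial code) and the min_codons threshold are all assumptions made for illustration.

```python
# Naive six-frame ORF scan (illustrative baseline, NOT MetaProdigal's method).
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed: standard bacterial stop codons
START_CODON = "ATG"                  # assumed: alternative starts (GTG, TTG) ignored

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_complete_orfs(seq, min_codons=30):
    """Yield (strand, frame, start, end) for complete ATG..stop ORFs.

    Coordinates are 0-based and end-exclusive, relative to the scanned
    strand. min_codons counts the start and stop codons.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3
                    start = None  # look for the next ORF in this frame


# Hypothetical fragments: the first holds a complete ORF, the second is
# cut off before its stop codon, so the naive scan reports nothing for it.
complete = "CCATGGCTGCTGCTGCTTAAG"
truncated = "ATGGCTGCTGCT"
print(list(find_complete_orfs(complete, min_codons=3)))
print(list(find_complete_orfs(truncated, min_codons=2)))
```

    A scan of this kind finds nothing in a fragment that ends before the stop codon, which is the situation the abstract describes for reads shorter than a gene.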

  14. Metagenome of a Versatile Chemolithoautotroph from Expanding Oceanic Dead Zones

    SciTech Connect

    Walsh, David A.; Zaikova, Elena; Howes, Charles L.; Song, Young; Wright, Jody; Tringe, Susannah G.; Tortell, Philippe D.; Hallam, Steven J.

    2009-07-15

    Oxygen minimum zones (OMZs), also known as oceanic "dead zones", are widespread oceanographic features currently expanding due to global warming and coastal eutrophication. Although inhospitable to metazoan life, OMZs support a thriving but cryptic microbiota whose combined metabolic activity is intimately connected to nutrient and trace gas cycling within the global ocean. Here we report time-resolved metagenomic analyses of a ubiquitous and abundant but uncultivated OMZ microbe (SUP05) closely related to chemoautotrophic gill symbionts of deep-sea clams and mussels. The SUP05 metagenome harbors a versatile repertoire of genes mediating autotrophic carbon assimilation, sulfur-oxidation and nitrate respiration responsive to a wide range of water column redox states. Thus, SUP05 plays integral roles in shaping nutrient and energy flow within oxygen-deficient oceanic waters via carbon sequestration, sulfide detoxification and biological nitrogen loss with important implications for marine productivity and atmospheric greenhouse control.

  15. Metagenomic insights into the dynamics of microbial communities in food.

    PubMed

    Kergourlay, Gilles; Taminiau, Bernard; Daube, Georges; Champomier Vergès, Marie-Christine

    2015-11-20

    Metagenomics has proven to be a powerful tool in exploring a large diversity of natural environments such as air, soil, water, and plants, as well as various human microbiota (e.g. digestive tract, lungs, skin). DNA sequencing techniques are becoming increasingly popular and less and less expensive. Given that high-throughput DNA sequencing approaches have only recently started to be used to decipher food microbial ecosystems, there is a significant growth potential for such technologies in the field of food microbiology. The aim of this review is to present a survey of recent food investigations via metagenomics and to illustrate how this approach can be a valuable tool in the better characterization of foods and their transformation, storage and safety. Traditional food in particular has been thoroughly explored by global approaches in order to provide information on multi-species and multi-organism communities. PMID:26414193

  16. Methylotrophs in natural habitats: current insights through metagenomics.

    PubMed

    Chistoserdova, Ludmila

    2015-07-01

    The focus of this review is on the recent data from the omics approaches, measuring the presence of methylotrophs in natural environments. Both Bacteria and Archaea are considered. The data are discussed in the context of the current knowledge on the biochemistry of methylotrophy and the physiology of cultivated methylotrophs. One major issue discussed is the recent metagenomic data pointing toward the activity of "aerobic" methanotrophs, such as Methylobacter, in microoxic or hypoxic conditions. A related issue of the metabolic distinction between aerobic and "anaerobic" methylotrophy is addressed in the light of the genomic and metagenomic data for respective organisms. The role of communities, as opposed to single-organism activities in environmental cycling of single-carbon compounds, such as methane, is also discussed. In addition, the emerging issue of the role of non-traditional methylotrophs in global metabolism of single-carbon compounds and the role of methylotrophy pathways in non-methylotrophs is briefly mentioned. PMID:26051673

  17. DIME: a novel framework for de novo metagenomic sequence assembly.

    PubMed

    Guo, Xuan; Yu, Ning; Ding, Xiaojun; Wang, Jianxin; Pan, Yi

    2015-02-01

    The recently developed next generation sequencing platforms not only decrease the cost for metagenomics data analysis, but also greatly enlarge the size of metagenomic sequence datasets. A common bottleneck of available assemblers is that the trade-off between the noise of the resulting contigs and the gain in sequence length for better annotation has not been attended enough for large-scale sequencing projects, especially for the datasets with low coverage and a large number of nonoverlapping contigs. To address this limitation and promote both accuracy and efficiency, we develop a novel metagenomic sequence assembly framework, DIME, by taking the DIvide, conquer, and MErge strategies. In addition, we give two MapReduce implementations of DIME, DIME-cap3 and DIME-genovo, on Apache Hadoop platform. For a systematic comparison of the performance of the assembly tasks, we tested DIME and five other popular short read assembly programs, Cap3, Genovo, MetaVelvet, SOAPdenovo, and SPAdes on four synthetic and three real metagenomic sequence datasets with various reads from fifty thousand to a couple million in size. The experimental results demonstrate that our method not only partitions the sequence reads with an extremely high accuracy, but also reconstructs more bases, generates higher quality assembled consensus, and yields higher assembly scores, including corrected N50 and BLAST-score-per-base, than other tools with a nearly theoretical speed-up. Results indicate that DIME offers great improvement in assembly across a range of sequence abundances and thus is robust to decreasing coverage. PMID:25684202
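
    Among the assembly scores named above is N50. As a minimal sketch, the plain (uncorrected) N50 can be computed from a list of contig lengths as shown below; the "corrected" variant reported in the abstract additionally accounts for assembly errors and is not reproduced here. The function name and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the plain N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # accumulate bases, longest contigs first
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))  # 1500 bases in total; 500 + 400 >= 750, so N50 is 400
```

    Because N50 rewards long contigs regardless of their correctness, it is usually read alongside an accuracy-oriented score, as the abstract does with BLAST-score-per-base.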

  18. DIME: A Novel Framework for De Novo Metagenomic Sequence Assembly

    PubMed Central

    Guo, Xuan; Yu, Ning; Ding, Xiaojun; Wang, Jianxin

    2015-01-01

    The recently developed next generation sequencing platforms not only decrease the cost for metagenomics data analysis, but also greatly enlarge the size of metagenomic sequence datasets. A common bottleneck of available assemblers is that the trade-off between the noise of the resulting contigs and the gain in sequence length for better annotation has not been attended enough for large-scale sequencing projects, especially for the datasets with low coverage and a large number of nonoverlapping contigs. To address this limitation and promote both accuracy and efficiency, we develop a novel metagenomic sequence assembly framework, DIME, by taking the DIvide, conquer, and MErge strategies. In addition, we give two MapReduce implementations of DIME, DIME-cap3 and DIME-genovo, on Apache Hadoop platform. For a systematic comparison of the performance of the assembly tasks, we tested DIME and five other popular short read assembly programs, Cap3, Genovo, MetaVelvet, SOAPdenovo, and SPAdes on four synthetic and three real metagenomic sequence datasets with various reads from fifty thousand to a couple million in size. The experimental results demonstrate that our method not only partitions the sequence reads with an extremely high accuracy, but also reconstructs more bases, generates higher quality assembled consensus, and yields higher assembly scores, including corrected N50 and BLAST-score-per-base, than other tools with a nearly theoretical speed-up. Results indicate that DIME offers great improvement in assembly across a range of sequence abundances and thus is robust to decreasing coverage. PMID:25684202

  19. Bioprospecting metagenomics of decaying wood: mining for new glycoside hydrolases

    PubMed Central

    2011-01-01

    Background To efficiently deconstruct recalcitrant plant biomass to fermentable sugars in industrial processes, biocatalysts of higher performance and lower cost are required. The genetic diversity found in the metagenomes of natural microbial biomass decay communities may harbor such enzymes. Our goal was to discover and characterize new glycoside hydrolases (GHases) from microbial biomass decay communities, especially those from unknown or never previously cultivated microorganisms. Results From the metagenome sequences of an anaerobic microbial community actively decaying poplar biomass, we identified approximately 4,000 GHase homologs. Based on homology to GHase families/activities of interest and the quality of the sequences, candidates were selected for full-length cloning and subsequent expression. As an alternative strategy, a metagenome expression library was constructed and screened for GHase activities. These combined efforts resulted in the cloning of four novel GHases that could be successfully expressed in Escherichia coli. Further characterization showed that two enzymes showed significant activity on p-nitrophenyl-α-L-arabinofuranoside, one enzyme had significant activity against p-nitrophenyl-β-D-glucopyranoside, and one enzyme showed significant activity against p-nitrophenyl-β-D-xylopyranoside. Enzymes were also tested in the presence of ionic liquids. Conclusions Metagenomics provides a good resource for mining novel biomass degrading enzymes and for screening of cellulolytic enzyme activities. The four GHases that were cloned may have potential application for deconstruction of biomass pretreated with ionic liquids, as they remain active in the presence of up to 20% ionic liquid (except for 1-ethyl-3-methylimidazolium diethyl phosphate). Alternatively, ionic liquids might be used to immobilize or stabilize these enzymes for minimal solvent processing of biomass. PMID:21816041

  20. The Challenge and Potential of Metagenomics in the Clinic

    PubMed Central

    Mulcahy-O’Grady, Heidi; Workentine, Matthew L.

    2016-01-01

    The bacteria, fungi, and viruses that live on and in us have a tremendous impact on our day-to-day health and are often linked to many diseases, including autoimmune disorders and infections. Diagnosing and treating these disorders relies on accurate identification and characterization of the microbial community. Current sequencing technologies allow the sequencing of the entire nucleic acid complement of a sample providing an accurate snapshot of the community members present in addition to the full genetic potential of that microbial community. There are a number of clinical applications that stand to benefit from these data sets, such as the rapid identification of pathogens present in a sample. Other applications include the identification of antibiotic-resistance genes, diagnosis and treatment of gastrointestinal disorders, and many other diseases associated with bacterial, viral, and fungal microbiomes. Metagenomics also allows the physician to probe more complex phenotypes such as microbial dysbiosis with intestinal disorders and disruptions of the skin microbiome that may be associated with skin disorders. Many of these disorders are not associated with a single pathogen but emerge as a result of complex ecological interactions within microbiota. Currently, we understand very little about these complex phenotypes, yet clearly they are important and in some cases, as with fecal microbiota transplants in Clostridium difficile infections, treating the microbiome of the patient is effective. Here, we give an overview of metagenomics and discuss a number of areas where metagenomics is applicable in the clinic, and progress being made in these areas. This includes (1) the identification of unknown pathogens, and those pathogens particularly hard to culture, (2) utilizing functional information and gene content to understand complex infections such as Clostridium difficile, and (3) predicting antimicrobial resistance of the community using genetic determinants of

  1. Metagenomic Approaches for Defining the Pathogenesis of Inflammatory Bowel Diseases

    PubMed Central

    Peterson, Daniel A.; Frank, Daniel N.; Pace, Norman R.; Gordon, Jeffrey I.

    2010-01-01

    The human gastrointestinal tract is home to immense and complex populations of microorganisms. Through recent technical innovations, the diversity present in this human body habitat is now being subjected to detailed analyses. This review focuses on the microbial ecology of the gut in inflammatory bowel diseases, and how recent studies provide an impetus for delving into the structure and operations of the gut microbial community, and its interrelationships with the immune system, using carefully designed, comparative metagenomic approaches. PMID:18541218

  2. Bioprospecting metagenomics of decaying wood: mining for new glycoside hydrolases

    SciTech Connect

    Li, L. L.; van der Lelie, D.; Taghavi, S.; McCorkle, S. M.; Zhang, Y.-B.; Blewitt, M. G.; Brunecky, R.; Adney, W. S.; Himmel, M. E.; Brumm, P.; Drinkwater, C.; Mead, D. A.; Tringe, S. G.

    2011-08-01

    To efficiently deconstruct recalcitrant plant biomass to fermentable sugars in industrial processes, biocatalysts of higher performance and lower cost are required. The genetic diversity found in the metagenomes of natural microbial biomass decay communities may harbor such enzymes. Our goal was to discover and characterize new glycoside hydrolases (GHases) from microbial biomass decay communities, especially those from unknown or never previously cultivated microorganisms. From the metagenome sequences of an anaerobic microbial community actively decaying poplar biomass, we identified approximately 4,000 GHase homologs. Based on homology to GHase families/activities of interest and the quality of the sequences, candidates were selected for full-length cloning and subsequent expression. As an alternative strategy, a metagenome expression library was constructed and screened for GHase activities. These combined efforts resulted in the cloning of four novel GHases that could be successfully expressed in Escherichia coli. Further characterization showed that two enzymes showed significant activity on p-nitrophenyl-α-L-arabinofuranoside, one enzyme had significant activity against p-nitrophenyl-β-D-glucopyranoside, and one enzyme showed significant activity against p-nitrophenyl-β-D-xylopyranoside. Enzymes were also tested in the presence of ionic liquids. Metagenomics provides a good resource for mining novel biomass degrading enzymes and for screening of cellulolytic enzyme activities. The four GHases that were cloned may have potential application for deconstruction of biomass pretreated with ionic liquids, as they remain active in the presence of up to 20% ionic liquid (except for 1-ethyl-3-methylimidazolium diethyl phosphate). Alternatively, ionic liquids might be used to immobilize or stabilize these enzymes for minimal solvent processing of biomass.

  3. Virtual fragment preparation for computational fragment-based drug design.

    PubMed

    Ludington, Jennifer L

    2015-01-01

    Fragment-based drug design (FBDD) has become an important component of the drug discovery process. The use of fragments can accelerate both the search for a hit molecule and the development of that hit into a lead molecule for clinical testing. In addition to experimental methodologies for FBDD such as NMR and X-ray Crystallography screens, computational techniques are playing an increasingly important role. The success of the computational simulations is due in large part to how the database of virtual fragments is prepared. In order to prepare the fragments appropriately it is necessary to understand how FBDD differs from other approaches and the issues inherent in building up molecules from smaller fragment pieces. The ultimate goal of these calculations is to link two or more simulated fragments into a molecule that has an experimental binding affinity consistent with the additive predicted binding affinities of the virtual fragments. Computationally predicting binding affinities is a complex process, with many opportunities for introducing error. Therefore, care should be taken with the fragment preparation procedure to avoid introducing additional inaccuracies. This chapter is focused on the preparation process used to create a virtual fragment database. Several key issues of fragment preparation which affect the accuracy of binding affinity predictions are discussed. The first issue is the selection of the two-dimensional atomic structure of the virtual fragment. Although the particular usage of the fragment can affect this choice (i.e., whether the fragment will be used for calibration, binding site characterization, hit identification, or lead optimization), general factors such as synthetic accessibility, size, and flexibility are major considerations in selecting the 2D structure. Other aspects of preparing the virtual fragments for simulation are the generation of three-dimensional conformations and the assignment of the associated atomic point charges

  4. THE WESTERN LAKE SUPERIOR COMPARATIVE WATERSHED FRAMEWORK: A FIELD TEST OF GEOGRAPHICALLY-DEPENDENT VS. THRESHOLD-BASED GEOGRAPHICALLY-INDEPENDENT CLASSIFICATION

    EPA Science Inventory

    Stratified random selection of watersheds allowed us to compare geographically-independent classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme within the Northern Lakes a...

  5. Diversity of putative archaeal RNA viruses in metagenomic datasets of a Yellowstone acidic hot spring.

    PubMed

    Wang, Hongming; Yu, Yongxin; Liu, Taigang; Pan, Yingjie; Yan, Shuling; Wang, Yongjie

    2015-01-01

    Two genomic fragments (5,662 and 1,269 nt in size, GenBank accession no. JQ756122 and JQ756123, respectively) of novel, positive-strand RNA viruses that infect archaea were first discovered in an acidic hot spring in Yellowstone National Park (Bolduc et al., 2012). To investigate the diversity of these newly identified putative archaeal RNA viruses, global metagenomic datasets were searched for sequences that were significantly similar to those of the viruses. A total of 3,757 associated reads were retrieved solely from the Yellowstone datasets and were used to assemble the genomes of the putative archaeal RNA viruses. Nine contigs with lengths ranging from 417 to 5,866 nt were obtained, 4 of which were longer than 2,200 nt; one contig was 204 nt longer than JQ756122, representing the longest genomic sequence of the putative archaeal RNA viruses. These contigs revealed more than 50% sequence similarity to JQ756122 or JQ756123 and may be partial or nearly complete genomes of novel genogroups or genotypes of the putative archaeal RNA viruses. Sequence and phylogenetic analyses indicated that the archaeal RNA viruses are genetically diverse, with at least 3 related viral lineages in the Yellowstone acidic hot spring environment. PMID:25918685

  6. Identification of an antibacterial protein by functional screening of a human oral metagenomic library.

    PubMed

    Arivaradarajan, Preeti; Warburton, Philip J; Paramasamy, Gunasekaran; Nair, Sean P; Allan, Elaine; Mullany, Peter

    2015-09-01

    Screening of a bacterial artificial chromosome (BAC) library containing metagenomic DNA from human plaque and saliva allowed the isolation of four clones producing antimicrobial activity. Three of these were pigmented and encoded homologues of glutamyl-tRNA reductase (GluTR), an enzyme involved in the C5 pathway leading to tetrapyrrole synthesis, and one clone had antibacterial activity with no pigmentation. The latter contained a BAC with an insert of 15.6 kb. Initial attempts to localize the gene(s) responsible for antimicrobial activity by subcloning into pUC-based vectors failed. A new plasmid for toxic gene expression (pTGEX) was designed enabling localization of the antibacterial activity to a 4.7-kb HindIII fragment. Transposon mutagenesis localized the gene to an open reading frame of 483 bp designated antibacterial protein1 (abp1). Abp1 was 94% identical to a hypothetical protein of Neisseria subflava (accession number WP_004519448.1). An Escherichia coli clone expressing Abp1 exhibited antibacterial activity against Bacillus subtilis BS78H, Staphylococcus epidermidis NCTC 11964 and B4268, and S. aureus NCTC 12493, ATCC 35696 and NCTC 11561. However, no antibacterial activity was observed against Pseudomonas aeruginosa ATCC 9027, N. subflava ATCC A1078, E. coli K12 JM109 and BL21(DE3), Fusobacterium nucleatum ATCC 25586 and NCTC 11326, Prevotella intermedia ATCC 25611, Veillonella parvula ATCC 10790 or Lactobacillus casei NCTC 6375. PMID:26347298

  7. ExoMeg1: a new exonuclease from metagenomic library.

    PubMed

    Silva-Portela, Rita C B; Carvalho, Fabíola M; Pereira, Carolina P M; de Souza-Pinto, Nadja C; Modesti, Mauro; Fuchs, Robert P; Agnez-Lima, Lucymara F

    2016-01-01

    DNA repair mechanisms are responsible for maintaining the integrity of DNA and are essential to life. However, our knowledge of DNA repair mechanisms is based on model organisms such as Escherichia coli, and little is known about free living and uncultured microorganisms. In this study, a functional screening was applied in a metagenomic library with the goal of discovering new genes involved in the maintenance of genomic integrity. One clone was identified and the sequence analysis showed an open reading frame homolog to a hypothetical protein annotated as a member of the Exo_Endo_Phos superfamily. This novel enzyme shows 3'-5' exonuclease activity on single and double strand DNA substrates and it is divalent metal-dependent, EDTA-sensitive and salt resistant. The clone carrying the hypothetical ORF was able to complement strains deficient in recombination or base excision repair, suggesting that the new enzyme may be acting on the repair of single strand breaks with 3' blockers, which are substrates for these repair pathways. Because this is the first report of an enzyme obtained from a metagenomic approach showing exonuclease activity, it was named ExoMeg1. The metagenomic approach has proved to be a useful tool for identifying new genes of uncultured microorganisms. PMID:26815639

  8. Metagenome Sequencing of the Hadza Hunter-Gatherer Gut Microbiota.

    PubMed

    Rampelli, Simone; Schnorr, Stephanie L; Consolandi, Clarissa; Turroni, Silvia; Severgnini, Marco; Peano, Clelia; Brigidi, Patrizia; Crittenden, Alyssa N; Henry, Amanda G; Candela, Marco

    2015-06-29

    Through human microbiome sequencing, we can better understand how host evolutionary and ontogenetic history is reflected in the microbial function. However, there has been no information on the gut metagenome configuration in hunter-gatherer populations, posing a gap in our knowledge of gut microbiota (GM)-host mutualism arising from a lifestyle that describes over 90% of human evolutionary history. Here, we present the first metagenomic analysis of GM from Hadza hunter-gatherers of Tanzania, showing a unique enrichment in metabolic pathways that aligns with the dietary and environmental factors characteristic of their foraging lifestyle. We found that the Hadza GM is adapted for broad-spectrum carbohydrate metabolism, reflecting the complex polysaccharides in their diet. Furthermore, the Hadza GM is equipped for branched-chain amino acid degradation and aromatic amino acid biosynthesis. Resistome functionality demonstrates the existence of antibiotic resistance genes in a population with little antibiotic exposure, indicating the ubiquitous presence of environmentally derived resistances. Our results demonstrate how the functional specificity of the GM correlates with certain environment and lifestyle factors and how complexity from the exogenous environment can be balanced by endogenous homeostasis. The Hadza gut metagenome structure allows us to appreciate the co-adaptive functional role of the GM in complementing the human physiology, providing a better understanding of the versatility of human life and subsistence. PMID:25981789

  9. Culture-independent discovery of natural products from soil metagenomes.

    PubMed

    Katz, Micah; Hover, Bradley M; Brady, Sean F

    2016-03-01

    Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules. PMID:26586404

  10. ExoMeg1: a new exonuclease from metagenomic library

    PubMed Central

    Silva-Portela, Rita C. B.; Carvalho, Fabíola M.; Pereira, Carolina P. M.; de Souza-Pinto, Nadja C.; Modesti, Mauro; Fuchs, Robert P.; Agnez-Lima, Lucymara F.

    2016-01-01

    DNA repair mechanisms are responsible for maintaining the integrity of DNA and are essential to life. However, our knowledge of DNA repair mechanisms is based on model organisms such as Escherichia coli, and little is known about free living and uncultured microorganisms. In this study, a functional screening was applied in a metagenomic library with the goal of discovering new genes involved in the maintenance of genomic integrity. One clone was identified and the sequence analysis showed an open reading frame homolog to a hypothetical protein annotated as a member of the Exo_Endo_Phos superfamily. This novel enzyme shows 3′-5′ exonuclease activity on single and double strand DNA substrates and it is divalent metal-dependent, EDTA-sensitive and salt resistant. The clone carrying the hypothetical ORF was able to complement strains deficient in recombination or base excision repair, suggesting that the new enzyme may be acting on the repair of single strand breaks with 3′ blockers, which are substrates for these repair pathways. Because this is the first report of an enzyme obtained from a metagenomic approach showing exonuclease activity, it was named ExoMeg1. The metagenomic approach has proved to be a useful tool for identifying new genes of uncultured microorganisms. PMID:26815639

  11. Bioinformatic Approaches Reveal Metagenomic Characterization of Soil Microbial Community

    PubMed Central

    Xu, Zhuofei; Hansen, Martin Asser; Hansen, Lars H.; Jacquiod, Samuel; Sørensen, Søren J.

    2014-01-01

    As is well known, soil is a complex ecosystem harboring the most prokaryotic biodiversity on the Earth. In recent years, the advent of high-throughput sequencing techniques has greatly facilitated the progress of soil ecological studies. However, how to effectively understand the underlying biological features of large-scale sequencing data is a new challenge. In the present study, we used 33 publicly available metagenomes from diverse soil sites (i.e. grassland, forest soil, desert, Arctic soil, and mangrove sediment) and integrated some state-of-the-art computational tools to explore the phylogenetic and functional characterizations of the microbial communities in soil. Microbial composition and metabolic potential in soils were comprehensively illustrated at the metagenomic level. A spectrum of metagenomic biomarkers comprising 46 taxa and 33 metabolic modules was found to differ significantly and could be used as indicators to distinguish at least one of the five soil communities. The co-occurrence associations between complex microbial compositions and functions were inferred by network-based approaches. Our results together with the established bioinformatic pipelines should provide a foundation for future research into the relation between soil biodiversity and ecosystem function. PMID:24691166

  12. PhyloSift: phylogenetic analysis of genomes and metagenomes

    PubMed Central

    Jospin, Guillaume; Lowe, Eric; Matsen, Frederick A.; Bik, Holly M.; Eisen, Jonathan A.

    2014-01-01

    Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454). PMID:24482762

  13. Metagenomic abundance estimation and diagnostic testing on species level

    PubMed Central

    Lindner, Martin S.; Renard, Bernhard Y.

    2013-01-01

    One goal of sequencing-based metagenomic community analysis is the quantitative taxonomic assessment of microbial community compositions. In particular, relative quantification of taxa is of high relevance for metagenomic diagnostics or microbial community comparison. However, the majority of existing approaches quantify at low resolution (e.g. at phylum level), rely on the existence of special genes (e.g. 16S), or have severe problems discerning species with highly similar genome sequences. Yet, problems such as metagenomic diagnostics require accurate quantification at the species level. We developed Genome Abundance Similarity Correction (GASiC), a method to estimate true genome abundances via read alignment by considering reference genome similarities in a non-negative LASSO approach. We demonstrate GASiC’s superior performance over existing methods on simulated benchmark data as well as on real data. In addition, we present applications to datasets of both bacterial DNA and viral RNA source. We further discuss our approach as an alternative to PCR-based DNA quantification. PMID:22941661
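
    The similarity-correction idea in this abstract can be illustrated with a small sketch. It is not the GASiC implementation: it replaces the non-negative LASSO named in the abstract with plain non-negative least squares solved by projected gradient descent, and the function name, the similarity matrix and the read counts are all hypothetical. The assumption is that similarity[i][j] gives the fraction of reads from genome j that also align to reference i, so that the observed counts are roughly similarity times the true abundances.

```python
def correct_abundances(similarity, observed, steps=5000, lr=0.1):
    """Estimate non-negative abundances x such that similarity @ x ~= observed.

    Simplified stand-in (non-negative least squares via projected gradient
    descent); the step size lr is an assumption suited to small, well-scaled
    similarity matrices like the one in the usage example.
    """
    n = len(observed)
    x = [max(float(o), 0.0) for o in observed]  # start from the raw counts
    for _ in range(steps):
        # residual = similarity @ x - observed
        residual = [
            sum(similarity[i][j] * x[j] for j in range(n)) - observed[i]
            for i in range(n)
        ]
        # gradient of 0.5 * ||residual||^2 is similarity^T @ residual
        grad = [
            sum(similarity[i][j] * residual[i] for i in range(n))
            for j in range(n)
        ]
        # gradient step followed by projection onto x >= 0
        x = [max(0.0, x[j] - lr * grad[j]) for j in range(n)]
    return x


# Hypothetical case: half of the reads from either genome also map to the other.
similarity = [[1.0, 0.5],
              [0.5, 1.0]]
observed = [100.0, 50.0]  # raw mapped-read counts per reference
print(correct_abundances(similarity, observed))  # approximately [100.0, 0.0]
```

    In this toy case the 50 reads assigned to the second reference are explained entirely by cross-mapping from the first genome, so the corrected abundance of the second genome drops to about zero.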

  14. Functional metagenomic selection of RubisCOs from uncultivated bacteria

    USGS Publications Warehouse

    Varaljay, Vanessa A; Satagopan, Sriram; North, Justin A.; Witteveen, Briana; Dourado, Manuella N.; Anantharaman, Karthik; Arbing, Mark A.; McCann, Shelley; Oremland, Ronald S.; Banfield, Jillian F.; Wrighton, Kelly C.; Tabita, F. Robert

    2016-01-01

    Ribulose 1,5-bisphosphate carboxylase/oxygenase (RubisCO) is a critical yet severely inefficient enzyme that catalyses the fixation of virtually all of the carbon found on Earth. Here, we report a functional metagenomic selection that recovers physiologically active RubisCO molecules directly from uncultivated and largely unknown members of natural microbial communities. Selection is based on CO2-dependent growth in a host strain capable of expressing environmental deoxyribonucleic acid (DNA), precluding the need for pure cultures or screening of recombinant clones for enzymatic activity. Seventeen functional RubisCO-encoded sequences were selected using DNA extracted from soil and river autotrophic enrichments, a photosynthetic biofilm and a subsurface groundwater aquifer. Notably, three related form II RubisCOs were recovered which share high sequence similarity with metagenomic scaffolds from uncultivated members of the Gallionellaceae family. One of the Gallionellaceae RubisCOs was purified and shown to possess CO2/O2 specificity typical of form II enzymes. X-ray crystallography determined that this enzyme is a hexamer, only the second form II multimer ever solved and the first RubisCO structure obtained from an uncultivated bacterium. Functional metagenomic selection leverages natural biological diversity and billions of years of evolution inherent in environmental communities, providing a new window into the discovery of CO2-fixing enzymes not previously characterized.

  15. Forest harvesting reduces the soil metagenomic potential for biomass decomposition.

    PubMed

    Cardenas, Erick; Kranabetter, J M; Hope, Graeme; Maas, Kendra R; Hallam, Steven; Mohn, William W

    2015-11-01

    Soil is the key resource that must be managed to ensure sustainable forest productivity. Soil microbial communities mediate numerous essential ecosystem functions, and recent studies show that forest harvesting alters soil community composition. From a long-term soil productivity study site in a temperate coniferous forest in British Columbia, 21 forest soil shotgun metagenomes were generated, totaling 187 Gb. A method to analyze unassembled metagenome reads from the complex community was optimized and validated. The subsequent metagenome analysis revealed that, 12 years after forest harvesting, there were 16% and 8% reductions in relative abundances of biomass decomposition genes in the organic and mineral soil layers, respectively. Organic and mineral soil layers differed markedly in genetic potential for biomass degradation, with the organic layer having greater potential and being more strongly affected by harvesting. Gene families were disproportionately affected, and we identified 41 gene families consistently affected by harvesting, including families involved in lignin, cellulose, hemicellulose and pectin degradation. The results strongly suggest that harvesting profoundly altered below-ground cycling of carbon and other nutrients at this site, with potentially important consequences for forest regeneration. Thus, it is important to determine whether these changes foreshadow long-term changes in forest productivity or resilience and whether these changes are broadly characteristic of harvested forests. PMID:25909978

  16. New viruses in veterinary medicine, detected by metagenomic approaches.

    PubMed

    Belák, Sándor; Karlsson, Oskar E; Blomström, Anne-Lie; Berg, Mikael; Granberg, Fredrik

    2013-07-26

    In our world, which is faced today with exceptional environmental changes and dramatically intensifying globalisation, we are encountering challenges due to many new factors, including the emergence or re-emergence of novel, so far "unknown" infectious diseases. Although a broad arsenal of diagnostic methods is at our disposal, the majority of the conventional diagnostic tests is highly virus-specific or is targeted entirely towards a limited group of infectious agents. This specificity complicates or even hinders the detection of new or unexpected pathogens, such as new, emerging or re-emerging viruses or novel viral variants. The recently developed approaches of viral metagenomics provide an effective novel way to screen samples and detect viruses without previous knowledge of the infectious agent, thereby enabling a better diagnosis and disease control, in line with the "One World, One Health" principles (www.oneworldonehealth.org). Using metagenomic approaches, we have recently identified a broad variety of new viruses, such as novel bocaviruses, Torque Teno viruses, astroviruses, rotaviruses and kobuviruses in porcine disease syndromes, new virus variants in honeybee populations, as well as a range of other infectious agents in further host species. These findings indicate that the metagenomic detection of viral pathogens is becoming now a powerful, cultivation-independent, and useful novel diagnostic tool in veterinary diagnostic virology. PMID:23428379

  17. Bioinformatic approaches reveal metagenomic characterization of soil microbial community.

    PubMed

    Xu, Zhuofei; Hansen, Martin Asser; Hansen, Lars H; Jacquiod, Samuel; Sørensen, Søren J

    2014-01-01

    As is well known, soil is a complex ecosystem harboring the most prokaryotic biodiversity on the Earth. In recent years, the advent of high-throughput sequencing techniques has greatly facilitated the progress of soil ecological studies. However, how to effectively understand the underlying biological features of large-scale sequencing data is a new challenge. In the present study, we used 33 publicly available metagenomes from diverse soil sites (i.e. grassland, forest soil, desert, Arctic soil, and mangrove sediment) and integrated some state-of-the-art computational tools to explore the phylogenetic and functional characterizations of the microbial communities in soil. Microbial composition and metabolic potential in soils were comprehensively illustrated at the metagenomic level. A spectrum of metagenomic biomarkers comprising 46 taxa and 33 metabolic modules was found to differ significantly and could be used as indicators to distinguish at least one of the five soil communities. The co-occurrence associations between complex microbial compositions and functions were inferred by network-based approaches. Our results together with the established bioinformatic pipelines should provide a foundation for future research into the relation between soil biodiversity and ecosystem function. PMID:24691166

  18. Expanding the catalog of cas genes with metagenomes.

    PubMed

    Zhang, Quan; Doak, Thomas G; Ye, Yuzhen

    2014-02-01

    The CRISPR (clusters of regularly interspaced short palindromic repeats)-Cas adaptive immune system is an important defense system in bacteria, providing targeted defense against invasions of foreign nucleic acids. CRISPR-Cas systems consist of CRISPR loci and cas (CRISPR-associated) genes: sequence segments of invaders are incorporated into host genomes at CRISPR loci to generate specificity, while adjacent cas genes encode proteins that mediate the defense process. We pursued an integrated approach to identifying putative cas genes from genomes and metagenomes, combining similarity searches with genomic neighborhood analysis. Application of our approach to bacterial genomes and human microbiome datasets allowed us to significantly expand the collection of cas genes: the sequence space of the Cas9 family, the key player in the recently engineered RNA-guided platforms for genome editing in eukaryotes, is expanded by at least two-fold with metagenomic datasets. We found genes in cas loci encoding other functions, for example, toxins and antitoxins, confirming the recently discovered potential of coupling between adaptive immunity and the dormancy/suicide systems. We further identified 24 novel Cas families; one novel family contains 20 proteins, all identified from the human microbiome datasets, illustrating the importance of metagenomics projects in expanding the diversity of cas genes. PMID:24319142

  19. Tube Fragmentation of Multiple Materials

    SciTech Connect

    Thornhill, T. F.; Chhabildas, L. C.; Vogler, T. J.

    2006-07-28

    In the current study we are developing an experimental fracture material property test method specific to dynamic fragmentation. This test method allows the study of fracture fragmentation in a reproducible laboratory environment under well-controlled loading conditions. Motion and fragmentation of the specimen are diagnosed using framing camera, VISAR and soft recovery methods. Fragmentation properties of several steels, nitinol, tungsten alloy, copper, aluminum, and titanium have been obtained to date. The values for fragmentation toughness, and failure threshold will be reported, as well as effects in these values as the material strain-rate is varied through changes in wall thickness and impact conditions.

  20. Tube fragmentation of multiple materials.

    SciTech Connect

    Thornhill, Tom Finley, III; Vogler, Tracy John; Chhabildas, Lalit Chandra

    2003-07-01

    In the current study we are developing an experimental fracture material property test method specific to dynamic fragmentation. This test method allows the study of fracture fragmentation in a reproducible laboratory environment under well-controlled loading conditions. Motion and fragmentation of the specimen are diagnosed using framing camera, VISAR and soft recovery methods. Fragmentation properties of several steels, nitinol, tungsten alloy, copper, aluminum, and titanium have been obtained to date. The values for fragmentation toughness, and failure threshold will be reported, as well as effects in these values as the material strain-rate is varied through changes in wall thickness and impact conditions.

  1. New Scalings in Nuclear Fragmentation

    SciTech Connect

    Bonnet, E.; Bougault, R.; Galichet, E.; Gagnon-Moisan, F.; Guinet, D.; Lautesse, P.; Marini, P.; Parlog, M.

    2010-10-01

    Fragment partitions of fragmenting hot nuclei produced in central and semiperipheral collisions have been compared in the excitation energy region 4-10 MeV per nucleon where radial collective expansion takes place. It is shown that, for a given total excitation energy per nucleon, the amount of radial collective energy fixes the mean fragment multiplicity. It is also shown that, at a given total excitation energy per nucleon, the different properties of fragment partitions are completely determined by the reduced fragment multiplicity (i.e., normalized to the source size). Freeze-out volumes seem to play a role in the scalings observed.

  2. Classification and knowledge

    NASA Technical Reports Server (NTRS)

    Kurtz, Michael J.

    1989-01-01

    Automated procedures to classify objects are discussed. The classification problem is reviewed, and the relation of epistemology and classification is considered. The classification of stellar spectra and of resolved images of galaxies is addressed.

  3. Remote Sensing Information Classification

    NASA Technical Reports Server (NTRS)

    Rickman, Douglas L.

    2008-01-01

    This viewgraph presentation reviews the classification of Remote Sensing data in relation to epidemiology. Classification is a way to reduce the dimensionality and precision to something a human can understand. Classification changes SCALAR data into NOMINAL data.
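
    The statement that classification changes SCALAR data into NOMINAL data can be illustrated with a minimal sketch. The thresholds, the class labels and the function name below are hypothetical and are not taken from the presentation; they only show a continuous measurement being reduced to a named category.

```python
from bisect import bisect_right


def classify_scalar(value, thresholds, labels):
    """Map a scalar to a nominal label using sorted class boundaries.

    thresholds must be sorted ascending, and labels needs exactly one more
    entry than thresholds (one label per interval).
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need one more label than thresholds")
    # bisect_right counts how many boundaries are <= value,
    # which is the index of the interval the value falls into.
    return labels[bisect_right(thresholds, value)]


# Hypothetical vegetation-index boundaries and class names.
thresholds = [0.1, 0.3, 0.6]
labels = ["bare", "sparse", "moderate", "dense"]

for pixel in (0.05, 0.3, 0.9):
    print(pixel, "->", classify_scalar(pixel, thresholds, labels))
```

    The precision of the input is deliberately discarded: every value between two boundaries receives the same label, which is the reduction in dimensionality and precision the abstract describes.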

  4. Characterization of the Gut Microbiome Using 16S or Shotgun Metagenomics.

    PubMed

    Jovel, Juan; Patterson, Jordan; Wang, Weiwei; Hotte, Naomi; O'Keefe, Sandra; Mitchel, Troy; Perry, Troy; Kao, Dina; Mason, Andrew L; Madsen, Karen L; Wong, Gane K-S

    2016-01-01

    The advent of next generation sequencing (NGS) has enabled investigations of the gut microbiome with unprecedented resolution and throughput. This has stimulated the development of sophisticated bioinformatics tools to analyze the massive amounts of data generated. Researchers therefore need a clear understanding of the key concepts required for the design, execution and interpretation of NGS experiments on microbiomes. We conducted a literature review and used our own data to determine which approaches work best. The two main approaches for analyzing the microbiome, 16S ribosomal RNA (rRNA) gene amplicons and shotgun metagenomics, are illustrated with analyses of libraries designed to highlight their strengths and weaknesses. Several methods for taxonomic classification of bacterial sequences are discussed. We present simulations to assess the number of sequences that are required to perform reliable appraisals of bacterial community structure. To the extent that fluctuations in the diversity of gut bacterial populations correlate with health and disease, we emphasize various techniques for the analysis of bacterial communities within samples (α-diversity) and between samples (β-diversity). Finally, we demonstrate techniques to infer the metabolic capabilities of a bacterial community from these 16S and shotgun data. PMID:27148170
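
    The distinction between within-sample (α) and between-sample (β) diversity can be made concrete with a short sketch. The abstract does not say which indices the authors used; the Shannon index and the Bray-Curtis dissimilarity below are common choices picked here as an assumption, and the taxon counts are invented for illustration.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon entropy (natural log) of one sample's taxon counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(sample_a, sample_b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Both lists must give counts for the same taxa in the same order.
    Returns 0.0 for identical samples and 1.0 for samples sharing no taxa.
    """
    difference = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    return difference / (sum(sample_a) + sum(sample_b))


# Hypothetical read counts for four taxa in two gut samples.
sample_1 = [40, 30, 20, 10]
sample_2 = [70, 20, 10, 0]

print("alpha (Shannon), sample 1:", shannon_index(sample_1))
print("alpha (Shannon), sample 2:", shannon_index(sample_2))
print("beta (Bray-Curtis):", bray_curtis(sample_1, sample_2))
```

    Shannon diversity is computed per sample, whereas Bray-Curtis needs a pair of samples profiled over the same set of taxa; sequencing depth affects both, which is why the number of sequences per sample matters for reliable estimates.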

  5. VIP: an integrated pipeline for metagenomics of virus identification and discovery

    PubMed Central

    Li, Yang; Wang, Hao; Nie, Kai; Zhang, Chen; Zhang, Yi; Wang, Ji; Niu, Peihua; Ma, Xuejun

    2016-01-01

    Identification and discovery of viruses using next-generation sequencing technology is a fast-developing area with potential wide application in clinical diagnostics, public health monitoring and novel virus discovery. However, the tremendous volume of sequence data from NGS studies poses a great challenge to both the accuracy and the speed of NGS applications. Here we describe VIP (“Virus Identification Pipeline”), a one-touch computational pipeline for virus identification and discovery from metagenomic NGS data. VIP performs the following steps to achieve its goal: (i) map and filter out background-related reads, (ii) extensive classification of reads on the basis of nucleotide and remote amino acid homology, (iii) multiple k-mer based de novo assembly and phylogenetic analysis to provide evolutionary insight. We validated the feasibility and veracity of this pipeline with sequencing results of various types of clinical samples and public datasets. VIP has also contributed to timely virus diagnosis (~10 min) in acutely ill patients, demonstrating its potential in the performance of unbiased NGS-based clinical studies with demand of short turnaround time. VIP is released under GPLv3 and is available for free download at: https://github.com/keylabivdc/VIP. PMID:27026381

  6. VIP: an integrated pipeline for metagenomics of virus identification and discovery.

    PubMed

    Li, Yang; Wang, Hao; Nie, Kai; Zhang, Chen; Zhang, Yi; Wang, Ji; Niu, Peihua; Ma, Xuejun

    2016-01-01

    Identification and discovery of viruses using next-generation sequencing technology is a fast-developing area with potential wide application in clinical diagnostics, public health monitoring and novel virus discovery. However, the tremendous volume of sequence data from NGS studies poses a great challenge to both the accuracy and the speed of NGS applications. Here we describe VIP ("Virus Identification Pipeline"), a one-touch computational pipeline for virus identification and discovery from metagenomic NGS data. VIP performs the following steps to achieve its goal: (i) map and filter out background-related reads, (ii) extensive classification of reads on the basis of nucleotide and remote amino acid homology, (iii) multiple k-mer based de novo assembly and phylogenetic analysis to provide evolutionary insight. We validated the feasibility and veracity of this pipeline with sequencing results of various types of clinical samples and public datasets. VIP has also contributed to timely virus diagnosis (~10 min) in acutely ill patients, demonstrating its potential in the performance of unbiased NGS-based clinical studies with demand of short turnaround time. VIP is released under GPLv3 and is available for free download at: https://github.com/keylabivdc/VIP. PMID:27026381

  7. Characterization of the Gut Microbiome Using 16S or Shotgun Metagenomics

    PubMed Central

    Jovel, Juan; Patterson, Jordan; Wang, Weiwei; Hotte, Naomi; O'Keefe, Sandra; Mitchel, Troy; Perry, Troy; Kao, Dina; Mason, Andrew L.; Madsen, Karen L.; Wong, Gane K.-S.

    2016-01-01

    The advent of next generation sequencing (NGS) has enabled investigations of the gut microbiome with unprecedented resolution and throughput. This has stimulated the development of sophisticated bioinformatics tools to analyze the massive amounts of data generated. Researchers therefore need a clear understanding of the key concepts required for the design, execution and interpretation of NGS experiments on microbiomes. We conducted a literature review and used our own data to determine which approaches work best. The two main approaches for analyzing the microbiome, 16S ribosomal RNA (rRNA) gene amplicons and shotgun metagenomics, are illustrated with analyses of libraries designed to highlight their strengths and weaknesses. Several methods for taxonomic classification of bacterial sequences are discussed. We present simulations to assess the number of sequences that are required to perform reliable appraisals of bacterial community structure. To the extent that fluctuations in the diversity of gut bacterial populations correlate with health and disease, we emphasize various techniques for the analysis of bacterial communities within samples (α-diversity) and between samples (β-diversity). Finally, we demonstrate techniques to infer the metabolic capabilities of a bacterial community from these 16S and shotgun data. PMID:27148170

  8. IMG/M: A data management and analysis system for metagenomes

    SciTech Connect

    Markowitz, Victor M.; Ivanova, Natalia N.; Szeto, Ernest; Palaniappan, Krishna; Chu, Ken; Dalevi, Daniel; Chen, I-Min A.; Grechkin, Yuri; Dubchak, Inna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Hugenholtz, Phil; Kyrpides, Nikos C.

    2007-08-01

    IMG/M is a data management and analysis system for microbial community genomes (metagenomes) hosted at the Joint Genome Institute (JGI). IMG/M consists of metagenome data integrated with isolate microbial genomes from the Integrated Microbial Genomes (IMG) system. IMG/M provides IMG's comparative data analysis tools extended to handle metagenome data, together with metagenome-specific analysis tools. IMG/M is available at http://img.jgi.doe.gov/m. Studies of the collective genomes (also known as metagenomes) of environmental microbial communities (also known as microbiomes) are expected to lead to advances in environmental cleanup, agriculture, industrial processes, alternative energy production, and human health (1). Metagenomes of specific microbiome samples are sequenced by organizations worldwide, such as the Department of Energy's (DOE) Joint Genome Institute (JGI), the Venter Institute and the Washington University in St. Louis using different sequencing strategies, technology platforms, and annotation procedures. According to the Genomes OnLine Database, about 28 metagenome studies have been published to date, with over 60 other projects ongoing and more in the process of being launched (2). The Department of Energy's (DOE) Joint Genome Institute (JGI) is one of the major contributors of metagenome sequence data, currently sequencing more than 50% of the reported metagenome projects worldwide. Due to the higher complexity, inherent incompleteness, and lower quality of metagenome sequence data, traditional assembly, gene prediction, and annotation methods do not perform on these datasets as well as they do on isolate microbial genome sequences (3, 4). In spite of these limitations, metagenome data are amenable to a variety of analyses, as illustrated by several recent studies (5-10). Metagenome data analysis is usually set up in the context of reference isolate genomes and considers the questions of composition and functional or metabolic potential of

  9. Scaling behavior of fragment shapes.

    PubMed

    Kun, F; Wittel, F K; Herrmann, H J; Kröplin, B H; Måløy, K J

    2006-01-20

    We present an experimental and theoretical study of the shape of fragments generated by explosive and impact loading of closed shells. Based on high speed imaging, we have determined the fragmentation mechanism of shells. Experiments have shown that the fragments vary from completely isotropic to highly anisotropic elongated shapes, depending on the microscopic cracking mechanism of the shell. Anisotropic fragments proved to have a self-affine character described by a scaling exponent. The distribution of fragment shapes exhibits a power-law decay. The robustness of the scaling laws is illustrated by a stochastic hierarchical model of fragmentation. Our results provide a possible improvement of the representation of fragment shapes in models of space debris. PMID:16486594

  10. Fragment oriented molecular shapes.

    PubMed

    Hain, Ethan; Camacho, Carlos J; Koes, David Ryan

    2016-05-01

    Molecular shape is an important concept in drug design and virtual screening. Shape similarity typically uses either alignment methods, which dynamically optimize molecular poses with respect to the query molecular shape, or feature vector methods, which are computationally less demanding but less accurate. The computational cost of alignment can be reduced by pre-aligning shapes, as is done with the Volumetric-Aligned Molecular Shapes (VAMS) method. Here, we introduce and evaluate fragment oriented molecular shapes (FOMS), where shapes are aligned based on molecular fragments. FOMS enables the use of shape constraints, a novel method for precisely specifying molecular shape queries that provides the ability to perform partial shape matching and supports search algorithms that function on an interactive time scale. When evaluated using the challenging Maximum Unbiased Validation dataset, shape constraints were able to extract significantly enriched subsets of compounds for the majority of targets, and FOMS matched or exceeded the performance of both VAMS and an optimizing alignment method of shape similarity search. PMID:27085751

  11. FAMeS: Fidelity of Analysis of Metagenomic Samples

    DOE Data Explorer

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods currently used to process metagenomic sequences, simulated datasets of varying complexity were constructed by combining sequencing reads randomly selected from 113 isolate genomes. These datasets were designed to model real metagenomes in terms of complexity and phylogenetic composition. Assembly, gene prediction and binning, employing methods commonly used for the analysis of metagenomic datasets at the DOE JGI, were performed. This site provides access to the simulated datasets, and aims to facilitate standardized benchmarking of tools for metagenomic analysis. FAMeS now hosts data coming from a comprehensive study of methodologies used to create OTUs from 16S rRNA targeted studies of microbial communities. Studies of phylogenetic markers at the molecular level have revealed a vast biodiversity of microorganisms living in the sea, land, and even within the human body. Microbial diversity studies of uncharacterized environments typically seek to estimate the richness and diversity of endemic microflora using a 16S rRNA gene sequencing approach. When most of the species in an environment are unknown and cannot be classified through a database search, researchers cluster 16S sequences into operational taxonomic units (OTUs) or phylotypes, thereby providing an estimate of population structure. Using real 16S sequence data, we have performed a critical analysis of OTU clustering methodologies to assess the potential variability in OTU quality. FAMeS provides the sequence data, taxonomic information, multiple sequence alignments, and distance matrices used and described in the core paper, as well as compiled results of more than 700 unique OTU methods. [The above was copied from the FAMeS home page at http://fames.jgi-psf.org/] The core paper behind FAMeS is: Konstantinos Mavromatis, Natalia Ivanova, Kerrie Barry, Harris Shapiro, Eugene Goltsman, Alice C Mc

  12. Exploring antibiotic resistance genes and metal resistance genes in plasmid metagenomes from wastewater treatment plants.

    PubMed

    Li, An-Dong; Li, Li-Guan; Zhang, Tong

    2015-01-01

    Plasmids operate as independent genetic elements in microorganism communities. Through horizontal gene transfer (HGT), they can provide their host microorganisms with important functions such as antibiotic resistance and heavy metal resistance. In this study, six metagenomic libraries were constructed with plasmid DNA extracted from influent, activated sludge (AS) and digested sludge (DS) of two wastewater treatment plants (WWTPs). Compared with the metagenomes of the total DNA extracted from the same sectors of the wastewater treatment plant, the plasmid metagenomes had significantly higher annotation rates, indicating that the functional genes on plasmids are commonly shared by those studied microorganisms. Meanwhile, the plasmid metagenomes also encoded many more genes related to defense mechanisms, including antibiotic resistance genes (ARGs). Searching against an ARG database and a metal resistance gene (MRG) database revealed a broad spectrum of ARGs (323 out of a total of 618 subtypes) and MRGs (23 out of a total of 23 types) on these plasmid metagenomes. The influent plasmid metagenomes contained many more resistance genes (both ARGs and MRGs) than the AS and the DS metagenomes. Sixteen novel plasmids with a complete circular structure that carried these resistance genes were assembled from the plasmid metagenomes. The results of this study demonstrated that the plasmids in WWTPs could be important reservoirs for resistance genes, and may play a significant role in the horizontal transfer of these genes. PMID:26441947

  13. Exploring antibiotic resistance genes and metal resistance genes in plasmid metagenomes from wastewater treatment plants

    PubMed Central

    Li, An-Dong; Li, Li-Guan; Zhang, Tong

    2015-01-01

    Plasmids operate as independent genetic elements in microorganism communities. Through horizontal gene transfer (HGT), they can provide their host microorganisms with important functions such as antibiotic resistance and heavy metal resistance. In this study, six metagenomic libraries were constructed with plasmid DNA extracted from influent, activated sludge (AS) and digested sludge (DS) of two wastewater treatment plants (WWTPs). Compared with the metagenomes of the total DNA extracted from the same sectors of the wastewater treatment plant, the plasmid metagenomes had significantly higher annotation rates, indicating that the functional genes on plasmids are commonly shared by those studied microorganisms. Meanwhile, the plasmid metagenomes also encoded many more genes related to defense mechanisms, including antibiotic resistance genes (ARGs). Searching against an ARG database and a metal resistance gene (MRG) database revealed a broad spectrum of ARGs (323 out of a total of 618 subtypes) and MRGs (23 out of a total of 23 types) on these plasmid metagenomes. The influent plasmid metagenomes contained many more resistance genes (both ARGs and MRGs) than the AS and the DS metagenomes. Sixteen novel plasmids with a complete circular structure that carried these resistance genes were assembled from the plasmid metagenomes. The results of this study demonstrated that the plasmids in WWTPs could be important reservoirs for resistance genes, and may play a significant role in the horizontal transfer of these genes. PMID:26441947

  14. Bead-beating artefacts in the Bacteroidetes to Firmicutes ratio of the human stool metagenome.

    PubMed

    Vebø, Heidi C; Karlsson, Magdalena Kauczynska; Avershina, Ekaterina; Finnby, Lene; Rudi, Knut

    2016-10-01

    We evaluated bead-beating cell-lysis in analysing the human stool metagenome, since this is a key step. We observed that two different bead-beating instruments from the same producer gave a three-fold difference in the Bacteroidetes to Firmicutes ratio. This illustrates that bead-beating can have a major impact on downstream metagenome analyses. PMID:27498349

  15. A metagenomic snapshot of taxonomic and functional diversity in an alpine glacier cryoconite ecosystem

    NASA Astrophysics Data System (ADS)

    Edwards, Arwyn; Pachebat, Justin A.; Swain, Martin; Hegarty, Matt; Hodson, Andrew J.; Irvine-Fynn, Tristram D. L.; Rassner, Sara M. E.; Sattler, Birgit

    2013-09-01

    Cryoconite is a microbe-mineral aggregate which darkens the ice surface of glaciers. Microbial process and marker gene PCR-dependent measurements reveal active and diverse cryoconite microbial communities on polar glaciers. Here, we provide the first report of a cryoconite metagenome and culture-independent study of alpine cryoconite microbial diversity. We assembled 1.2 Gbp of metagenomic DNA sequenced using an Illumina HiScanSQ from cryoconite holes across the ablation zone of Rotmoosferner in the Austrian Alps. The metagenome revealed a bacterially-dominated community, with Proteobacteria (62% of bacterial-assigned contigs) and Bacteroidetes (14%) considerably more abundant than Cyanobacteria (2.5%). Streptophyte DNA dominated the eukaryotic metagenome. Functional genes linked to N, Fe, S and P cycling illustrated an acquisitive trend and a nitrogen cycle based upon efficient ammonia recycling. A comparison of 32 metagenome datasets revealed a similarity in functional profiles between the cryoconite and metagenomes characterized from other cold microbe-mineral aggregates. Overall, the metagenomic snapshot reveals the cryoconite ecosystem of this alpine glacier as dependent on scavenging carbon and nutrients from allochthonous sources, in particular mosses transported by wind from ice-marginal habitats, consistent with net heterotrophy indicated by productivity measurements. A transition from singular snapshots of cryoconite metagenomes to comparative analyses is advocated.

  16. Improved metagenome screening efficiency by random insertion of T7 promoters.

    PubMed

    Kim, Yu Jung; Kim, Haseong; Kim, Seo Hyeon; Rha, Eugene; Choi, Su-Lim; Yeom, Soo-Jin; Kim, Hak-Sung; Lee, Seung-Goo

    2016-07-20

    Metagenomes constitute a major source for the identification of novel enzymes for industrial applications. However, current functional screening methods are hindered by the limited transcription efficiency of foreign metagenomic genes. To overcome this constraint, we introduced the 'Enforced Transcription' technique, which involves the random insertion of the bi-directional T7 promoter into a metagenomic fosmid library. Then the effect of enforced transcription was quantitatively assessed by screening for metagenomic lipolytic genes encoding enzymes whose catalytic activity forms halos on tributyrin agar plates. The metagenomic library containing the enforced transcription system yielded a significantly increased number of screening hits with lipolytic activity compared to the library without random T7 promoter insertions. Additional sequence analysis revealed that the hits from the enforced transcription library had greater genetic diversity than those from the original metagenome library. Enhancing heterologous expression using the T7 promoter should enable the identification of greater numbers of diverse novel biocatalysts from the metagenome than possible using conventional metagenome screening approaches. PMID:27239964

  17. Structure based function prediction of proteins using fragment library frequency vectors

    PubMed Central

    Yadav, Akshay; Jayaraman, Valadi Krishnamoorthy

    2012-01-01

    The function of a protein is primarily dictated by its structure. Therefore it is far more logical to find the functional clues of the protein in its overall 3-dimensional fold or its global structure. In this paper, we have developed a novel Support Vector Machines (SVM) based prediction model for functional classification and prediction of proteins using features extracted from their global structure based on fragment libraries. Fragment libraries have been previously used for ab initio modelling of proteins and protein structure comparisons. The query protein structure is broken down into a collection of short contiguous backbone fragments and this collection is discretized using a library of fragments. The input feature vector is a frequency vector that counts the number of each library fragment in the collection of fragments by all-to-all fragment comparisons. SVM models were trained and optimised for obtaining the best 10-fold cross-validation (CV) accuracy for classification. As an example, this method was applied for prediction and classification of Cell Adhesion molecules (CAMs). Thirty-four different fragment libraries with sizes ranging from 4 to 400 and fragment lengths ranging from 4 to 12 were used for obtaining the best prediction model. The best 10-fold CV accuracy of 95.25% was obtained for a library of 400 fragments of length 10. An accuracy of 87.5% was obtained on an unseen test dataset consisting of 20 CAMs and 20 NonCAMs. This shows that protein structure can be accurately and uniquely described using 400 representative fragments of length 10. PMID:23144557
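
    The frequency-vector feature described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: fragments are taken as overlapping windows of backbone coordinates, each fragment is translated so its first point sits at the origin, and the comparison is a plain RMSD without rotational superposition. All function names are hypothetical.

```python
import math

def to_origin(fragment):
    """Translate a fragment so its first point is at the origin (assumed normalisation)."""
    x0, y0, z0 = fragment[0]
    return [(x - x0, y - y0, z - z0) for x, y, z in fragment]

def fragment_distance(a, b):
    """RMSD between two equal-length fragments, no rotational fit (simplifying assumption)."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    a, b = to_origin(a), to_origin(b)
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(squared / len(a))

def split_into_fragments(backbone, length):
    """Overlapping contiguous windows of the backbone (windowing scheme is an assumption)."""
    return [backbone[i:i + length] for i in range(len(backbone) - length + 1)]

def frequency_vector(backbone, library, length):
    """Count, for each library fragment, how many query fragments it is nearest to."""
    counts = [0] * len(library)
    for frag in split_into_fragments(backbone, length):
        # All-to-all comparison: every query fragment against every library fragment.
        nearest = min(range(len(library)),
                      key=lambda k: fragment_distance(frag, library[k]))
        counts[nearest] += 1
    return counts

# Toy example: a two-entry library of length-2 fragments.
library = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (0, 1, 0)]]
backbone = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
print(frequency_vector(backbone, library, 2))
```

    The resulting count vector has one entry per library fragment, so its length is fixed by the library size regardless of protein length, which is what makes it usable as input to a classifier such as an SVM.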

  18. Metagenomic Insights into the Uncultured Diversity and Physiology of Microbes in Four Hypersaline Soda Lake Brines.

    PubMed

    Vavourakis, Charlotte D; Ghai, Rohit; Rodriguez-Valera, Francisco; Sorokin, Dimitry Y; Tringe, Susannah G; Hugenholtz, Philip; Muyzer, Gerard

    2016-01-01

    Soda lakes are salt lakes with a naturally alkaline pH due to evaporative concentration of sodium carbonates in the absence of major divalent cations. Hypersaline soda brines harbor microbial communities with a high species- and strain-level archaeal diversity and a large proportion of still uncultured poly-extremophiles compared to neutral brines of similar salinities. We present the first "metagenomic snapshots" of microbial communities thriving in the brines of four shallow soda lakes from the Kulunda Steppe (Altai, Russia) covering a salinity range from 170 to 400 g/L. Both amplicon sequencing of 16S rRNA fragments and direct metagenomic sequencing showed that the top-level taxa abundance was linked to the ambient salinity: Bacteroidetes, Alpha-, and Gamma-proteobacteria were dominant below a salinity of 250 g/L, Euryarchaeota at higher salinities. Within these taxa, amplicon sequences related to Halorubrum, Natrinema, Gracilimonas, purple non-sulfur bacteria (Rhizobiales, Rhodobacter, and Rhodobaca) and chemolithotrophic sulfur oxidizers (Thioalkalivibrio) were highly abundant. Twenty-four draft population genomes from novel members and ecotypes within the Nanohaloarchaea, Halobacteria, and Bacteroidetes were reconstructed to explore their metabolic features, environmental abundance and strategies for osmotic adaptation. The Halobacteria- and Bacteroidetes-related draft genomes belong to putative aerobic heterotrophs, likely with the capacity to ferment sugars in the absence of oxygen. Members from both taxonomic groups are likely involved in primary organic carbon degradation, since some of the reconstructed genomes encode the ability to hydrolyze recalcitrant substrates, such as cellulose and chitin. Putative sodium-pumping rhodopsins were found in both a Flavobacteriaceae- and a Chitinophagaceae-related draft genome. The predicted proteomes of both the latter and a Rhodothermaceae-related draft genome were indicative of a "salt-in" strategy of osmotic

  19. Metagenomic Insights into the Uncultured Diversity and Physiology of Microbes in Four Hypersaline Soda Lake Brines

    PubMed Central

    Vavourakis, Charlotte D.; Ghai, Rohit; Rodriguez-Valera, Francisco; Sorokin, Dimitry Y.; Tringe, Susannah G.; Hugenholtz, Philip; Muyzer, Gerard

    2016-01-01

    Soda lakes are salt lakes with a naturally alkaline pH due to evaporative concentration of sodium carbonates in the absence of major divalent cations. Hypersaline soda brines harbor microbial communities with a high species- and strain-level archaeal diversity and a large proportion of still uncultured poly-extremophiles compared to neutral brines of similar salinities. We present the first “metagenomic snapshots” of microbial communities thriving in the brines of four shallow soda lakes from the Kulunda Steppe (Altai, Russia) covering a salinity range from 170 to 400 g/L. Both amplicon sequencing of 16S rRNA fragments and direct metagenomic sequencing showed that the top-level taxa abundance was linked to the ambient salinity: Bacteroidetes, Alpha-, and Gamma-proteobacteria were dominant below a salinity of 250 g/L, Euryarchaeota at higher salinities. Within these taxa, amplicon sequences related to Halorubrum, Natrinema, Gracilimonas, purple non-sulfur bacteria (Rhizobiales, Rhodobacter, and Rhodobaca) and chemolithotrophic sulfur oxidizers (Thioalkalivibrio) were highly abundant. Twenty-four draft population genomes from novel members and ecotypes within the Nanohaloarchaea, Halobacteria, and Bacteroidetes were reconstructed to explore their metabolic features, environmental abundance and strategies for osmotic adaptation. The Halobacteria- and Bacteroidetes-related draft genomes belong to putative aerobic heterotrophs, likely with the capacity to ferment sugars in the absence of oxygen. Members from both taxonomic groups are likely involved in primary organic carbon degradation, since some of the reconstructed genomes encode the ability to hydrolyze recalcitrant substrates, such as cellulose and chitin. Putative sodium-pumping rhodopsins were found in both a Flavobacteriaceae- and a Chitinophagaceae-related draft genome. The predicted proteomes of both the latter and a Rhodothermaceae-related draft genome were indicative of a “salt-in” strategy of

  20. Woods: A fast and accurate functional annotator and classifier of genomic and metagenomic sequences.

    PubMed

    Sharma, Ashok K; Gupta, Ankit; Kumar, Sanjiv; Dhakan, Darshan B; Sharma, Vineet K

    2015-07-01

    Functional annotation of the gigantic metagenomic data is one of the major time-consuming and computationally demanding tasks, which is currently a bottleneck for efficient analysis. The commonly used homology-based methods to functionally annotate and classify proteins are extremely slow. Therefore, to achieve faster and accurate functional annotation, we have developed an orthology-based functional classifier 'Woods' by using a combination of machine learning and similarity-based approaches. Woods displayed a precision of 98.79% on an independent genomic dataset, 96.66% on a simulated metagenomic dataset and >97% on two real metagenomic datasets. In addition, it performed >87 times faster than BLAST on the two real metagenomic datasets. Woods can be used as a highly efficient and accurate classifier with high-throughput capability which facilitates its usability on large metagenomic datasets. PMID:25863333

  1. Metagenomic exploration of the bacterial community structure at Paradip Port, Odisha, India.

    PubMed

    Pramanik, Arnab; Basak, Pijush; Banerjee, Satabdi; Sengupta, Sanghamitra; Chattopadhyay, Dhrubajyoti; Bhattacharyya, Maitree

    2016-03-01

    This is a pioneering report on the metagenomic exploration of the bacterial diversity from a busy sea port in Paradip, Odisha, India. In our study, high-throughput sequencing of community 16S rRNA gene amplicons was performed using the 454 GS Junior platform. The metagenome contains 34,121 sequences with 16,677,333 bp and 56.3% G + C content. Metagenome sequence data are now available at NCBI under the Sequence Read Archive (SRA) database with accession no. SRX897055. The community metagenome sequence revealed the presence of 11,705 species belonging to 40 different phyla. Bacteroidetes (23%), Firmicutes (19%), Proteobacteria (17%), Spirochaetes (10%), Nitrospirae (8%), Actinobacteria (7%) and Acidobacteria (3%) are the predominant bacterial phyla in this port soil. Analysis of metagenomic sequences unfolded the interesting distribution of several phyla which pointed to the significant anthropogenic intervention influencing the bacterial community character of this port. PMID:26981374

  2. Heterologous viral expression systems in fosmid vectors increase the functional analysis potential of metagenomic libraries

    PubMed Central

    Terrón-González, L.; Medina, C.; Limón-Mortés, M. C.; Santero, E.

    2013-01-01

    The extraordinary potential of metagenomic functional analyses to identify activities of interest present in uncultured microorganisms has been limited by reduced gene expression in surrogate hosts. We have developed vectors and specialized E. coli strains as improved metagenomic DNA heterologous expression systems, taking advantage of viral components that prevent transcription termination at metagenomic terminators. One of the systems uses the phage T7 RNA-polymerase to drive metagenomic gene expression, while the other approach uses the lambda phage transcription anti-termination protein N to limit transcription termination. A metagenomic library was constructed and functionally screened to identify genes conferring carbenicillin resistance to E. coli. The use of these enhanced expression systems resulted in a 6-fold increase in the frequency of carbenicillin resistant clones. Subcloning and sequence analysis showed that, besides β-lactamases, efflux pumps are not only able to contribute to carbenicillin resistance but may in fact be sufficient by themselves to convey carbenicillin resistance. PMID:23346364

  3. Current opportunities and challenges in microbial metagenome analysis—a bioinformatic perspective

    PubMed Central

    Teeling, Hanno

    2012-01-01

    Metagenomics has become an indispensable tool for studying the diversity and metabolic potential of environmental microbes, whose bulk is as yet non-cultivable. Continual progress in next-generation sequencing allows for generating increasingly large metagenomes and studying multiple metagenomes over time or space. Recently, a new type of holistic ecosystem study has emerged that seeks to combine metagenomics with biodiversity, meta-expression and contextual data. Such ‘ecosystems biology’ approaches bear the potential to not only advance our understanding of environmental microbes to a new level but also impose challenges due to increasing data complexities, in particular with respect to bioinformatic post-processing. This mini review aims to address selected opportunities and challenges of modern metagenomics from a bioinformatics perspective and hopefully will serve as a useful resource for microbial ecologists and bioinformaticians alike. PMID:22966151

  4. Metagenomic exploration of the bacterial community structure at Paradip Port, Odisha, India

    PubMed Central

    Pramanik, Arnab; Basak, Pijush; Banerjee, Satabdi; Sengupta, Sanghamitra; Chattopadhyay, Dhrubajyoti; Bhattacharyya, Maitree

    2015-01-01

    This is a pioneering report on the metagenomic exploration of the bacterial diversity from a busy sea port in Paradip, Odisha, India. In our study, high-throughput sequencing of community 16S rRNA gene amplicons was performed using the 454 GS Junior platform. The metagenome contains 34,121 sequences with 16,677,333 bp and 56.3% G + C content. Metagenome sequence data are now available at NCBI under the Sequence Read Archive (SRA) database with accession no. SRX897055. The community metagenome sequence revealed the presence of 11,705 species belonging to 40 different phyla. Bacteroidetes (23%), Firmicutes (19%), Proteobacteria (17%), Spirochaetes (10%), Nitrospirae (8%), Actinobacteria (7%) and Acidobacteria (3%) are the predominant bacterial phyla in this port soil. Analysis of metagenomic sequences unfolded the interesting distribution of several phyla which pointed to the significant anthropogenic intervention influencing the bacterial community character of this port. PMID:26981374

  5. Introduction to Metagenomics at DOE JGI: Program Overview and Program Informatics (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Tringe, Susannah

    2011-10-12

    Susannah Tringe of the DOE Joint Genome Institute talks about the Program Overview and Program Informatics at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011

  6. Introduction to Metagenomics at DOE JGI: Program Overview and Program Informatics (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Tringe, Susannah [DOE JGI]

    2013-01-22

    Susannah Tringe of the DOE Joint Genome Institute talks about the Program Overview and Program Informatics at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011

  7. Chapter 4 embedded metal fragments.

    PubMed

    Kalinich, John F; Vane, Elizabeth A; Centeno, Jose A; Gaitens, Joanna M; Squibb, Katherine S; McDiarmid, Melissa A; Kasper, Christine E

    2014-01-01

    The continued evolution of military munitions and armor on the battlefield, as well as the insurgent use of improvised explosive devices, has led to embedded fragment wounds containing metal and metal mixtures whose long-term toxicologic and carcinogenic properties are not as yet known. Advances in medical care have greatly increased the survival from these types of injuries. Standard surgical guidelines suggest leaving embedded fragments in place, thus individuals may carry these retained metal fragments for the rest of their lives. Nursing professionals will be at the forefront in caring for these wounded individuals, both immediately after the trauma and during the healing and rehabilitation process. Therefore, an understanding of the potential health effects of embedded metal fragment wounds is essential. This review will explore the history of embedded fragment wounds, current research in the field, and Department of Defense and Department of Veterans Affairs guidelines for the identification and long-term monitoring of individuals with embedded fragments. PMID:25222538

  8. MG-Digger: An Automated Pipeline to Search for Giant Virus-Related Sequences in Metagenomes.

    PubMed

    Verneau, Jonathan; Levasseur, Anthony; Raoult, Didier; La Scola, Bernard; Colson, Philippe

    2016-01-01

    The number of metagenomic studies conducted each year is growing dramatically. Storage and analysis of such big data is difficult and time-consuming. Interestingly, analysis shows that environmental and human metagenomes include a significant amount of non-annotated sequences, representing a 'dark matter.' We established a bioinformatics pipeline that automatically detects metagenome reads matching query sequences from a given set and applied this tool to the detection of sequences matching large and giant DNA viral members of the proposed order Megavirales or virophages. A total of 1,045 environmental and human metagenomes (≈ 1 Terabase) were collected, processed, and stored on our bioinformatics server. In addition, nucleotide and protein sequences from 93 Megavirales representatives, including 19 giant viruses of amoeba, and 5 virophages, were collected. The pipeline was generated by scripts written in Python language and entitled MG-Digger. Metagenomes previously found to contain megavirus-like sequences were tested as controls. MG-Digger was able to annotate 100s of metagenome sequences as best matching those of giant viruses. These sequences were most often found to be similar to phycodnavirus or mimivirus sequences, but included reads related to recently available pandoraviruses, Pithovirus sibericum, and faustoviruses. Compared to other tools, MG-Digger combined stand-alone use on Linux or Windows operating systems through a user-friendly interface, implementation of ready-to-use customized metagenome databases and query sequence databases, adjustable parameters for BLAST searches, and creation of output files containing selected reads with best match identification. Compared to Metavir 2, a reference tool in viral metagenome analysis, MG-Digger detected 8% more true positive Megavirales-related reads in a control metagenome. The present work shows that massive, automated and recurrent analyses of metagenomes are effective in improving knowledge about the

  9. MG-Digger: An Automated Pipeline to Search for Giant Virus-Related Sequences in Metagenomes

    PubMed Central

    Verneau, Jonathan; Levasseur, Anthony; Raoult, Didier; La Scola, Bernard; Colson, Philippe

    2016-01-01

    The number of metagenomic studies conducted each year is growing dramatically. Storage and analysis of such big data is difficult and time-consuming. Interestingly, analysis shows that environmental and human metagenomes include a significant amount of non-annotated sequences, representing a ‘dark matter.’ We established a bioinformatics pipeline that automatically detects metagenome reads matching query sequences from a given set and applied this tool to the detection of sequences matching large and giant DNA viral members of the proposed order Megavirales or virophages. A total of 1,045 environmental and human metagenomes (≈ 1 Terabase) were collected, processed, and stored on our bioinformatics server. In addition, nucleotide and protein sequences from 93 Megavirales representatives, including 19 giant viruses of amoeba, and 5 virophages, were collected. The pipeline was generated by scripts written in Python language and entitled MG-Digger. Metagenomes previously found to contain megavirus-like sequences were tested as controls. MG-Digger was able to annotate 100s of metagenome sequences as best matching those of giant viruses. These sequences were most often found to be similar to phycodnavirus or mimivirus sequences, but included reads related to recently available pandoraviruses, Pithovirus sibericum, and faustoviruses. Compared to other tools, MG-Digger combined stand-alone use on Linux or Windows operating systems through a user-friendly interface, implementation of ready-to-use customized metagenome databases and query sequence databases, adjustable parameters for BLAST searches, and creation of output files containing selected reads with best match identification. Compared to Metavir 2, a reference tool in viral metagenome analysis, MG-Digger detected 8% more true positive Megavirales-related reads in a control metagenome. The present work shows that massive, automated and recurrent analyses of metagenomes are effective in improving knowledge about

  10. Remote homology and the functions of metagenomic dark matter.

    PubMed

    Lobb, Briallen; Kurtz, Daniel A; Moreno-Hagelsieb, Gabriel; Doxey, Andrew C

    2015-01-01

    Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions, remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (p < 0.001). Compared to the functional profiles predicted through standard homology searches, ORFans show biologically intriguing differences. Many ORFan-enriched functions are virus-related and tend to reflect biological processes associated with extreme sequence diversity. Each environment also possesses a large number of unique ORFan families and functions, including some known to play important community roles such as gut microbial polysaccharide digestion. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate through the identification of hundreds of novel ORFan metalloproteases that all possess a signature catalytic motif despite a general lack of similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space. All remote homology predictions are available at http

  11. Functional Metagenomics of the Bronchial Microbiome in COPD.

    PubMed

    Millares, Laura; Pérez-Brocal, Vicente; Ferrari, Rafaela; Gallego, Miguel; Pomares, Xavier; García-Núñez, Marian; Montón, Concepción; Capilla, Silvia; Monsó, Eduard; Moya, Andrés

    2015-01-01

    The course of chronic obstructive pulmonary disease (COPD) is frequently aggravated by exacerbations, and changes in the composition and activity of the microbiome may be implicated in their appearance. The aim of this study was to analyse the composition and the gene content of the microbial community in bronchial secretions of COPD patients in both stability and exacerbation. Taxonomic data were obtained by 16S rRNA gene amplification and pyrosequencing, and metabolic information through shotgun metagenomics, using the Metagenomics RAST server (MG-RAST), and the PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) programme, which predicts metagenomes from 16S data. Eight severe COPD patients provided good quality sputum samples, and no significant differences in the relative abundance of any phyla and genera were found between stability and exacerbation. Bacterial biodiversity (Chao1 and Shannon indexes) did not show statistical differences and beta-diversity analysis (Bray-Curtis dissimilarity index) showed a similar microbial composition in the two clinical situations. Four functional categories showed statistically significant differences with MG-RAST at KEGG level 2: in exacerbation, Cell Growth and Death and Transport and Catabolism decreased in abundance [1.6 (0.2-2.3) vs 3.6 (3.3-6.9), p = 0.012; and 1.8 (0-3.3) vs 3.6 (1.8-5.1), p = 0.025 respectively], while Cancer and Carbohydrate Metabolism increased [0.8 (0-1.5) vs 0 (0-0.5), p = 0.043; and 7 (6.4-9) vs 5.9 (6.3-6.1), p = 0.012 respectively]. In conclusion, the bronchial microbiome as a whole is not significantly modified when exacerbation symptoms appear in severe COPD patients, but its functional metabolic capabilities show significant changes in several pathways. PMID:26632844
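    The abstract above names three diversity measures (Shannon, Chao1, Bray-Curtis) without defining them. The following is a minimal sketch of their textbook definitions, not the study's actual pipeline; the bias-corrected form of Chao1 is an assumption about which variant is meant, and the two count vectors are hypothetical illustrations rather than data from the paper.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


# Hypothetical taxon counts for one patient in two clinical states.
stable = [40, 30, 20, 9, 1]
exacerbation = [35, 35, 20, 8, 2]

print("Shannon (stable):", shannon(stable))
print("Chao1 (stable):", chao1(stable))
print("Bray-Curtis:", bray_curtis(stable, exacerbation))
```

    A Bray-Curtis value near 0 means the two samples have nearly the same composition, which is the pattern the abstract reports between stability and exacerbation.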

  12. Remote homology and the functions of metagenomic dark matter

    PubMed Central

    Lobb, Briallen; Kurtz, Daniel A.; Moreno-Hagelsieb, Gabriel; Doxey, Andrew C.

    2015-01-01

    Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions, remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (p < 0.001). Compared to the functional profiles predicted through standard homology searches, ORFans show biologically intriguing differences. Many ORFan-enriched functions are virus-related and tend to reflect biological processes associated with extreme sequence diversity. Each environment also possesses a large number of unique ORFan families and functions, including some known to play important community roles such as gut microbial polysaccharide digestion. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate through the identification of hundreds of novel ORFan metalloproteases that all possess a signature catalytic motif despite a general lack of similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space. All remote homology predictions are available at http
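    As a quick arithmetic check on the figures quoted in this abstract, the sketch below recomputes the ORFan hit rate from the two reported counts and compares it with the 1.4% background reported for artificial protein families. The variable names are illustrative only.

```python
# Counts reported in the abstract.
orfans_with_remote_homology = 73_896
orfans_total = 484_121

# Background rate reported for artificial protein families, in percent.
artificial_family_pct = 1.4


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


orfan_pct = percent(orfans_with_remote_homology, orfans_total)
fold_over_background = orfan_pct / artificial_family_pct

print(f"ORFans with remote homology: {orfan_pct:.1f}%")
print(f"Fold over artificial families: {fold_over_background:.1f}x")
```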

  13. Metagenomic insights into the fibrolytic microbiome in yak rumen.

    PubMed

    Dai, Xin; Zhu, Yaxin; Luo, Yingfeng; Song, Lei; Liu, Di; Liu, Li; Chen, Furong; Wang, Min; Li, Jiabao; Zeng, Xiaowei; Dong, Zhiyang; Hu, Songnian; Li, Lingyan; Xu, Jian; Huang, Li; Dong, Xiuzhu

    2012-01-01

    The rumen hosts one of the most efficient microbial systems for degrading plant cell walls, yet the predominant cellulolytic proteins and fibrolytic mechanism(s) remain elusive. Here we investigated the cellulolytic microbiome of the yak rumen by using a combination of metagenome-based and bacterial artificial chromosome (BAC)-based functional screening approaches. In total, 223 fibrolytic BAC clones were pyrosequenced and 10,070 ORFs were identified. Among them 150 were annotated as the glycoside hydrolase (GH) genes for fibrolytic proteins, and the majority (69%) of them were clustered or linked with genes encoding related functions. Among the 35 fibrolytic contigs of >10 Kb in length, 25 were derived from Bacteroidetes and four from Firmicutes. Coverage analysis indicated that the fibrolytic genes on most Bacteroidetes-contigs were abundantly represented in the metagenomic sequences, and they were frequently linked with genes encoding SusC/SusD-type outer-membrane proteins. GH5, GH9, and GH10 cellulase/hemicellulase genes were predominant, but no GH48 exocellulase gene was found. Most (85%) of the cellulase and hemicellulase proteins possessed a signal peptide; only a few carried carbohydrate-binding modules, and no cellulosomal domains were detected. These findings suggest that the SusC/SusD-involving mechanism, instead of one based on cellulosomes or the free-enzyme system, serves a major role in lignocellulose degradation in yak rumen. Genes encoding an endoglucanase of a novel GH5 subfamily occurred frequently in the metagenome, and the recombinant proteins encoded by the genes displayed moderate Avicelase in addition to endoglucanase activities, suggesting their important contribution to lignocellulose degradation in the exocellulase-scarce rumen. PMID:22808161

  14. Metagenomic Insights into the Fibrolytic Microbiome in Yak Rumen

    PubMed Central

    Song, Lei; Liu, Di; Liu, Li; Chen, Furong; Wang, Min; Li, Jiabao; Zeng, Xiaowei; Dong, Zhiyang; Hu, Songnian; Li, Lingyan; Xu, Jian; Huang, Li; Dong, Xiuzhu

    2012-01-01

    The rumen hosts one of the most efficient microbial systems for degrading plant cell walls, yet the predominant cellulolytic proteins and fibrolytic mechanism(s) remain elusive. Here we investigated the cellulolytic microbiome of the yak rumen by using a combination of metagenome-based and bacterial artificial chromosome (BAC)-based functional screening approaches. In total, 223 fibrolytic BAC clones were pyrosequenced and 10,070 ORFs were identified. Among them 150 were annotated as the glycoside hydrolase (GH) genes for fibrolytic proteins, and the majority (69%) of them were clustered or linked with genes encoding related functions. Among the 35 fibrolytic contigs of >10 Kb in length, 25 were derived from Bacteroidetes and four from Firmicutes. Coverage analysis indicated that the fibrolytic genes on most Bacteroidetes-contigs were abundantly represented in the metagenomic sequences, and they were frequently linked with genes encoding SusC/SusD-type outer-membrane proteins. GH5, GH9, and GH10 cellulase/hemicellulase genes were predominant, but no GH48 exocellulase gene was found. Most (85%) of the cellulase and hemicellulase proteins possessed a signal peptide; only a few carried carbohydrate-binding modules, and no cellulosomal domains were detected. These findings suggest that the SusC/SusD-involving mechanism, instead of one based on cellulosomes or the free-enzyme system, serves a major role in lignocellulose degradation in yak rumen. Genes encoding an endoglucanase of a novel GH5 subfamily occurred frequently in the metagenome, and the recombinant proteins encoded by the genes displayed moderate Avicelase in addition to endoglucanase activities, suggesting their important contribution to lignocellulose degradation in the exocellulase-scarce rumen. PMID:22808161

  15. Managing microbial communities for sequentially reconstruct genomes from complex metagenomes

    NASA Astrophysics Data System (ADS)

    Delmont, Tom O.; Vogel, Timothy M.; Simonet, Pascal

    2013-04-01

    Global understanding of environmental microbial communities is currently limited by the bottleneck of genome reconstruction. Soil is a typical example where individual cells are currently mostly uncultured and metagenomic datasets unassembled. In this study, the microbial community composition of a natural grassland soil was managed under several controlled selective pressures to test a "multi-evenness" stratagem for sequentially reconstructing genomes from a complex metagenome. While poorly represented in the natural community, several newly dominant genomes (an enrichment attaining 10^5 in some cases) were successfully reconstructed under various "harsh" tested conditions. These genomes belong to several genera including (but not restricted to) Leifsonia, Rhodanobacter, Bacillus, Ktedonobacter, Xanthomonas, Streptomyces and Burkholderia. So far, from 10 to 78% of generated metagenomic datasets were reconstructed, providing access to more than 88 000 genes of known or unknown functions and to their genetic environment. Adaptive genes directly related to selective pressures were found, mostly in large plasmids. Functions of potential industrial interest (e.g., novel polyketide synthase modules in Streptomyces) were also discovered. Furthermore, an important phage infection snapshot (>1500X of coverage for the most represented phage) was observed among the Streptomyces population (three distinct genomes reconstructed) of a particular enrichment (mercury, 0.02 g/kg) during the fourth month of incubation. This "divide and conquer" strategy could be applied to other environments and combined with auxiliary sequencing approaches such as single-cell sequencing to detect, connect and mine taxa and functions of interest while creating an extensive set of reference genomes from across the planet. The next limit may turn out to be our imagination in defining novel selective pressures to sequentially make dominant the 10^30 cells of the biosphere.

  16. Functional Metagenomics of the Bronchial Microbiome in COPD

    PubMed Central

    Millares, Laura; Pérez-Brocal, Vicente; Ferrari, Rafaela; Gallego, Miguel; Pomares, Xavier; García-Núñez, Marian; Montón, Concepción; Capilla, Silvia

    2015-01-01

    The course of chronic obstructive pulmonary disease (COPD) is frequently aggravated by exacerbations, and changes in the composition and activity of the microbiome may be implicated in their appearance. The aim of this study was to analyse the composition and the gene content of the microbial community in bronchial secretions of COPD patients in both stability and exacerbation. Taxonomic data were obtained by 16S rRNA gene amplification and pyrosequencing, and metabolic information through shotgun metagenomics, using the Metagenomics RAST server (MG-RAST), and the PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) programme, which predicts metagenomes from 16S data. Eight severe COPD patients provided good quality sputum samples, and no significant differences in the relative abundance of any phyla and genera were found between stability and exacerbation. Bacterial biodiversity (Chao1 and Shannon indexes) did not show statistical differences and beta-diversity analysis (Bray-Curtis dissimilarity index) showed a similar microbial composition in the two clinical situations. Four functional categories showed statistically significant differences with MG-RAST at KEGG level 2: in exacerbation, Cell Growth and Death and Transport and Catabolism decreased in abundance [1.6 (0.2–2.3) vs 3.6 (3.3–6.9), p = 0.012; and 1.8 (0–3.3) vs 3.6 (1.8–5.1), p = 0.025 respectively], while Cancer and Carbohydrate Metabolism increased [0.8 (0–1.5) vs 0 (0–0.5), p = 0.043; and 7 (6.4–9) vs 5.9 (6.3–6.1), p = 0.012 respectively]. In conclusion, the bronchial microbiome as a whole is not significantly modified when exacerbation symptoms appear in severe COPD patients, but its functional metabolic capabilities show significant changes in several pathways. PMID:26632844

  17. Diel Metagenomics and Metatranscriptomics of Elkhorn Slough Hypersaline Microbial Mat

    NASA Astrophysics Data System (ADS)

    Lee, J.; Detweiler, A. M.; Everroad, R. C.; Bebout, L. E.; Weber, P. K.; Pett-Ridge, J.; Bebout, B.

    2014-12-01

    To understand the variation in gene expression associated with the daytime oxygenic phototrophic and nighttime fermentation regimes seen in hypersaline microbial mats, a contiguous mat piece was subjected to sampling at regular intervals over a 24-hour diel period. Additionally, to understand the impact of sulfate reduction on biohydrogen consumption, molybdate was added to a parallel experiment in the same run. Four metagenome and 12 metatranscriptome Illumina HiSeq lanes were completed over day / night, and control / molybdate experiments. Preliminary comparative examination of noon and midnight metatranscriptomic samples mapped using bowtie2 to reference genomes has revealed several notable results about the dominant mat-building cyanobacterium Microcoleus chthonoplastes PCC 7420. M. chthonoplastes PCC 7420 shows expression in several pathways for nitrogen scavenging, including nitrogen fixation. Reads mapped to M. chthonoplastes PCC 7420 show expression of two starch storage and utilization pathways, one a starch-trehalose-maltose-glucose pathway, the other a UDP-glucose-cellulose-β-1,4 glucan-glucose pathway. The overall trend of gene expression was primarily light-driven up-regulation followed by down-regulation in the dark, while much of the remaining expression profile appears to be constitutive. Co-assembly of quality-controlled reads from 4 metagenomes was performed using Ray Meta with progressively smaller K-mer sizes, with bins identified and filtered using principal component analysis of coverages from all libraries and a %GC filter, followed by reassembly of the remaining co-assembly reads and binned reads. Despite relatively similar abundance profiles in each metagenome, this binning approach resolved distinct bins not only from dominant taxa but also from sulfate-reducing bacteria that are desired for understanding molybdate inhibition. Bins generated from this iterative assembly process will be used for downstream
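    The binning step above mentions a %GC filter. The sketch below shows one simple way such a filter could work; it is an illustrative assumption, not the authors' procedure. The function names, the contig sequences and the 0.6 to 0.8 window are all hypothetical, and ambiguous bases such as N are excluded from the denominator by choice.

```python
def gc_fraction(sequence):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    unambiguous = sum(sequence.count(base) for base in "ACGT")
    if unambiguous == 0:
        return 0.0
    return (sequence.count("G") + sequence.count("C")) / unambiguous


def filter_by_gc(contigs, low, high):
    """Keep contigs whose GC fraction lies in the closed interval [low, high]."""
    return {
        name: seq
        for name, seq in contigs.items()
        if low <= gc_fraction(seq) <= high
    }


# Hypothetical toy contigs; real contigs would be thousands of bases long.
contigs = {
    "contig_a": "ATATATGC",
    "contig_b": "GCGCGCAT",
    "contig_c": "GGCCNNAT",
}

kept = filter_by_gc(contigs, low=0.6, high=0.8)
print(sorted(kept))
```

    In practice a GC window like this would be combined with the coverage-based grouping the abstract describes, since GC content alone rarely separates taxa cleanly.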

  18. LINKING LAND USE CHANGE, STREAM GEOMORPHOLOGY, AND AQUATIC BIODIVERSITY IN A HIERARCHICAL CLASSIFICATION SCHEME

    EPA Science Inventory

    Objective:

    We propose to develop and evaluate a watershed and stream reach classification system based on the relationship between land use change, river geomorphic condition, riparian habitat fragmentation, and riverine ecological condition. The goal is...

  19. Impact of metagenomic DNA extraction procedures on the identifiable endophytic bacterial diversity in Sorghum bicolor (L. Moench).

    PubMed

    Maropola, Mapula Kgomotso Annah; Ramond, Jean-Baptiste; Trindade, Marla

    2015-05-01

    Culture-independent studies rely on the quantity and quality of the extracted environmental metagenomic DNA (mDNA). To fully access the plant tissue microbiome, the extracted plant mDNA should allow optimal PCR applications and the genetic content must be representative of the total microbial diversity. In this study, we evaluated the endophytic bacterial diversity retrieved using different mDNA extraction procedures. Metagenomic DNA from sorghum (Sorghum bicolor L. Moench) stem and root tissues were extracted using two classical DNA extraction protocols (CTAB- and SDS-based) and five commercial kits. The mDNA yields and quality as well as the reproducibility were compared. 16S rRNA gene terminal restriction fragment length polymorphism (t-RFLP) was used to assess the impact on endophytic bacterial community structures observed. Generally, the classical protocols obtained high mDNA yields from sorghum tissues; however, they were less reproducible than the commercial kits. Commercial kits retrieved higher quality mDNA, but with lower endophytic bacterial diversities compared to classical protocols. The SDS-based protocol enabled access to the highest sorghum endophytic diversities. Therefore, "SDS-extracted" sorghum root and stem microbiome diversities were analysed via 454 pyrosequencing, and this revealed that the two tissues harbour significantly different endophytic communities. Nevertheless, both communities are dominated by agriculturally important genera such as Microbacterium, Agrobacterium, Sphingobacterium, Herbaspirillum, Erwinia, Pseudomonas and Stenotrophomonas; which have previously been shown to play a role in plant growth promotion. This study shows that DNA extraction protocols introduce biases in culture-independent studies of environmental microbial communities by influencing the mDNA quality, which impacts the microbial diversity analyses and evaluation. Using the broad-spectrum SDS-based DNA extraction protocol allows the recovery of the most

  20. Inter-conversion of catalytic abilities in a bifunctional carboxyl/feruloyl-esterase from earthworm gut metagenome.

    PubMed

    Vieites, José María; Ghazi, Azam; Beloqui, Ana; Polaina, Julio; Andreu, José M; Golyshina, Olga V; Nechitaylo, Taras Y; Waliczek, Agnes; Yakimov, Michail M; Golyshin, Peter N; Ferrer, Manuel

    2010-01-01

    Carboxyl esterases (CE) exhibit various reaction specificities despite their overall structural similarity. In the present study we exploited functional metagenomics, saturation mutagenesis and experimental protein evolution to explore residues that have a significant role in substrate discrimination. We used an enzyme, designated 3A6, derived from the earthworm gut metagenome that exhibits CE and feruloyl esterase (FAE) activities with p-nitrophenyl and cinnamate esters, respectively, with a [(k(cat)/K(m))](CE)/[(k(cat)/K(m))](FAE) factor of 17. Modelling-guided saturation mutagenesis at specific hotspots (Lys(281), Asp(282), Asn(316) and Lys(317)) situated close to the catalytic core (Ser(143)/Asp(273)/His(305)) and a deletion of a 34-AA-long peptide fragment yielded mutants with the highest CE activity, while cinnamate ester bond hydrolysis was effectively abolished. Although single to triple mutants with both improved activities (up to 180-fold in k(cat)/K(m) values) and enzymes with inverted specificity ((k(cat)/K(m))(CE)/(k(cat)/K(m))(FAE) ratio of ∼0.4) were identified, no CE-inactive variant was found. Screening of a large error-prone PCR-generated library yielded far fewer mutants for substrate discrimination. We also found that no significant changes in CE activation energy occur after any mutation (7.3 to -5.6 J mol(-1)), whereas a direct correlation between loss/gain of FAE function and activation energies (from 33.05 to -13.7 J mol(-1)) was found. Results suggest that the FAE activity in 3A6 may have evolved via introduction of a limited number of 'hot spot' mutations in a common CE ancestor, which may retain the original hydrolytic activity due to lower restrictive energy barriers but conveys a dynamic energetically favourable switch of a second hydrolytic reaction. PMID:21255305

  1. Cloning and functional characterization of endo-β-1,4-glucanase gene from metagenomic library of vermicompost.

    PubMed

    Yasir, Muhammad; Khan, Haji; Azam, Syed Sikander; Telke, Amar; Kim, Seon Won; Chung, Young Ryun

    2013-06-01

    In the vermicomposting of paper mill sludge, the activity of earthworms depends strongly on dietary polysaccharides, including cellulose, as energy sources. Most of these polymers are degraded by the host microbiota, which is considered a potentially important source of cellulolytic enzymes. In the present study, a metagenomic library was constructed from vermicompost (VC) prepared with paper mill sludge and dairy sludge (fresh sludge, FS) and functionally screened for cellulolytic activities. Eighteen cellulase-expressing clones were isolated from a library of about 89,000 fosmid clones. A short fragment library was constructed from the most active positive clone (cMGL504) and one open reading frame (ORF) of 1,092 bp encoding an endo-β-1,4-glucanase was identified, which showed 88% similarity with the Cellvibrio mixtus cellulase A gene. The endo-β-1,4-glucanase cmgl504 gene was overexpressed in Escherichia coli. The purified recombinant cmgl504 cellulase displayed activities at a broad range of temperature (25-55°C) and pH (5.5-8.5). The enzyme degraded carboxymethyl cellulose (CMC) with 15.4 U, while having low activity against avicel. No detectable activity was found for xylan and laminarin. The enzyme activity was stimulated by potassium chloride. The deduced protein and three-dimensional structure of metagenome-derived cellulase cmgl504 possessed all features, including general architecture, signature motifs, and N-terminal signal peptide, followed by the catalytic domain of cellulase belonging to glycosyl hydrolase family 5 (GHF5). The cellulases cloned in this work may play important roles in the degradation of celluloses in the vermicomposting process and could be exploited for industrial application in the future. PMID:23812813

  2. Inter‐conversion of catalytic abilities in a bifunctional carboxyl/feruloyl‐esterase from earthworm gut metagenome

    PubMed Central

    Vieites, José María; Ghazi, Azam; Beloqui, Ana; Polaina, Julio; Andreu, José M.; Golyshina, Olga V.; Nechitaylo, Taras Y.; Waliczek, Agnes; Yakimov, Michail M.; Golyshin, Peter N.; Ferrer, Manuel

    2010-01-01

    Summary: Carboxyl esterases (CE) exhibit various reaction specificities despite their overall structural similarity. In the present study we exploited functional metagenomics, saturation mutagenesis and experimental protein evolution to explore residues that have a significant role in substrate discrimination. We used an enzyme, designated 3A6, derived from the earthworm gut metagenome that exhibits CE and feruloyl esterase (FAE) activities with p‐nitrophenyl and cinnamate esters, respectively, with a [(kcat/Km)]CE/[(kcat/Km)]FAE factor of 17. Modelling‐guided saturation mutagenesis at specific hotspots (Lys281, Asp282, Asn316 and Lys317) situated close to the catalytic core (Ser143/Asp273/His305) and a deletion of a 34‐AA‐long peptide fragment yielded mutants with the highest CE activity, while cinnamate ester bond hydrolysis was effectively abolished. Although single to triple mutants with both improved activities (up to 180‐fold in kcat/Km values) and enzymes with inverted specificity ((kcat/Km)CE/(kcat/Km)FAE ratio of ∼0.4) were identified, no CE‐inactive variant was found. Screening of a large error‐prone PCR‐generated library yielded far fewer mutants for substrate discrimination. We also found that no significant changes in CE activation energy occur after any mutation (7.3 to −5.6 J mol−1), whereas a direct correlation between loss/gain of FAE function and activation energies (from 33.05 to −13.7 J mol−1) was found. Results suggest that the FAE activity in 3A6 may have evolved via introduction of a limited number of ‘hot spot’ mutations in a common CE ancestor, which may retain the original hydrolytic activity due to lower restrictive energy barriers but conveys a dynamic energetically favourable switch of a second hydrolytic reaction. PMID:21255305

  3. Metagenomic analysis of the medicinal leech gut microbiota

    PubMed Central

    Maltz, Michele A.; Bomar, Lindsey; Lapierre, Pascal; Morrison, Hilary G.; McClure, Emily Ann; Sogin, Mitchell L.; Graf, Joerg

    2014-01-01

    There are trillions of microbes found throughout the human body and they exceed the number of eukaryotic cells by 10-fold. Metagenomic studies have revealed that the majority of these microbes are found within the gut, playing an important role in the host's digestion and nutrition. The complexity of the animal digestive tract, unculturable microbes, and the lack of genetic tools for most culturable microbes make it challenging to explore the nature of these microbial interactions within this niche. The medicinal leech, Hirudo verbana, has been shown to be a useful tool in overcoming these challenges, due to the simplicity of the microbiome and the availability of genetic tools for one of the two dominant gut symbionts, Aeromonas veronii. In this study, we utilize 16S rRNA gene pyrosequencing to further explore the microbial composition of the leech digestive tract, confirming the dominance of two taxa, the Rikenella-like bacterium and A. veronii. The deep sequencing approach revealed additional members of the microbial community, suggesting a moderately complex microbial community with a richness of 36 taxa. The presence of a Proteus strain as a newly identified resident in the leech crop was confirmed using fluorescence in situ hybridization (FISH). The metagenome of this community was also pyrosequenced and the contigs were binned into the following taxonomic groups: Rikenella-like (3.1 MB), Aeromonas (4.5 MB), Proteus (2.9 MB), Clostridium (1.8 MB), Erysipelothrix (0.96 MB), Desulfovibrio (0.14 MB), and Fusobacterium (0.27 MB). Functional analyses on the leech gut symbionts were explored using the metagenomic data and MG-RAST. A comparison of the COG and KEGG categories of the leech gut metagenome to that of other animal digestive-tract microbiomes revealed that the leech digestive tract had a similar metabolic potential to the human digestive tract, supporting the usefulness of this system as a model for studying digestive

  4. Accessing the Hidden Majority of Marine Natural Products Through Metagenomics

    PubMed Central

    Donia, Mohamed S.; Ruffner, Duane E.; Cao, Sheng

    2012-01-01

    Tiny marine animals represent an untapped reservoir for undiscovered, bioactive natural products. However, their small size and extreme chemical variability preclude traditional chemical approaches to discovering new bioactive compounds. Here, we use a metagenomic method to directly discover and rapidly access cyanobactin class natural products from these variable samples, providing proof-of-concept for genome based discovery and supply of marine natural products. We also address practical optimization of complex, multistep ribosomal peptide pathways in heterologous hosts, which is still very challenging. The resulting methods and concepts will be applicable to ribosomal peptide and other biosynthetic pathways. PMID:21542088

  5. Marine Microbial Metagenomics: From Individual to the Environment

    PubMed Central

    Tseng, Ching-Hung; Tang, Sen-Lin

    2014-01-01

    Microbes are the most abundant biological entities on earth; studying them is therefore important for understanding their roles in global ecology. The science of metagenomics is a relatively young field of research that has enjoyed significant effort since its inception in 1998. Studies using next-generation sequencing techniques on single genomes and collections of genomes have not only led to novel insights into microbial genomics, but also revealed a close association between environmental niches and genome evolution. Herein, we review studies investigating microbial genomics (largely in the marine ecosystem) at the individual and community levels to summarize our current understanding of microbial ecology in the environment. PMID:24857918

  6. Metagenomic analysis of fungal taxa inhabiting Mecca region, Saudi Arabia.

    PubMed

    Moussa, Tarek A A; Al-Zahrani, Hassan S; Almaghrabi, Omar A; Sabry, Nevien M; Fuller, Michael P

    2016-09-01

    The data presented contains the sequences of fungal Internal Transcribed Spacer (ITS) and 18S rRNA gene from a metagenome of the Mecca region, Saudi Arabia. Sequences were amplified using fungal specific primers, which amplified the amplicon aligned between the 18S and 28S rRNA genes. A total of 460 fungal species belonging to 133 genera, 58 families, 33 orders, 13 classes and 4 phyla were identified in four contrasting locations. The raw sequencing data used to perform this analysis along with FASTQ file are located in the NCBI Sequence Read Archive (SRA) under accession numbers: SRR3150823, SRR3144873, SRR3150825 and SRR3150846. PMID:27508121

  7. Metagenomics and the Human Virome in Asymptomatic Individuals.

    PubMed

    Rascovan, Nicolás; Duraisamy, Raja; Desnues, Christelle

    2016-09-01

    High-throughput sequencing technologies have revolutionized how we think about viruses. Investigators can now go beyond pathogenic viruses and have access to the thousands of viruses that inhabit our bodies without causing clinical symptoms. By studying their interactions with each other, with other microbes, and with host genetics and immune systems, we can learn how they affect health and disease. This article reviews current knowledge of the composition and diversity of the human virome in physiologically healthy individuals. It focuses on recent results from metagenomics studies and discusses the contribution of bacteriophages and eukaryotic viruses to human health. PMID:27607550

  8. Clinical and legal significance of fragmentation of bullets in relation to size of wounds: retrospective analysis

    PubMed Central

    Coupland, Robin

    1999-01-01

    Objective: To examine the relation between fragmentation of bullets and size of wounds clinically and in the context of the Hague Declaration of 1899. Design: Retrospective analysis of prospectively collected data on hospital admissions. Setting: Hospitals of the International Committee of the Red Cross. Subjects: 5215 people wounded by bullets in armed conflicts (5933 wounds). Main outcome measures: Grade of wound computed from the Red Cross wound classification and presence of bullet fragments on radiography. Results: Of the 347 wounds with fragmentation of bullets, 251 (72%) were large wounds (grade 2 or 3), that is, those with a clinically detectable cavity. Of the 5586 wounds without fragmentation of bullets, 2915 (52.1%) were large wounds. Only 7.9% (251/3166) of large wounds were associated with fragmentation of bullets. Conclusions: Fragmentation of bullets is associated with large wounds, but most large wounds do not contain bullet fragments. In addition, bullet fragments may occur in wounds that are not defined as large. Fragmentation of bullets is neither a necessary nor sufficient cause of large wounds, and surgeons should not diagnose extensive tissue damage because of the presence of fragments on radiography. Such findings also do not necessarily represent the use of bullets which contravene the law of war. Future legislation should take into account not only the construction of bullets but also their potential to transfer energy to the human body. Key messages: The use of certain bullets has been prohibited in war. Wounds from bullets are caused by transfer of kinetic energy from the bullet to the tissues. The relation between size of wound and fragmentation of bullets can be examined using the Red Cross wound classification system. Fragments of bullets seen on radiographs of wounds sustained in wars do not necessarily represent large wounds or the use of illegal bullets. Existing legislation on the construction of bullets should be supplemented by legislation on
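    The "neither necessary nor sufficient" conclusion follows from reading the same counts in two directions. The sketch below recomputes the proportions from the wound counts quoted in the abstract; the variable names are illustrative. Note that 2915/5586 evaluates to roughly 52.2%, which differs in the last digit from the 52.1% printed in the abstract, possibly through rounding.

```python
# Wound counts quoted in the abstract.
fragment_large, fragment_total = 251, 347
no_fragment_large, no_fragment_total = 2915, 5586

# Totals implied by those counts.
all_wounds = fragment_total + no_fragment_total
large_total = fragment_large + no_fragment_large

# Share of wounds that are large, with and without bullet fragments.
p_large_given_fragment = fragment_large / fragment_total
p_large_given_no_fragment = no_fragment_large / no_fragment_total

# The reverse direction: share of large wounds that contain fragments.
p_fragment_given_large = fragment_large / large_total

print(f"All wounds: {all_wounds}, large wounds: {large_total}")
print(f"Large | fragments:    {p_large_given_fragment:.1%}")
print(f"Large | no fragments: {p_large_given_no_fragment:.1%}")
print(f"Fragments | large:    {p_fragment_given_large:.1%}")
```

    Fragments raise the chance of a large wound (about 72% against about 52%), yet fewer than one in ten large wounds contains fragments, so fragments on a radiograph are a poor proxy for wound size.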

  9. Metagenomic Insights into Metabolic Capacities of the Gut Microbiota in a Fungus-Cultivating Termite (Odontotermes yunnanensis)

    PubMed Central

    Liu, Ning; Zhang, Lei; Zhou, Haokui; Zhang, Meiling; Yan, Xing; Wang, Qian; Long, Yanhua; Xie, Lei; Wang, Shengyue; Huang, Yongping; Zhou, Zhihua

    2013-01-01

    Macrotermitinae (fungus-cultivating termites) are major decomposers in tropical and subtropical areas of Asia and Africa. They have specifically evolved mutualistic associations with both Termitomyces fungi on the nest and a gut microbiota, providing a model system for probing host-microbe interactions. Yet the symbiotic roles of gut microbes residing in its major feeding caste remain largely undefined. Here, by pyrosequencing the whole gut metagenome of adult workers of a fungus-cultivating termite (Odontotermes yunnanensis), we showed that it harbors a broad set of genes or gene modules encoding carbohydrate-active enzymes (CAZymes) relevant to plant fiber degradation, particularly debranching enzymes and oligosaccharide-processing enzymes. It also contained a considerable number of genes encoding chitinases and glycoprotein oligosaccharide-processing enzymes for fungal cell wall degradation. To investigate the metabolic divergence of higher termites of different feeding guilds, a SEED subsystem-based gene-centric comparative analysis of the data with that of a previously sequenced wood-feeding Nasutitermes hindgut microbiome was also attempted, revealing that SEED classifications of nitrogen metabolism, and motility and chemotaxis were significantly overrepresented in the wood-feeder hindgut metagenome, while Bacteroidales conjugative transposons and subsystems related to central aromatic compounds metabolism were apparently overrepresented here. This work fills gaps in our understanding of the functional capacities of fungus-cultivating termite gut microbiota, especially their roles in the symbiotic digestion of lignocelluloses and utilization of fungal biomass, both of which greatly add to the existing understanding of this peculiar symbiosis. PMID:23874908

  10. Fragmentation pathways of polymer ions.

    PubMed

    Wesdemiotis, Chrys; Solak, Nilüfer; Polce, Michael J; Dabney, David E; Chaicharoen, Kittisak; Katzenmeyer, Bryan C

    2011-01-01

    Tandem mass spectrometry (MS/MS) is increasingly applied to synthetic polymers to characterize chain-end or in-chain substituents, distinguish isobaric and isomeric species, and determine macromolecular connectivities and architectures. For confident structural assignments, the fragmentation mechanisms of polymer ions must be understood, as they provide guidelines on how to deduce the desired information from the fragments observed in MS/MS spectra. This article reviews the fragmentation pathways of synthetic polymer ions that have been energized to decompose via collisionally activated dissociation (CAD), the most widely used activation method in polymer analysis. The compounds discussed encompass polystyrenes, poly(2-vinyl pyridine), polyacrylates, poly(vinyl acetate), aliphatic polyester copolymers, polyethers, and poly(dimethylsiloxane). For a number of these polymers, several substitution patterns and architectures are considered, and questions regarding the ionization agent and internal energy of the dissociating precursor ions are also addressed. Competing and consecutive dissociations are evaluated in terms of the structural insight they provide about the macromolecular structure. The fragmentation pathways of the diverse array of polymer ions examined fall into three categories, viz. (1) charge-directed fragmentations, (2) charge-remote rearrangements, and (3) charge-remote fragmentations via radical intermediates. Charge-remote processes predominate. Depending on the ionizing agent and the functional groups in the polymer, the incipient fragments arising by pathways (1)-(3) may form ion-molecule complexes that survive long enough to permit inter-fragment hydrogen atom, proton, or hydride transfers. PMID:20623599

  11. Fragment-based drug design.

    PubMed

    Feyfant, Eric; Cross, Jason B; Paris, Kevin; Tsao, Désirée H H

    2011-01-01

    Fragment-based drug design (FBDD), which is comprised of both fragment screening and the use of fragment hits to design leads, began more than 15 years ago and has been steadily gaining in popularity and utility. Its origin lies in the fact that the coverage of chemical space and the binding efficiency of hits are directly related to the size of the compounds screened. Nevertheless, FBDD still faces challenges, among them developing fragment screening libraries that ensure optimal coverage of chemical space, physical properties and chemical tractability. Fragment screening also requires sensitive assays, often biophysical in nature, to detect weak binders. In this chapter we will introduce the technologies used to address these challenges and outline the experimental advantages that make FBDD one of the most popular new hit-to-lead processes. PMID:20981527

  12. Effect of the strain Bacillus amyloliquefaciens FZB42 on the microbial community in the rhizosphere of lettuce under field conditions analyzed by whole metagenome sequencing

    PubMed Central

    Kröber, Magdalena; Wibberg, Daniel; Grosch, Rita; Eikmeyer, Felix; Verwaaijen, Bart; Chowdhury, Soumitra P.; Hartmann, Anton; Pühler, Alfred; Schlüter, Andreas

    2014-01-01

    Application of the plant associated bacterium Bacillus amyloliquefaciens FZB42 on lettuce (Lactuca sativa) confirmed its capability to promote plant growth and health by reducing disease severity (DS) caused by the phytopathogenic fungus Rhizoctonia solani. Therefore this strain is commercially applied as an eco-friendly plant protective agent. It is able to produce cyclic lipopeptides (CLP) and polyketides featuring antifungal and antibacterial properties. Production of these secondary metabolites led to the question of a possible impact of strain FZB42 on the composition of microbial rhizosphere communities after its application. Rating of DS and lettuce growth during a field trial confirmed the positive impact of strain FZB42 on the health of the host plant. To verify B. amyloliquefaciens as an environmentally compatible plant protective agent, its effect on the indigenous rhizosphere community was analyzed by metagenome sequencing. Rhizosphere microbial communities of lettuce treated with B. amyloliquefaciens FZB42 and non-treated plants were profiled by high-throughput metagenome sequencing of whole community DNA. Fragment recruitments of metagenome sequence reads on the genome sequence of B. amyloliquefaciens FZB42 proved the presence of the strain in the rhizosphere over 5 weeks of the field trial. Comparison of taxonomic community profiles only revealed marginal changes after application of strain FZB42. The orders Burkholderiales, Actinomycetales and Rhizobiales were most abundant in all samples. Depending on plant age a general shift within the composition of the microbial communities that was independent of the application of strain FZB42 was observed. In addition to the taxonomic profiling, functional analysis of annotated sequences revealed no major differences between samples regarding application of the inoculant strain. PMID:24904564

  13. Expression and characterization of a novel metagenome-derived cellulase Exo2b and its application to improve cellulase activity in Trichoderma reesei.

    PubMed

    Geng, Alei; Zou, Gen; Yan, Xing; Wang, Qianfu; Zhang, Jun; Liu, Fanghua; Zhu, Baoli; Zhou, Zhihua

    2012-11-01

    A metagenomic fosmid library containing 1 × 10^5 clones was constructed from a biogas digester fed with pig ordure and rice straw. In total, 121 clones with activity of 4-methylumbelliferyl-cellobiosidase were screened from the metagenomic library. A novel GH5 cellulase gene exo2b was identified from a sequenced clone EXO02C10 and expressed in Escherichia coli BL21. The corresponding recombinant Exo2b protein showed high specific activity toward both carboxymethylcellulose (CMC; 260 U/mg protein) and β-D-glucan from barley (849 U/mg), with an optimal pH and temperature of 7.5 and 58 °C, respectively. Exo2b showed stable activity at a wide pH range from 5.5 to 9.0 and was highly thermostable at 60 °C in the presence of 60 mM cysteine. Residual activity was maintained at nearly 100% when Exo2b was incubated at 60 °C for 15 h. A thin-layer chromatography analysis of the hydrolysis products confirmed that Exo2b was an endo-β-1,4-glucanase and it could also produce oligosaccharide smaller than cellotetraose. The fragment encoding the Exo2b catalytic domain was then fused with the cbh1 gene from Trichoderma reesei, and the fused gene was successfully expressed in T. reesei Rut-C30. Compared to that of the parent strain, the filter paper activity and CMCase activity of the secreted proteins of a selected transformant A1 increased by 24% and 18%, respectively. Besides, the glucose concentration from the hydrolysis of pretreated corn stover by the A1 secreted proteins increased by 19.8%. The present study demonstrated the potential application of metagenome originated cellulase genes to modify cellulase producing fungi. PMID:22270237

  14. FY08 LDRD Final Report Probabilistic Inference of Metabolic Pathways from Metagenomic Sequence Data

    SciTech Connect

    D'haeseleer, P

    2009-03-01

    Metagenomic 'shotgun' sequencing of environmental microbial communities has the potential to revolutionize microbial ecology, allowing a cultivation-independent, yet sequence-based analysis of the metabolic capabilities and functions present in an environmental sample. Although its intensive sequencing requirements are a good match for the continuously increasing bandwidth at sequencing centers, the complexity, seemingly inexhaustible novelty, and 'scrambled' nature of metagenomic data is also proving a tremendous challenge for analysis. In fact, many metagenomics projects do not go much further than providing a list of novel gene variants and over- or under-represented functional gene categories. In this project, we proposed to develop a set of novel metagenomic sequence analysis tools, including a binning method to group sequences by species, inference of phenotypes and metabolic pathways from these reconstructed species, and extraction of coarse-grained flux models. We proposed to closely collaborate with the DOE Joint Genome Institute to align these tools with their metagenomics analysis needs and the developing IMG/M metagenomics pipeline. Results would be cross-validated with simulated metagenomic data using a testing platform developed at the JGI.

  15. Use of whole genome shotgun metagenomics: a practical guide for the microbiome-minded physician scientist.

    PubMed

    Ma, Jun; Prince, Amanda; Aagaard, Kjersti M

    2014-01-01

    Whole genome shotgun sequencing (WGS) has been increasingly recognized as the most comprehensive and robust approach for metagenomics research. When compared with 16S-based metagenomics, it offers the advantage of identification of species level taxonomy and the estimation of metabolic pathway activities from human and environmental samples. Several large-scale metagenomic projects have been recently conducted or are currently underway utilizing WGS. With the generation of vast amounts of data, the bioinformatics and computational analysis of WGS results become vital for the success of a metagenomics study. However, each step in the WGS data analysis, including metagenome assembly, gene prediction, taxonomy identification, function annotation, and pathway analysis, is complicated by the sheer amount of data. Algorithms and tools have been developed specifically to handle WGS-generated metagenomics data with the hope of reducing the requirement on computational time and storage space. Here, we present an overview of the current state of metagenomics through WGS sequencing, challenges frequently encountered, and up-to-date solutions. Several applications that are uniquely applicable to microbiome studies in reproductive and perinatal medicine are also discussed. PMID:24390915

  16. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    DOE PAGESBeta

    Howe, Adina; Chain, Patrick S. G.

    2015-07-09

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Cloud Compute and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.
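
    The idea of extending overlapping reads into contigs, as described in the abstract above, can be illustrated with a toy greedy overlap merge. This is a minimal sketch only: the function names `overlap` and `greedy_assemble`, the `min_len` threshold, and the example reads are assumptions made for illustration, and the sketch ignores sequencing errors, reverse complements, and repeats, which the abstract names as the hard parts. It is not the method of any assembler covered by the review or its tutorial.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the ordered pair of sequences with the largest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical reads that tile one short sequence.
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
print(greedy_assemble(reads))
```

    The all-pairs search is quadratic in the number of sequences per merge, which hints at why the abstract calls assembly computationally intensive; practical assemblers rely on indexed overlap or de Bruijn graph structures instead.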

  17. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    PubMed Central

    Howe, Adina; Chain, Patrick S. G.

    2015-01-01

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Cloud Compute and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow. PMID:26217314

  18. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial).

    PubMed

    Howe, Adina; Chain, Patrick S G

    2015-01-01

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Cloud Compute and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow. PMID:26217314

  19. Evidence-Based Clustering of Reads and Taxonomic Analysis of Metagenomic Data

    NASA Astrophysics Data System (ADS)

    Folino, Gianluigi; Gori, Fabio; Jetten, Mike S. M.; Marchiori, Elena

    The rapidly emerging field of metagenomics seeks to examine the genomic content of communities of organisms to understand their roles and interactions in an ecosystem. In this paper we focus on clustering methods and their application to taxonomic analysis of metagenomic data. Clustering analysis for metagenomics amounts to grouping similar partial sequences, such as raw sequence reads, into clusters in order to discover information about the internal structure of the considered dataset, or the relative abundance of protein families. Different methods for clustering analysis of metagenomic datasets have been proposed. Here we focus on evidence-based methods for clustering that employ knowledge extracted from proteins identified by a BLASTx search (proxygenes). We consider two clustering algorithms introduced in previous works and a new one. We discuss advantages and drawbacks of the algorithms, and use them to perform taxonomic analysis of metagenomic data. To this end, three real-life benchmark datasets used in previous work on metagenomic data analysis are used. Comparison of the results indicates satisfactory coherence of the taxonomies output by the three algorithms, with respect to phylogenetic content at the class level and taxonomic distribution at phylum level. In general, the experimental comparative analysis substantiates the effectiveness of evidence-based clustering methods for taxonomic analysis of metagenomic data.

  20. Comparative Metagenomic Analysis Reveals Mechanisms for Stress Response in Hypoliths from Extreme Hyperarid Deserts.

    PubMed

    Le, Phuong Thi; Makhalanyane, Thulani P; Guerrero, Leandro D; Vikram, Surendra; Van de Peer, Yves; Cowan, Don A

    2016-01-01

    Understanding microbial adaptation to environmental stressors is crucial for interpreting broader ecological patterns. In the most extreme hot and cold deserts, cryptic niche communities are thought to play key roles in ecosystem processes and represent excellent model systems for investigating microbial responses to environmental stressors. However, relatively little is known about the genetic diversity underlying such functional processes in climatically extreme desert systems. This study presents the first comparative metagenome analysis of cyanobacteria-dominated hypolithic communities in hot (Namib Desert, Namibia) and cold (Miers Valley, Antarctica) hyperarid deserts. The most abundant phyla in both hypolith metagenomes were Actinobacteria, Proteobacteria, Cyanobacteria and Bacteroidetes with Cyanobacteria dominating in Antarctic hypoliths. However, no significant differences between the two metagenomes were identified. The Antarctic hypolithic metagenome displayed a high number of sequences assigned to sigma factors, replication, recombination and repair, translation, ribosomal structure, and biogenesis. In contrast, the Namib Desert metagenome showed a high abundance of sequences assigned to carbohydrate transport and metabolism. Metagenome data analysis also revealed significant divergence in the genetic determinants of amino acid and nucleotide metabolism between these two metagenomes and those of soil from other polar deserts, hot deserts, and non-desert soils. Our results suggest extensive niche differentiation in hypolithic microbial communities from these two extreme environments and a high genetic capacity for survival under environmental extremes. PMID:27503299

  1. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    SciTech Connect

    Howe, Adina; Chain, Patrick S. G.

    2015-07-09

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Cloud Compute and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.

  2. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes.

    PubMed

    Dutilh, Bas E; Cassman, Noriko; McNair, Katelyn; Sanchez, Savannah E; Silva, Genivaldo G Z; Boling, Lance; Barr, Jeremy J; Speth, Daan R; Seguritan, Victor; Aziz, Ramy K; Felts, Ben; Dinsdale, Elizabeth A; Mokili, John L; Edwards, Robert A

    2014-01-01

    Metagenomics, or sequencing of the genetic material from a complete microbial community, is a promising tool to discover novel microbes and viruses. Viral metagenomes typically contain many unknown sequences. Here we describe the discovery of a previously unidentified bacteriophage present in the majority of published human faecal metagenomes, which we refer to as crAssphage. Its ~97 kbp genome is six times more abundant in publicly available metagenomes than all other known phages together; it comprises up to 90% and 22% of all reads in virus-like particle (VLP)-derived metagenomes and total community metagenomes, respectively; and it totals 1.68% of all human faecal metagenomic sequencing reads in the public databases. The majority of crAssphage-encoded proteins match no known sequences in the database, which is why it was not detected before. Using a new co-occurrence profiling approach, we predict a Bacteroides host for this phage, consistent with Bacteroides-related protein homologues and a unique carbohydrate-binding domain encoded in the phage genome. PMID:25058116

  3. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads

    PubMed Central

    2012-01-01

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service that is capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using coverage depth resulting from the alignment of the reads to the catalogue of reference sequences which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies allowing their comparative analysis together with user samples, namely datasets from Russian Metagenome project, MetaHIT and Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, Python, with all major browsers supported. PMID:23216677
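
    One plausible reading of "composition vectors" built from coverage depth is sketched below: aligned bases per reference are divided by reference length, and the resulting depths are normalized to sum to one. The abstract does not give MALINA's actual formula, so the function name `composition_vector`, the input layout, and the normalization are all assumptions made for illustration.

```python
from collections import defaultdict


def composition_vector(alignments, ref_lengths):
    """Return the relative coverage depth per reference.

    alignments: iterable of (reference_id, aligned_bases) pairs (assumed layout).
    ref_lengths: dict mapping reference_id to reference length in bases.
    """
    covered = defaultdict(int)
    for ref_id, aligned_bases in alignments:
        covered[ref_id] += aligned_bases
    # Mean coverage depth: total aligned bases divided by reference length.
    depth = {ref: covered[ref] / length for ref, length in ref_lengths.items()}
    total = sum(depth.values())
    if total == 0:
        return {ref: 0.0 for ref in ref_lengths}
    return {ref: d / total for ref, d in depth.items()}


# Hypothetical references and read alignments.
lengths = {"genome_A": 1000, "genome_B": 500, "genome_C": 2000}
hits = [("genome_A", 100), ("genome_A", 100), ("genome_B", 100)]
print(composition_vector(hits, lengths))
```

    Dividing by reference length keeps a long genome from appearing more abundant merely because it attracts more reads, which is why depth rather than raw read count is used here.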

  4. Self-organizing approach for meta-genomes.

    PubMed

    Zhu, Jianfeng; Zheng, Wei-Mou

    2014-12-01

    We extend the self-organizing approach for annotation of a bacterial genome to analyze the raw sequencing data of the human gut metagenome without sequence assembling. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven 'phases', among which one is for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types or 'codon usages'. A set of codon usages can be used to update the phase assignment and vice versa. An iteration after an initialization leads to a convergent phase assignment to give an annotation of the genome. In the extension of the approach to a metagenome, we consider a mixture model of a number of categories described by different codon usages. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome. PMID:25213854
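
    The alternation described above, in which usage tables update the assignment and the assignment updates the tables, can be sketched as a hard-assignment loop over triplet frequency tables. This is a deliberately reduced illustration: it covers only the mixture-of-categories idea, reads each sequence in a single frame, and omits the seven phases and the reverse strand. The function names, the pseudocount of 1, the toy sequences, and the initial labels are assumptions, not details from the paper.

```python
import math
from collections import Counter
from itertools import product

TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]  # the 64 triplet types


def triplet_counts(seq):
    """Count non-overlapping triplets starting at position 0 (single frame only)."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def usage_table(count_list, pseudo=1.0):
    """Triplet frequency table for one category, smoothed with a pseudocount."""
    total = Counter()
    for counts in count_list:
        total.update(counts)
    denom = sum(total[t] for t in TRIPLETS) + pseudo * len(TRIPLETS)
    return {t: (total[t] + pseudo) / denom for t in TRIPLETS}


def log_likelihood(counts, table):
    return sum(n * math.log(table[t]) for t, n in counts.items() if t in table)


def cluster_by_usage(seqs, initial_labels, n_iter=20):
    """Alternate between re-estimating usage tables and reassigning sequences."""
    counts = [triplet_counts(s) for s in seqs]
    labels = list(initial_labels)
    k = max(labels) + 1
    for _ in range(n_iter):
        tables = [usage_table([c for c, lab in zip(counts, labels) if lab == j])
                  for j in range(k)]
        new_labels = [max(range(k), key=lambda j: log_likelihood(c, tables[j]))
                      for c in counts]
        if new_labels == labels:
            break  # assignment has converged
        labels = new_labels
    return labels


# Toy sequences: two GC-rich and two AT-rich, with a deliberately imperfect start.
seqs = ["GCGGCGGCGGCG", "GCGGCCGCGGCG", "AAATTTAAATTT", "AAAAAATTTAAA"]
print(cluster_by_usage(seqs, initial_labels=[0, 0, 0, 1]))
```

    Like any hard-assignment scheme, the result depends on the initialization; a poorly chosen start can converge to a mixed assignment, which is one reason the abstract speaks of an iteration "after an initialization".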

  5. Prospecting for Novel Biocatalysts in a Soil Metagenome

    PubMed Central

    Voget, S.; Leggewie, C.; Uesbeck, A.; Raasch, C.; Jaeger, K.-E.; Streit, W. R.

    2003-01-01

    The metagenomes of complex microbial communities are rich sources of novel biocatalysts. We exploited the metagenome of a mixed microbial population for isolation of more than 15 different genes encoding novel biocatalysts by using a combined cultivation and direct cloning strategy. A 16S rRNA sequence analysis revealed the presence of hitherto uncultured microbes closely related to the genera Pseudomonas, Agrobacterium, Xanthomonas, Microbulbifer, and Janthinobacterium. Total genomic DNA from this bacterial community was used to construct cosmid DNA libraries, which were functionally searched for novel enzymes of biotechnological value. Our searches in combination with cosmid sequencing resulted in identification of four clones encoding 12 putative agarase genes, most of which were organized in clusters consisting of two or three genes. Interestingly, nine of these agarase genes probably originated from gene duplications. Furthermore, we identified by DNA sequencing several other biocatalyst-encoding genes, including genes encoding a putative stereoselective amidase (amiA), two cellulases (gnuB and uvs080), an α-amylase (amyA), a 1,4-α-glucan branching enzyme (amyB), and two pectate lyases (pelA and uvs119). Also, a conserved cluster of two lipase genes was identified, which was linked to genes encoding a type I secretion system. The novel gene aguB was overexpressed in Escherichia coli, and the enzyme activities were determined. Finally, we describe more than 162 kb of DNA sequence that provides a strong platform for further characterization of this microbial consortium. PMID:14532085

  6. Tracking Strains in the Microbiome: Insights from Metagenomics and Models

    PubMed Central

    Brito, Ilana L.; Alm, Eric J.

    2016-01-01

    Transmission usually refers to the movement of pathogenic organisms. Yet, commensal microbes that inhabit the human body also move between individuals and environments. Surprisingly little is known about the transmission of these endogenous microbes, despite increasing realizations of their importance for human health. The health impacts arising from the transmission of commensal bacteria range widely, from the prevention of autoimmune disorders to the spread of antibiotic resistance genes. Despite this importance, there are outstanding basic questions: what is the fraction of the microbiome that is transmissible? What are the primary mechanisms of transmission? Which organisms are the most highly transmissible? Higher resolution genomic data is required to accurately link microbial sources (such as environmental reservoirs or other individuals) with sinks (such as a single person's microbiome). New computational advances enable strain-level resolution of organisms from shotgun metagenomic data, allowing the transmission of strains to be followed over time and after discrete exposure events. Here, we highlight the latest techniques that reveal strain-level resolution from raw metagenomic reads and new studies that are tracking strains across people and environments. We also propose how models of pathogenic transmission may be applied to study the movement of commensals between microbial communities. PMID:27242733

  7. Biogeography and individuality shape function in the human skin metagenome

    PubMed Central

    Oh, Julia; Byrd, Allyson L.; Deming, Clay; Conlan, Sean; Kong, Heidi H.; Segre, Julia A.

    2014-01-01

    Summary The varied topography of human skin offers a unique opportunity to study how the body’s microenvironments influence the functional and taxonomic composition of microbial communities. Phylogenetic marker gene-based studies have identified many bacteria and fungi that colonize distinct skin niches. Here, metagenomic analyses of diverse body sites in healthy humans demonstrate that local biogeography and strong individuality define the skin microbiome. We developed a relational analysis of bacterial, fungal, and viral communities, which showed not only site-specificity but also individual signatures. We further identified strain-level variation of dominant species as heterogeneous and multiphyletic. Reference-free analyses captured the uncharacterized metagenome through the development of a multi-kingdom gene catalog, which was used to uncover genetic signatures of species lacking reference genomes. This work is foundational for human disease studies investigating inter-kingdom interactions, metabolic changes, and strain tracking and defines the dual influence of biogeography and individuality on microbial composition and function. PMID:25279917

  8. (Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans

    DOE PAGESBeta

    Sharma, Anukriti; Gilbert, Jack A.; Lal, Rup

    2016-05-06

    Despite having serious clinical manifestations, Cellulosimicrobium cellulans remains under-reported, with only three genome sequences available at the time of writing. Genome sequences of C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM were used to determine distribution of pathogenicity islands (PAIs) across C. cellulans, which revealed 49 potential marker genes with known association to human infections, e.g. Fic and VbhA toxin-antitoxin system. Oligonucleotide composition-based analysis of orthologous proteins (n = 791) across three genomes revealed significant negative correlation (P < 0.05) between frequency of optimal codons (Fopt) and gene G+C content, highlighting the G+C-biased gene conversion (gBGC) effect across Cellulosimicrobium strains. Bayesian molecular-clock analysis performed on three virulent PAI proteins (Fic; D-alanyl-D-alanine-carboxypeptidase; transposase) dated the divergence event at 300 million years ago from the most recent common ancestor. Synteny-based annotation of hypothetical proteins highlighted gene transfers from non-pathogenic bacteria as a key factor in the evolution of PAIs. Additionally, deciphering the metagenomic islands using strain MM's genome with environmental data from the site of isolation (hot-spring biofilm) revealed (an)aerobic respiration as population segregation factor across the in situ cohorts. Furthermore, using reference genomes and metagenomic data, our results highlight the emergence and evolution of PAIs in the genus Cellulosimicrobium.
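
    The reported relationship between the frequency of optimal codons (Fopt) and gene G+C content can be made concrete with a small calculation. The sketch below is illustrative only: the `fopt` function uses a simplified definition (share of all codons that fall in a given optimal set), the optimal codon set and the three toy genes are hypothetical, and no significance test is performed. The abstract does not state how the authors computed either quantity.

```python
import math


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def fopt(seq, optimal_codons):
    """Simplified Fopt: share of in-frame codons that belong to `optimal_codons`."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return sum(c in optimal_codons for c in codons) / len(codons)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data: an AT-rich optimal codon set and three toy genes.
optimal = {"AAA", "GAA"}
genes = ["AAAGAAAAAGAA", "AAAGGCGAAGCC", "GGCGCCGGGGCC"]
gc = [gc_content(g) for g in genes]
fo = [fopt(g, optimal) for g in genes]
print(pearson(gc, fo))
```

    With these toy inputs the correlation is strongly negative by construction, mirroring the direction of the reported effect but saying nothing about its size in real genomes.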

  9. Compressed sensing methods for DNA microarrays, RNA interference, and metagenomics.

    PubMed

    Rao, Aditya; P, Deepthi; Renumadhavi, C H; Chandra, M Girish; Srinivasan, Rajgopal

    2015-02-01

    Compressed sensing (CS) is a sparse signal sampling methodology for efficiently acquiring and reconstructing a signal from relatively few measurements. Recent work shows that CS is well-suited to be applied to problems in genomics, including probe design in microarrays, RNA interference (RNAi), and taxonomic assignment in metagenomics. The principle of using different CS recovery methods in these applications has thus been established, but a comprehensive study of using a wide range of CS methods has not been done. For each of these applications, we apply three hitherto unused CS methods, namely, l1-magic, CoSaMP, and l1-homotopy, in conjunction with CS measurement matrices such as randomly generated CS m matrix, Hamming matrix, and projective geometry-based matrix. We find that, in RNAi, the l1-magic (the standard package for l1 minimization) and l1-homotopy methods show significant reduction in reconstruction error compared to the baseline. In metagenomics, we find that l1-homotopy as well as CoSaMP estimate concentration with significantly reduced time when compared to the GPSR and WGSQuikr methods. PMID:25629590
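
    The sparse-recovery idea in the abstract above can be illustrated with a minimal sketch. It uses plain iterative soft-thresholding (ISTA) for l1-regularized least squares, which is a swapped-in textbook method and not one of the packages named in the record (l1-magic, CoSaMP, l1-homotopy). The measurement matrix, the sparse vector, and all function names (ista, demo, objective) are hypothetical and chosen only for illustration.

```python
import random

def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the l1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def matvec(A, x):
    """Compute A x for a row-major list-of-lists matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]

def objective(A, y, x, lam):
    """0.5 * ||y - A x||^2 + lam * ||x||_1"""
    r = [yi - pi for yi, pi in zip(y, matvec(A, x))]
    return 0.5 * sum(v * v for v in r) + lam * sum(abs(v) for v in x)

def ista(A, y, lam=0.01, iters=500):
    """Iterative soft-thresholding, starting from the zero vector."""
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1/L is a safe (conservative) step size.
    L = sum(a * a for row in A for a in row)
    step = 1.0 / L
    x = [0.0] * len(A[0])
    for _ in range(iters):
        r = [yi - pi for yi, pi in zip(y, matvec(A, x))]
        g = rmatvec(A, r)  # negative gradient of the least-squares term
        x = [soft_threshold(xj + step * gj, step * lam) for xj, gj in zip(x, g)]
    return x

def demo():
    """Toy problem: 20 random measurements of a 40-dimensional, 2-sparse vector."""
    rng = random.Random(0)
    m, n = 20, 40
    A = [[rng.gauss(0, 1) / m ** 0.5 for _ in range(n)] for _ in range(m)]
    x_true = [0.0] * n
    x_true[3], x_true[17] = 1.0, 2.0  # assumed sparse "concentrations"
    y = matvec(A, x_true)
    return A, y, x_true, ista(A, y)
```

    In a metagenomic setting one might read the columns of A as per-taxon signatures and x as the sparse vector of concentrations; that mapping is an assumption made for this sketch, not a description of the authors' pipeline.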

  10. (Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans

    PubMed Central

    Sharma, Anukriti; Gilbert, Jack A.; Lal, Rup

    2016-01-01

    Despite having serious clinical manifestations, Cellulosimicrobium cellulans remains under-reported, with only three genome sequences available at the time of writing. Genome sequences of C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM were used to determine distribution of pathogenicity islands (PAIs) across C. cellulans, which revealed 49 potential marker genes with known association to human infections, e.g. Fic and VbhA toxin-antitoxin system. Oligonucleotide composition-based analysis of orthologous proteins (n = 791) across three genomes revealed significant negative correlation (P < 0.05) between frequency of optimal codons (Fopt) and gene G+C content, highlighting the G+C-biased gene conversion (gBGC) effect across Cellulosimicrobium strains. Bayesian molecular-clock analysis performed on three virulent PAI proteins (Fic; D-alanyl-D-alanine-carboxypeptidase; transposase) dated the divergence event at 300 million years ago from the most common recent ancestor. Synteny-based annotation of hypothetical proteins highlighted gene transfers from non-pathogenic bacteria as a key factor in the evolution of PAIs. Additionally, deciphering the metagenomic islands using strain MM’s genome with environmental data from the site of isolation (hot-spring biofilm) revealed (an)aerobic respiration as population segregation factor across the in situ cohorts. Using reference genomes and metagenomic data, our results highlight the emergence and evolution of PAIs in the genus Cellulosimicrobium. PMID:27151933

  11. Diverse circovirus-like genome architectures revealed by environmental metagenomics.

    PubMed

    Rosario, Karyna; Duffy, Siobain; Breitbart, Mya

    2009-10-01

    Single-stranded DNA (ssDNA) viruses with circular genomes are the smallest viruses known to infect eukaryotes. The present study identified 10 novel genomes similar to ssDNA circoviruses through data-mining of public viral metagenomes. The metagenomic libraries included samples from reclaimed water and three different marine environments (Chesapeake Bay, British Columbia coastal waters and Sargasso Sea). All the genomes have similarities to the replication (Rep) protein of circoviruses; however, only half have genomic features consistent with known circoviruses. Some of the genomes exhibit a mixture of genomic features associated with different families of ssDNA viruses (i.e. circoviruses, geminiviruses and parvoviruses). Unique genome architectures and phylogenetic analysis of the Rep protein suggest that these viruses belong to novel genera and/or families. Investigating the complex community of ssDNA viruses in the environment can lead to the discovery of divergent species and help elucidate evolutionary links between ssDNA viruses. PMID:19570956

  12. (Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans.

    PubMed

    Sharma, Anukriti; Gilbert, Jack A; Lal, Rup

    2016-01-01

    Despite having serious clinical manifestations, Cellulosimicrobium cellulans remains under-reported, with only three genome sequences available at the time of writing. Genome sequences of C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM were used to determine distribution of pathogenicity islands (PAIs) across C. cellulans, which revealed 49 potential marker genes with known association to human infections, e.g. Fic and VbhA toxin-antitoxin system. Oligonucleotide composition-based analysis of orthologous proteins (n = 791) across three genomes revealed significant negative correlation (P < 0.05) between frequency of optimal codons (Fopt) and gene G+C content, highlighting the G+C-biased gene conversion (gBGC) effect across Cellulosimicrobium strains. Bayesian molecular-clock analysis performed on three virulent PAI proteins (Fic; D-alanyl-D-alanine-carboxypeptidase; transposase) dated the divergence event at 300 million years ago from the most common recent ancestor. Synteny-based annotation of hypothetical proteins highlighted gene transfers from non-pathogenic bacteria as a key factor in the evolution of PAIs. Additionally, deciphering the metagenomic islands using strain MM's genome with environmental data from the site of isolation (hot-spring biofilm) revealed (an)aerobic respiration as population segregation factor across the in situ cohorts. Using reference genomes and metagenomic data, our results highlight the emergence and evolution of PAIs in the genus Cellulosimicrobium. PMID:27151933

  13. Metagenomic Analysis of Cerebrospinal Fluid from Patients with Multiple Sclerosis.

    PubMed

    Perlejewski, Karol; Bukowska-Ośko, Iwona; Nakamura, Shota; Motooka, Daisuke; Stokowy, Tomasz; Płoski, Rafał; Rydzanicz, Małgorzata; Zakrzewska-Pniewska, Beata; Podlecka-Piętowska, Aleksandra; Nojszewska, Monika; Gogol, Anna; Caraballo Cortés, Kamila; Demkow, Urszula; Stępień, Adam; Laskus, Tomasz; Radkowski, Marek

    2016-01-01

    Multiple sclerosis (MS) is a chronic inflammatory demyelinating disease of the central nervous system of unknown etiology. However, some infectious agents have been suggested to play a significant role in its pathogenesis. Next-generation sequencing (NGS) and metagenomics can be employed to characterize the microbiome of MS patients and to identify potential causative pathogens. In this study, 12 patients with idiopathic inflammatory demyelinating disorders (IIDD) of the central nervous system were studied: one patient had clinically isolated syndrome, one patient had recurrent optic neuritis, and ten patients had multiple sclerosis (MS). In addition, there was one patient with other non-inflammatory neurological disease. Cerebrospinal fluid (CSF) was sampled from all patients. RNA was extracted from CSF and subjected to a single-primer isothermal amplification followed by NGS and comprehensive data analysis. Altogether 441,608,474 reads were obtained and mapped using blastn. In a CSF sample from the patient with clinically isolated syndrome, 11 varicella-zoster virus reads were found. Other than that, similar bacterial, fungal, parasitic, and protozoan reads were identified in all samples, indicating a common presence of contamination in metagenomics. In conclusion, we identified varicella-zoster virus sequences in one out of the 12 patients with IIDD, which suggests that this virus could be occasionally related to the MS pathogenesis. A widespread bacterial contamination seems inherent to NGS and complicates the interpretation of results. PMID:27311319

  14. Detection of Novel Integrons in the Metagenome of Human Saliva

    PubMed Central

    Antepowicz, Agata; Mullany, Peter; Roberts, Adam P.

    2016-01-01

    Integrons are genetic elements capable of capturing and expressing open reading frames (ORFs) embedded within gene cassettes. They are involved in the dissemination of antibiotic resistance genes (ARGs) in clinically important pathogens. Although the ARGs are common in the oral cavity the association of integrons and antibiotic resistance has not been reported there. In this work, a PCR-based approach was used to investigate the presence of integrons and associated gene cassettes in human oral metagenomic DNA obtained from both the UK and Bangladesh. We identified a diverse array of gene cassettes containing ORFs predicted to confer antimicrobial resistance and other adaptive traits. The predicted proteins include a putative streptogramin A O-acetyltransferase, a bleomycin binding protein, cof-like hydrolase, competence and motility related proteins. This is the first study detecting integron gene cassettes directly from oral metagenomic DNA samples. The predicted proteins are likely to carry out a multitude of functions; however, the function of the majority is yet unknown. PMID:27304457

  15. Metagenomic insights into important microbes from the Dead Zone

    NASA Astrophysics Data System (ADS)

    Thrash, C.; Baker, B.; Seitz, K.; Temperton, B.; Gillies, L.; Rabalais, N. N.; Mason, O. U.

    2015-12-01

    Coastal regions of eutrophication-driven oxygen depletion are widespread and increasing in number. Also known as dead zones, these regions take their name from the deleterious effects of hypoxia (dissolved oxygen less than 2 mg/L) on shrimp, demersal fish, and other animal life. Dead zones result from nutrient enrichment of primary production, concomitant consumption by chemoorganotrophic aerobic microorganisms, and strong stratification that prevents ventilation of bottom water. One of the largest dead zones in the world occurs seasonally in the northern Gulf of Mexico (nGOM), where hypoxia can reach up to 22,000 square kilometers. While this dead zone shares many features with more well-known marine oxygen minimum zones, it is nevertheless understudied with regard to the microbial assemblages involved in biogeochemical cycling. We performed metagenomic and metatranscriptomic sequencing on six samples from the 2013 nGOM dead zone from both hypoxic and oxic bottom waters. Assembly and binning led to the recovery of over fifty partial to nearly complete metagenomes from key microbial taxa previously determined to be numerically abundant from 16S rRNA data, such as Thaumarchaeota, Marine Group II Euryarchaeota, SAR406, SAR324, Synechococcus spp., and Planctomycetes. These results provide information about the roles of these taxa in the nGOM dead zone, and opportunities for comparing this region of low oxygen to others around the globe.

  16. Genomics and Metagenomics of Extreme Acidophiles in Biomining Environments

    NASA Astrophysics Data System (ADS)

    Holmes, D. S.

    2015-12-01

    Over 160 draft or complete genomes of extreme acidophiles (pH < 3) have been published, many of which are from bioleaching and other biomining environments, or are closely related to such microorganisms. In addition, there are over 20 metagenomic studies of such environments. This provides a rich source of latent data that can be exploited for understanding the biology of biomining environments and for advancing biotechnological applications. Genomic and metagenomic data are already yielding valuable insights into cellular processes, including carbon and nitrogen management, heavy metal and acid resistance, iron and sulfur oxido-reduction, linking biogeochemical processes to organismal physiology. The data also allow the construction of useful models of the ecophysiology of biomining environments and provide insight into the gene and genome evolution of extreme acidophiles. Additionally, since most of these acidophiles are also chemoautolithotrophs that use minerals as energy sources or electron sinks, their genomes can be plundered for clues about the evolution of cellular metabolism and bioenergetic pathways during the Archaean abiotic/biotic transition on early Earth. Acknowledgements: Fondecyt 1130683.

  17. Metagenome of the gut of a malnourished child

    PubMed Central

    2011-01-01

    Background Malnutrition, a major health problem, affects a significant proportion of preschool children in developing countries. The devastating consequences of malnutrition include diarrhoea, malabsorption, increased intestinal permeability, suboptimal immune response, etc. Nutritional interventions and dietary solutions have not been effective for treatment of malnutrition till date. Metagenomic procedures allow one to access the complex cross-talk between the gut and its microbial flora and understand how a different community composition affects various states of human health. In this study, a metagenomic approach was employed for analysing the differences between gut microbial communities obtained from a malnourished and an apparently healthy child. Results Our results indicate that the malnourished child gut has an abundance of enteric pathogens which are known to cause intestinal inflammation resulting in malabsorption of nutrients. We also identified a few functional sub-systems from these pathogens, which probably impact the overall metabolic capabilities of the malnourished child gut. Conclusion The present study comprehensively characterizes the microbial community resident in the gut of a malnourished child. This study has attempted to extend the understanding of the basis of malnutrition beyond nutrition deprivation. PMID:21599906

  18. Preparation of fosmid libraries and functional metagenomic analysis of microbial community DNA.

    PubMed

    Martínez, Asunción; Osburne, Marcia S

    2013-01-01

    One of the most important challenges in contemporary microbial ecology is to assign a functional role to the large number of novel genes discovered through large-scale sequencing of natural microbial communities that lack similarity to genes of known function. Functional screening of metagenomic libraries, that is, screening environmental DNA clones for the ability to confer an activity of interest to a heterologous bacterial host, is a promising approach for bridging the gap between metagenomic DNA sequencing and functional characterization. Here, we describe methods for isolating environmental DNA and constructing metagenomic fosmid libraries, as well as methods for designing and implementing successful functional screens of such libraries. PMID:24060119

  19. Construction of small-insert and large-insert metagenomic libraries.

    PubMed

    Simon, Carola; Daniel, Rolf

    2010-01-01

    The vast majority of the Earth's biological diversity is hidden in uncultured and yet uncharacterized microbial genomes. The construction of metagenomic libraries is a cultivation-independent molecular approach to assess this unexplored genetic reservoir. In the last few years, a high number of novel biocatalysts have been identified by function-based or sequence-based screening of metagenomic libraries. Here, we describe detailed protocols for the construction of metagenomic small-insert and large-insert libraries in plasmids and fosmids, respectively, from environmental DNA. PMID:20830554

  20. Amplification of thermostable lipase genes fragment from thermogenic phase of domestic waste composting process

    NASA Astrophysics Data System (ADS)

    Nurhasanah, Nurbaiti, Santi; Madayanti, Fida; Akhmaloka

    2015-09-01

    Lipases are lipolytic enzymes that catalyze the hydrolysis of fatty acid ester bonds of triglycerides to produce free fatty acids and glycerol. The enzyme is widely used in various fields of the biotechnological industry. Hence, lipases with unique properties (e.g. thermostable lipases) are still being explored by various methods. One strategy is to use a metagenomic approach to amplify the gene directly from an environmental sample. This research focused on amplification of lipase gene fragments directly from the thermogenic phase of domestic waste composting in aerated trenches. We used domestic waste compost from waste treatment at SABUGA, ITB for the sample. Total chromosomal DNA was directly extracted from several stages of the thermogenic phase of the compost. The DNA was then directly used as a template for amplification of thermostable lipase gene fragments using a set of internal primers, namely Flip-1a and Rlip-1a, with a GC clamp affixed to the reverse primer. The results showed that the primers amplified the gene from four stages of the thermogenic phase, with a lipase gene fragment size of approximately 570 base pairs (bp). These results were further used for Denaturing Gradient Gel Electrophoresis (DGGE) analysis to determine the diversity of thermostable lipase gene fragments.

  1. Fragmentation Function in Thermofield Dynamics

    NASA Astrophysics Data System (ADS)

    Ladrem, M.; Chekerker, M.; Khanna, F. C.; Santana, A. E.

    2013-04-01

    The fragmentation function at high energy experiments is introduced by using thermofield dynamics (TFD), a real-time finite-temperature quantum field formalism. Due to the structure of TFD, the results at T = 0 and T ≠ 0 are split in a direct way. As an application, we consider the temperature effect on the fragmentation function of a hadron leading to quark-antiquark pairs. Using a definition of Wilson-loop in real-time, we find that the fragmentation function decreases in magnitude with an increase in the temperature.

  2. Velocity fluctuations of fission fragments

    NASA Astrophysics Data System (ADS)

    Llanes-Estrada, Felipe J.; Carmona, Belén Martínez; Martínez, Jose L. Muñoz

    2016-02-01

    We propose event by event velocity fluctuations of nuclear fission fragments as an additional interesting observable that gives access to the nuclear temperature in an independent way from spectral measurements and relates the diffusion and friction coefficients for the relative fragment coordinate in Kramers-like models (in which some aspects of fission can be understood as the diffusion of a collective variable through a potential barrier). We point out that neutron emission by the heavy fragments can be treated in effective theory if corrections to the velocity distribution are needed.

  3. Metagenomic study of the oral microbiota by Illumina high-throughput sequencing

    PubMed Central

    Lazarevic, Vladimir; Whiteson, Katrine; Huse, Susan; Hernandez, David; Farinelli, Laurent; Østerås, Magne; Schrenzel, Jacques; François, Patrice

    2013-01-01

    To date, metagenomic studies have relied on the utilization and analysis of reads obtained using 454 pyrosequencing to replace conventional Sanger sequencing. After extensively scanning the 16S ribosomal RNA (rRNA) gene, we identified the V5 hypervariable region as a short region providing reliable identification of bacterial sequences available in public databases such as the Human Oral Microbiome Database. We amplified samples from the oral cavity of three healthy individuals using primers covering an ~82-base segment of the V5 loop, and sequenced using the Illumina technology in a single orientation. We identified 135 genera or higher taxonomic ranks from the resulting 1,373,824 sequences. While the abundances of the most common phyla (Firmicutes, Proteobacteria, Actinobacteria, Fusobacteria and TM7) are largely comparable to previous studies, Bacteroidetes were less present. Potential sources for this difference include classification bias in this region of the 16S rRNA gene, human sample variation, sample preparation and primer bias. Using an Illumina sequencing approach, we achieved a much greater depth of coverage than previous oral microbiota studies, allowing us to identify several taxa not yet discovered in these types of samples, and to assess that at least 30,000 additional reads would be required to identify only one additional phylotype. The evolution of high-throughput sequencing technologies, and their subsequent improvements in read length enable the utilization of different platforms for studying communities of complex flora. Access to large amounts of data is already leading to a better representation of sample diversity at a reasonable cost. PMID:19796657

  4. Discovery of a new polyhydroxyalkanoate synthase from limestone soil through metagenomic approach.

    PubMed

    Tai, Yen Teng; Foong, Choon Pin; Najimudin, Nazalan; Sudesh, Kumar

    2016-04-01

    PHA synthase (PhaC) is the key enzyme in the production of biodegradable plastics known as polyhydroxyalkanoate (PHA). Nevertheless, most of these enzymes are isolated from cultivable bacteria using traditional isolation methods. Most of the microorganisms found in nature could not be successfully cultivated due to the lack of knowledge on their growth conditions. In this study, a culture-independent approach was applied. The presence of phaC genes in limestone soil was screened using primers targeting the class I and II PHA synthases. Based on the partial gene sequences, a total of 19 gene clusters have been identified and 7 clones were selected for full length amplification through genome walking. The complete phaC gene sequence of one of the clones (SC8) was obtained and it revealed 81% nucleotide identity to the PHA synthase gene of Chromobacterium violaceum ATCC 12472. This gene, obtained from an uncultured bacterium, was successfully cloned and expressed in a Cupriavidus necator PHB(-)4 PHA-negative mutant, resulting in the accumulation of a significant amount of PHA. The PHA synthase activity of this transformant was 64 ± 12 U/g proteins. This paper presents a pioneering study on the discovery of phaC in a limestone area using a metagenomic approach. Through this study, a new functional phaC was discovered from an uncultured bacterium. Phylogenetic classification for all the phaCs isolated from this study has revealed that the limestone hill harbors a great diversity of PhaCs with activities that have not yet been investigated. PMID:26467694

  5. New Classification of Headache

    PubMed Central

    Gawel, Marek J.

    1992-01-01

    The Headache Classification Committee of the International Headache Society has developed a new classification system for headache, cranial neuralgia, and facial pain. The value of the classification for the practising clinician is that it forces him or her to take a more careful history in order to determine the nature of the headache. This article reviews the classification system and gives examples of case histories and subsequent diagnoses. PMID:21221276

  6. Fragmentation of drying paint layers

    NASA Astrophysics Data System (ADS)

    Bakos, Katinka; Dombi, András; Járai-Szabó, Ferenc; Néda, Zoltán

    2013-11-01

    Fragmentation of thin layers of drying granular materials on a frictional surface is studied both by experiments and computer simulations. Besides a qualitative description of the fragmentation phenomenon, the dependence of the average fragment size on the layer thickness is thoroughly investigated. Experiments are done using a special nail polish, which forms characteristic crack structures during drying. In order to control the layer thickness, we diluted the nail polish in acetone and evaporated different volumes of this solution on glass surfaces in a controlled manner. During the evaporation process we managed to get an unstable paint layer, which formed cracks as it dried out. In order to understand the obtained structures, a previously developed spring-block model was implemented in a three-dimensional version. The experimental and simulation results proved to be in excellent qualitative and quantitative agreement. An earlier suggested scaling relation between the average fragment size and the layer thickness is reconfirmed.

  7. Classification of articulators.

    PubMed

    Rihani, A

    1980-03-01

    A simple classification in familiar terms with definite, clear characteristics can be adopted. This classification system is based on the number of records used and the adjustments necessary for the articulator to accept these records. The classification divides the articulators into nonadjustable, semiadjustable, and fully adjustable articulators (Table I). PMID:6928204

  8. Government Classification: An Overview.

    ERIC Educational Resources Information Center

    Brown, Karen M.

    Classification of government documents (confidential, secret, top secret) is a system used by the executive branch to, in part, protect national security and foreign policy interests. The systematic use of classification markings with precise definitions was established during World War I, and since 1936 major changes in classification have…

  9. (Natural fragmentation of exploding cylinders)

    SciTech Connect

    Grady, D.E.; Hightower, M.M.

    1990-01-01

    The natural fragmentation of a 4140 steel cylinder fully loaded with RX-35-AN insensitive high explosive is investigated through experiment and analysis. Methods of Taylor and Gurney are used to determine the fracture strain and kinematic state of the expanding cylinder. Energy methods based on mechanisms of both tension fracture and adiabatic shear fracture are used to calculate the circumferential fragmentation intensity. 9 refs., 5 figs.

  10. Classification: Something to Think About.

    ERIC Educational Resources Information Center

    Isenberg, Joan P.; Jacobs, Judith E.

    1981-01-01

    Advocates the use of classification activities in the elementary school curriculum as a means of developing thinking skills in children. Critical preclassification skills, classification activities (including simple and multiple classification), and classification tasks and materials are discussed. (Author/RH)

  11. Signature Peptide-Enabled Metagenomics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    McMahon, Ben [LANL

    2013-01-25

    Ben McMahon of Los Alamos National Laboratory (LANL) presents "Signature Peptide-Enabled Metagenomics" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  12. Screening for novel enzymes from metagenome and SIGEX, as a way to improve it

    PubMed Central

    Yun, Jiae; Ryu, Sangryeol

    2005-01-01

    Metagenomics has been successfully applied to isolate novel biocatalysts from the uncultured microbiota in the environment. Two types of screening have been used to identify clones carrying desired traits from metagenomic libraries: function-based screening, and sequence-based screening. Both function- and sequence- based screening have individual advantages and disadvantages, and they have been applied successfully to discover biocatalysts from metagenome. However, both strategies are laborious and tedious because of the low frequency of screening hits. A recent paper introduced a high throughput screening strategy, termed substrate-induced gene-expression screening (SIGEX). SIGEX is designed to select the clones harboring catabolic genes induced by various substrates in concert with fluorescence activated cell sorting (FACS). This method was applied successfully to isolate aromatic hydrocarbon-induced genes from a metagenomic library. Although SIGEX has many limitations, it is expected to provide economic advantages, especially to industry. PMID:15790425

  13. Big Data, Evolution, and Metagenomes: Predicting Disease from Gut Microbiota Codon Usage Profiles.

    PubMed

    Fabijanić, Maja; Vlahoviček, Kristian

    2016-01-01

    Metagenomics projects use next-generation sequencing to unravel genetic potential in microbial communities from a wealth of environmental niches, including those associated with human body and relevant to human health. In order to understand large datasets collected in metagenomics surveys and interpret them in context of how a community metabolism as a whole adapts and interacts with the environment, it is necessary to extend beyond the conventional approaches of decomposing metagenomes into microbial species' constituents and performing analysis on separate components. By applying concepts of translational optimization through codon usage adaptation on entire metagenomic datasets, we demonstrate that a bias in codon usage present throughout the entire microbial community can be used as a powerful analytical tool to predict for community lifestyle-specific metabolism. Here we demonstrate this approach combined with machine learning, to classify human gut microbiome samples according to the pathological condition diagnosed in the human host. PMID:27115650
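
    As a rough illustration of the kind of feature the abstract above describes, the following sketch pools codon counts over a set of open reading frames into a single 64-element relative-frequency profile per sample. The function name codon_profile, the fixed codon ordering, and the assumption that the input sequences are already in frame are illustrative choices for this sketch; the record does not specify how the authors computed their profiles or which classifier consumed them.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sample maps to a comparable vector.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

def codon_profile(orfs):
    """Return pooled relative codon frequencies for a list of in-frame ORFs.

    Codons containing characters other than A, C, G, T (e.g. N) are skipped,
    and a trailing partial codon is ignored.
    """
    counts = Counter()
    for orf in orfs:
        s = orf.upper()
        for i in range(0, len(s) - 2, 3):
            codon = s[i:i + 3]
            if all(base in "ACGT" for base in codon):
                counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]

# Example: one short ORF with codons ATG, GCT, GCT, TAA.
profile = codon_profile(["ATGGCTGCTTAA"])
```

    A vector like this, computed per sample, could then be passed to any standard classifier; which learner to use is left open here.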

  14. Signature Peptide-Enabled Metagenomics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    SciTech Connect

    McMahon, Ben

    2012-06-01

    Ben McMahon of Los Alamos National Laboratory (LANL) presents "Signature Peptide-Enabled Metagenomics" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  15. EBI metagenomics—a new resource for the analysis and archiving of metagenomic data

    PubMed Central

    Hunter, Sarah; Corbett, Matthew; Denise, Hubert; Fraser, Matthew; Gonzalez-Beltran, Alejandra; Hunter, Christopher; Jones, Philip; Leinonen, Rasko; McAnulla, Craig; Maguire, Eamonn; Maslen, John; Mitchell, Alex; Nuka, Gift; Oisel, Arnaud; Pesseat, Sebastien; Radhakrishnan, Rajesh; Rocca-Serra, Philippe; Scheremetjew, Maxim; Sterk, Peter; Vaughan, Daniel; Cochrane, Guy; Field, Dawn; Sansone, Susanna-Assunta

    2014-01-01

    Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive. PMID:24165880

  16. Metagenomic data of fungal internal transcribed spacer from serofluid dish, a traditional Chinese fermented food

    PubMed Central

    Chen, Peng; Zhao, Yang; Wu, Zhengrong; Liu, Ronghui; Xu, Ruixiang; Yan, Lei; Li, Hongyu

    2015-01-01

    Serofluid dish (or Jiangshui, in Chinese), a traditional food in the Chinese culture for thousands of years, is made from vegetables by fermentation. In this work, the microbial community of the fermented serofluid dish was investigated by a culture-independent method. The metagenomic data in this article contain the sequences of fungal internal transcribed spacer (ITS) regions of rRNA genes from 12 different serofluid dish samples. The metagenome comprised an average of 50,865 raw reads, with an average of 8,958,220 bp and a G + C content of 45.62%. This is the first report on metagenomic data of fungal ITS from serofluid dish employing the Illumina platform to profile the fungal communities of this little-known fermented food from Gansu Province, China. The metagenomic data of fungal internal transcribed spacer can be accessed at NCBI, SRA database accession no. SRP067411. PMID:26981389

  17. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods

    SciTech Connect

    Mavromatis, K; Ivanova, N; Barry, Kerrie; Shapiro, Harris; Goltsman, Eugene; McHardy, Alice C.; Rigoutsos, Isidore; Salamov, Asaf; Korzeniewski, Frank; Land, Miriam L; Lapidus, Alla L.; Grigoriev, Igor; Hugenholtz, Philip; Kyrpides, Nikos C

    2007-01-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (BLAST hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
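
    The idea of building a simulated data set by drawing reads at random from known genomes can be illustrated with a minimal sketch. This is not the authors' procedure: the function name `sample_reads`, the dictionary input format, the fixed read length, and the choice to weight genomes by their number of possible read start positions are all assumptions made for illustration.

```python
import random


def sample_reads(genomes, n_reads, read_len, seed=0):
    """Draw fixed-length reads at random from a set of genomes.

    genomes: dict mapping a genome name to its DNA string
             (hypothetical input format, chosen for this sketch).
    Returns a list of (source_name, read) tuples, so every read keeps
    the label of the genome it came from.
    """
    rng = random.Random(seed)

    # Only genomes long enough to yield a full-length read are eligible.
    eligible = {name: seq for name, seq in genomes.items() if len(seq) >= read_len}
    if not eligible:
        raise ValueError("no genome is at least read_len long")

    names = sorted(eligible)
    # Assumption: weight each genome by its number of possible start
    # positions, so every start position in the pool is equally likely.
    weights = [len(eligible[name]) - read_len + 1 for name in names]

    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        seq = eligible[name]
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append((name, seq[start:start + read_len]))
    return reads
```

    Because each read carries its source label, the output of an assembler, gene finder, or binning method run on the reads can be compared against the known origin, which is the kind of fidelity check the abstract describes.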

  18. Metagenomic data of fungal internal transcribed spacer from serofluid dish, a traditional Chinese fermented food.

    PubMed

    Chen, Peng; Zhao, Yang; Wu, Zhengrong; Liu, Ronghui; Xu, Ruixiang; Yan, Lei; Li, Hongyu

    2016-03-01

    Serofluid dish (Jiangshui in Chinese), a food that has been part of Chinese culture for thousands of years, is made by fermenting vegetables. In this work, the microbial community of fermented serofluid dish was investigated by a culture-independent method. The metagenomic data in this article contain the sequences of fungal internal transcribed spacer (ITS) regions of rRNA genes from 12 different serofluid dish samples. The metagenome comprised an average of 50,865 raw reads, with an average of 8,958,220 bp and a G + C content of 45.62%. This is the first report of metagenomic data on fungal ITS from serofluid dish that uses the Illumina platform to profile the fungal communities of this little-known fermented food from Gansu Province, China. The metagenomic data of the fungal internal transcribed spacer can be accessed in the NCBI SRA database under accession no. SRP067411. PMID:26981389

  19. Life in Oligotrophic Desert Environments: Contrasting Taxonomic and Functional Diversity of Two Microbial Mats with Metagenomics

    NASA Astrophysics Data System (ADS)

    Bonilla-Rosso, G.; Peimbert, M.; Olmedo, G.; Alcaraz, L. D.; Eguiarte, L. E.; Souza, V.

    2010-04-01

    The metagenomic analysis of two microbial mats from the oligotrophic waters of the Cuatro Ciénegas basin reveals large differences at both the taxonomic and functional levels. These are explained in terms of environmental stability and nutrient availability.

  20. Experimental Design and Bioinformatics Analysis for the Application of Metagenomics in Environmental Sciences and Biotechnology.

    PubMed

    Ju, Feng; Zhang, Tong

    2015-11-01

    Recent advances in DNA sequencing technologies have prompted the widespread application of metagenomics for the investigation of novel bioresources (e.g., industrial enzymes and bioactive molecules) and unknown biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. This review discusses rigorous experimental design and sample preparation in the context of applying metagenomics in environmental sciences and biotechnology. Moreover, this review summarizes the principles, methodologies, and state-of-the-art bioinformatics procedures, tools and database resources for metagenomics applications, and discusses two popular strategies (analysis of unassembled reads versus assembled contigs/draft genomes) for gaining quantitative or qualitative insights into microbial community structure and functions. Overall, this review aims to facilitate more extensive application of metagenomics in the investigation of uncultured microorganisms, novel enzymes, microbe-environment interactions, and biohazards in biotechnological applications where microbial communities are engineered for bioenergy production, wastewater treatment, and bioremediation. PMID:26451629