Science.gov

Sample records for metagenome fragment classification

  1. Metagenome Fragment Classification Using N-Mer Frequency Profiles

    PubMed Central

    Rosen, Gail; Garbarine, Elaine; Caseiro, Diamantino; Polikar, Robi; Sokhansanj, Bahrad

    2008-01-01

    A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the mass amounts of metagenomic data, all DNA present in an environmental sample. A major obstacle in metagenomics is the inability to obtain accuracy using technology that yields short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLAST's tied top scores. We demonstrate that this approach is scalable to identify any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genera levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis demonstrates that species-accuracy achieves 90% for highly-represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced. PMID:19956701
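
    The classification idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the helper names (`nmer_counts`, `train_profiles`, `classify`), the add-one smoothing, the uniform prior over genomes, and the toy genomes are all assumptions made for the example.

```python
import math
from collections import Counter


def nmer_counts(sequence, n):
    """Count every overlapping N-mer in a DNA string."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def train_profiles(genomes, n):
    """Build one N-mer count profile per genome (hypothetical helper)."""
    profiles = {}
    for name, sequence in genomes.items():
        counts = nmer_counts(sequence, n)
        profiles[name] = (counts, sum(counts.values()))
    return profiles


def classify(fragment, profiles, n):
    """Return the genome whose profile gives the fragment the highest log-likelihood."""
    vocabulary = 4 ** n  # number of possible N-mers over A, C, G, T
    best_name, best_score = None, -math.inf
    for name, (counts, total) in profiles.items():
        score = 0.0
        for i in range(len(fragment) - n + 1):
            # Add-one smoothing (an assumption) keeps unseen N-mers
            # from driving the likelihood to zero.
            score += math.log((counts[fragment[i:i + n]] + 1) / (total + vocabulary))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy genomes, invented for illustration only.
genomes = {"genome_a": "ACGTACGT" * 20, "genome_b": "GGCCTTAA" * 20}
profiles = train_profiles(genomes, n=3)
print(classify("ACGTACGTAC", profiles, n=3))
```

    Because every genome receives a real-valued score, exact ties are unlikely in practice, which is consistent with the abstract's contrast with BLAST's tied top scores.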

  2. Metagenome fragment classification using N-mer frequency profiles.

    PubMed

    Rosen, Gail; Garbarine, Elaine; Caseiro, Diamantino; Polikar, Robi; Sokhansanj, Bahrad

    2008-01-01

    A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the mass amounts of metagenomic data, all DNA present in an environmental sample. A major obstacle in metagenomics is the inability to obtain accuracy using technology that yields short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLAST's tied top scores. We demonstrate that this approach is scalable to identify any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genera levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis demonstrates that species-accuracy achieves 90% for highly-represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced.

  3. Analysis of composition-based metagenomic classification.

    PubMed

    Higashi, Susan; Barreto, André da Motta Salles; Cantão, Maurício Egidio; de Vasconcelos, Ana Tereza Ribeiro

    2012-01-01

    An essential step of a metagenomic study is the taxonomic classification, that is, the identification of the taxonomic lineage of the organisms in a given sample. The taxonomic classification process involves a series of decisions. Currently, in the context of metagenomics, such decisions are usually based on empirical studies that consider one specific type of classifier. In this study we propose a general framework for analyzing the impact that several decisions can have on the classification problem. Instead of focusing on any specific classifier, we define a generic score function that provides a measure of the difficulty of the classification task. Using this framework, we analyze the impact of the following parameters on the taxonomic classification problem: (i) the length of n-mers used to encode the metagenomic sequences, (ii) the similarity measure used to compare sequences, and (iii) the type of taxonomic classification, which can be conventional or hierarchical, depending on whether the classification process occurs in a single shot or in several steps according to the taxonomic tree. We defined a score function that measures the degree of separability of the taxonomic classes under a given configuration induced by the parameters above. We conducted an extensive computational experiment and found out that reasonable values for the parameters of interest could be (i) intermediate values of n, the length of the n-mers; (ii) any similarity measure, because all of them resulted in similar scores; and (iii) the hierarchical strategy, which performed better in all of the cases. As expected, short n-mers generate lower configuration scores because they give rise to frequency vectors that represent distinct sequences in a similar way. On the other hand, large values for n result in sparse frequency vectors that represent differently metagenomic fragments that are in fact similar, also leading to low configuration scores. Regarding the similarity measure, in
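
    The effect of n-mer length described above can be seen on a toy example. The sketch below is an assumption-laden illustration, not the paper's score function: it encodes two invented sequences as n-mer frequency vectors and compares them with cosine similarity, one of many possible similarity measures. All function names are hypothetical.

```python
import math
from itertools import product


def frequency_vector(sequence, n):
    """Encode a DNA string as a vector of relative n-mer frequencies."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=n))}
    vector = [0.0] * len(index)
    windows = len(sequence) - n + 1
    for i in range(windows):
        position = index.get(sequence[i:i + n])
        if position is not None:  # skip n-mers with ambiguous bases
            vector[position] += 1.0 / windows
    return vector


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nonzero_fraction(vector):
    """Fraction of vector entries that are non-zero (a crude sparsity measure)."""
    return sum(1 for x in vector if x > 0) / len(vector)


# Two invented sequences with similar base composition but different order.
seq_a = "ACGTTGCAACGGTCAGT"
seq_b = "TTGACCGTAGCATGCAA"

for n in (1, 2, 4):
    va, vb = frequency_vector(seq_a, n), frequency_vector(seq_b, n)
    print(n, round(cosine(va, vb), 3), round(nonzero_fraction(va), 3))
```

    With n = 1 the two distinct sequences look almost identical, while with n = 4 the vectors are sparse and share few entries, which mirrors the trade-off the abstract reports for short and long n-mers.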

  4. Gene prediction in metagenomic fragments based on the SVM algorithm

    PubMed Central

    2013-01-01

    Background: Metagenomic sequencing is becoming a powerful technology for exploring microorganisms from various environments, such as the human body, without isolation and cultivation. Accurately identifying genes in metagenomic fragments is one of the most fundamental issues. Results: In this article, we present a novel gene prediction method named MetaGUN for metagenomic fragments, based on the SVM machine learning approach. It uses a three-stage strategy to predict genes. First, it classifies input fragments into phylogenetic groups by a k-mer based sequence binning method. Then, protein-coding sequences are identified for each group independently with SVM classifiers that integrate entropy density profiles (EDP) of codon usage, translation initiation site (TIS) scores and open reading frame (ORF) length as input patterns. Finally, the TISs are adjusted by employing a modified version of MetaTISA. To identify protein-coding sequences, MetaGUN builds the universal module and the novel module. The former is based on a set of representative species, while the latter is designed to find potential functionary DNA sequences with conserved domains. Conclusions: Comparisons on artificial shotgun fragments with multiple current metagenomic gene finders show that MetaGUN predicts better results on both 3' and 5' ends of genes with fragments of various lengths. In particular, it makes the most reliable predictions among these methods. As an application, MetaGUN was used to predict genes for two samples of human gut microbiome. It identifies thousands of additional genes with significant evidence. Further analysis indicates that MetaGUN tends to predict more potential novel genes than other current metagenomic gene finders. PMID:23735199

  5. Metagenomic DNA fragments that affect Escherichia coli mutational pathways.

    PubMed

    Yang, Hanjing; To, Kam H; Aguila, Sharon J; Miller, Jeffrey H

    2006-08-01

    A multicopy cloning approach was used to search for metagenomic DNA fragments that affect Escherichia coli mutational pathways. Soil metagenomic expression libraries were constructed with DNA samples prepared directly from soil samples collected from the UCLA Botanical Garden. Using frameshift mutator screening, we obtained a total of 26 unique metagenomic fragments that stimulate frameshift rates in an E. coli wild-type host. Mutational enhancer strains such as an ndk-deficient strain and a temperature sensitive mutS strain (mutS60) were used to further verify the mutator phenotype. We found that the presence of multiple copies of certain types of metagenomic DNA sequence repeats cause general genome instability in the wild-type E. coli host and the effect can be suppressed by overproducing a DNA mismatch component MutL. In addition, we identified nine metagenomic mutator genes (designated as smu genes) that encode proteins that have not been linked to mutator phenotypes prior to this study including a putative RNA methyltransferase Smu10A. The strain overproducing Smu10A displays one prominent base substitution hotspot in the rpoB gene, which coincides with the base substitution hotspot we have observed in cells that are partially deficient in the proofreading function carried out by the DNA polymerase III epsilon subunit. Based on the structural conservation of DNA replication/recombination/repair machineries among microorganisms, this approach would allow us to both identify new mutational pathways in E. coli and to find genes involved in DNA replication, recombination or DNA repair from vast unculturable microbes.

  6. Scalable metagenomic taxonomy classification using a reference genome database

    PubMed Central

    Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.

    2013-01-01

    Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782

  7. RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles.

    PubMed

    Nalbantoglu, Ozkan U; Way, Samuel F; Hinrichs, Steven H; Sayood, Khalid

    2011-01-31

    Computational analysis of metagenomes requires the taxonomical assignment of the genome contigs assembled from DNA reads of environmental samples. Because of the diverse nature of microbiomes, the length of the assemblies obtained can vary between a few hundred bp to a few hundred Kbp. Current taxonomic classification algorithms provide accurate classification for long contigs or for short fragments from organisms that have close relatives with annotated genomes. These are significant limitations for metagenome analysis because of the complexity of microbiomes and the paucity of existing annotated genomes. We propose a robust taxonomic classification method, RAIphy, that uses a novel sequence similarity metric with iterative refinement of taxonomic models and functions effectively without these limitations. We have tested RAIphy with synthetic metagenomics data ranging between 100 bp to 50 Kbp. Within a sequence read range of 100 bp-1000 bp, the sensitivity of RAIphy ranges between 38%-81% outperforming the currently popular composition-based methods for reads in this range. Comparison with computationally more intensive sequence similarity methods shows that RAIphy performs competitively while being significantly faster. The sensitivity-specificity characteristics for relatively longer contigs were compared with the PhyloPythia and TACOA algorithms. RAIphy performs better than these algorithms at varying clade-levels. For an acid mine drainage (AMD) metagenome, RAIphy was able to taxonomically bin the sequence read set more accurately than the currently available methods, Phymm and MEGAN, and more accurately in two out of three tests than the much more computationally intensive method, PhymmBL. With the introduction of the relative abundance index metric and an iterative classification method, we propose a taxonomic classification algorithm that performs competitively for a large range of DNA contig lengths assembled from metagenome data. Because of its speed

  8. Accelerating metagenomic read classification on CUDA-enabled GPUs.

    PubMed

    Kobus, Robin; Hundt, Christian; Müller, André; Schmidt, Bertil

    2017-01-03

    Metagenomic sequencing studies are becoming increasingly popular, with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, software tools for fast and accurate metagenomic read classification are urgently needed. We present cuCLARK, a read-level classifier for CUDA-enabled GPUs, based on the fast and accurate classification of metagenomic sequences using reduced k-mers (CLARK) method. Using the processing power of a single Titan X GPU, cuCLARK can reach classification speeds of up to 50 million reads per minute. Corresponding speedups for species- (genus-)level classification range between 3.2 and 6.6 (3.7 and 6.4) compared to multi-threaded CLARK executed on a 16-core Xeon CPU workstation. cuCLARK can perform metagenomic read classification at superior speeds on CUDA-enabled GPUs. It is free software licensed under GPL and can be downloaded at https://github.com/funatiq/cuclark free of charge.

  9. Alignment-free supervised classification of metagenomes by recursive SVM

    PubMed Central

    2013-01-01

    Background: Comparison and classification of metagenome samples is one of the major tasks in the study of microbial communities of natural environments or niches on human bodies. Bioinformatics methods play important roles in this task, including 16S rRNA gene analysis and some alignment-based or alignment-free methods on metagenomic data. Alignment-free methods have the advantage of not depending on known genome annotations and therefore have high potential in studying complicated microbiomes. However, the existing alignment-free methods are all based on an unsupervised learning strategy (e.g., PCA or hierarchical clustering). These types of methods are powerful in revealing major similarities and grouping relations between microbiome samples, but cannot be applied for discriminating predefined classes of interest which might not be the dominating assortment in the data. Supervised classification is needed in the latter scenario, with the goal of classifying samples into predefined classes and finding the features that can discriminate the classes. The effectiveness of supervised classification with alignment-based features on metagenomic data has been shown in some recent studies. The application of alignment-free supervised classification methods to metagenome data has not been well explored yet. Results: We developed a method for this task using k-tuple frequencies as features counted directly from metagenome short reads and the R-SVM (Recursive SVM) for feature selection and classification. We tested our method on a simulation dataset, a real dataset composed of several known genomes, and a real metagenome NGS short reads dataset. Experiments on simulated data showed that the method can classify the classes almost perfectly and can recover major sequence signatures that distinguish the two classes. On the real human gut metagenome data, the method can discriminate samples of inflammatory bowel disease (IBD) patients from control samples with high accuracy, which

  10. Accurate phylogenetic classification of variable-length DNA fragments.

    PubMed

    McHardy, Alice Carolyn; Martín, Héctor García; Tsirigos, Aristotelis; Hugenholtz, Philip; Rigoutsos, Isidore

    2007-01-01

    Metagenome studies have retrieved vast amounts of sequence data from a variety of environments leading to new discoveries and insights into the uncultured microbial world. Except for very simple communities, the encountered diversity has made fragment assembly and the subsequent analysis a challenging problem. A taxonomic characterization of metagenomic fragments is required for a deeper understanding of shotgun-sequenced microbial communities, but success has mostly been limited to sequences containing phylogenetic marker genes. Here we present PhyloPythia, a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms. The method requires no more than 100 kb of training sequence for the creation of accurate models of sample-specific populations and can assign fragments ≥1 kb with high specificity.

  11. Metagenomic Classification Using an Abstraction Augmented Markov Model

    PubMed Central

    Zhu, Xiujun (Sylvia)

    2016-01-01

    The abstraction augmented Markov model (AAMM) is an extension of a Markov model that can be used for the analysis of genetic sequences. It is developed using the frequencies of all possible consecutive words of the same length (p-mers). This article reviews the theory behind AAMM and applies it to metagenomic classification. PMID:26618474
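
    As background for the p-mer frequencies mentioned here, the sketch below estimates a plain fixed-order Markov model from p-mer counts and scores a fragment against it. It is not the AAMM itself, whose abstraction step is not described in this record; the function names, the add-one smoothing, and the toy training sequences are assumptions for illustration.

```python
import math
from collections import Counter


def train_markov(sequence, p, alphabet="ACGT"):
    """Estimate an order-(p-1) Markov model from p-mer counts.

    Returns a function that maps a p-mer to the log-probability of its
    last base given the preceding p-1 bases.
    """
    pmer_counts = Counter(sequence[i:i + p] for i in range(len(sequence) - p + 1))
    context_counts = Counter()
    for word, count in pmer_counts.items():
        context_counts[word[:-1]] += count

    def log_prob(word):
        # Add-one smoothing over the alphabet (an assumption of this sketch).
        numerator = pmer_counts[word] + 1
        denominator = context_counts[word[:-1]] + len(alphabet)
        return math.log(numerator / denominator)

    return log_prob


def log_likelihood(fragment, log_prob, p):
    """Sum the conditional log-probabilities of every p-mer in the fragment."""
    return sum(log_prob(fragment[i:i + p]) for i in range(len(fragment) - p + 1))


# Toy training sequences, invented for illustration only.
model_a = train_markov("ACGT" * 30, p=3)
model_b = train_markov("AACCGGTT" * 30, p=3)

fragment = "ACGTACGT"
print(log_likelihood(fragment, model_a, 3) > log_likelihood(fragment, model_b, 3))
```

    A fragment would then be assigned to the class whose model gives the larger log-likelihood.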

  12. Multi-Layer and Recursive Neural Networks for Metagenomic Classification.

    PubMed

    Ditzler, Gregory; Polikar, Robi; Rosen, Gail

    2015-09-01

    Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of learning nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: i) a deep belief network, and ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multi-layer perceptron, which has been well-established in the machine learning community as a powerful prediction algorithm, though its presence is largely missing in metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not result in improvements to the classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and provide insight on how they can be improved for predictive metagenomic analysis.

  13. Fast and sensitive taxonomic classification for metagenomics with Kaiju.

    PubMed

    Menzel, Peter; Ng, Kim Lee; Krogh, Anders

    2016-04-13

    Metagenomics emerged as an important field of research not only in microbial ecology but also for human health and disease, and metagenomic studies are performed on increasingly larger scales. While recent taxonomic classification programs achieve high speed by comparing genomic k-mers, they often lack sensitivity for overcoming evolutionary divergence, so that large fractions of the metagenomic reads remain unclassified. Here we present the novel metagenome classifier Kaiju, which finds maximum (in-)exact matches on the protein-level using the Burrows-Wheeler transform. We show in a genome exclusion benchmark that Kaiju classifies reads with higher sensitivity and similar precision compared with current k-mer-based classifiers, especially in genera that are underrepresented in reference databases. We also demonstrate that Kaiju classifies up to 10 times more reads in real metagenomes. Kaiju can process millions of reads per minute and can run on a standard PC. Source code and web server are available at http://kaiju.binf.ku.dk.

  14. Fast and sensitive taxonomic classification for metagenomics with Kaiju

    PubMed Central

    Menzel, Peter; Ng, Kim Lee; Krogh, Anders

    2016-01-01

    Metagenomics emerged as an important field of research not only in microbial ecology but also for human health and disease, and metagenomic studies are performed on increasingly larger scales. While recent taxonomic classification programs achieve high speed by comparing genomic k-mers, they often lack sensitivity for overcoming evolutionary divergence, so that large fractions of the metagenomic reads remain unclassified. Here we present the novel metagenome classifier Kaiju, which finds maximum (in-)exact matches on the protein-level using the Burrows–Wheeler transform. We show in a genome exclusion benchmark that Kaiju classifies reads with higher sensitivity and similar precision compared with current k-mer-based classifiers, especially in genera that are underrepresented in reference databases. We also demonstrate that Kaiju classifies up to 10 times more reads in real metagenomes. Kaiju can process millions of reads per minute and can run on a standard PC. Source code and web server are available at http://kaiju.binf.ku.dk. PMID:27071849

  15. Kraken: ultrafast metagenomic sequence classification using exact alignments

    PubMed Central

    2014-01-01

    Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/. PMID:24580807
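
    The exact k-mer matching mentioned in this abstract can be illustrated with a dictionary lookup. The sketch below is a deliberate simplification and not Kraken's actual algorithm or data structure: it indexes every k-mer of a few toy reference sequences and labels a read by majority vote. The function names, the voting rule, and the reference sequences are assumptions for the example.

```python
from collections import Counter, defaultdict


def build_index(references, k):
    """Map every k-mer to the set of reference labels that contain it."""
    index = defaultdict(set)
    for label, sequence in references.items():
        for i in range(len(sequence) - k + 1):
            index[sequence[i:i + k]].add(label)
    return index


def classify_read(read, index, k):
    """Label a read by majority vote over its exactly matching k-mers.

    Returns None when no k-mer of the read occurs in the index.
    """
    votes = Counter()
    for i in range(len(read) - k + 1):
        # .get avoids inserting missing k-mers into the defaultdict.
        for label in index.get(read[i:i + k], ()):
            votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Toy reference sequences, invented for illustration only.
references = {
    "taxon_a": "ACGTTGCAACGGTCAGT",
    "taxon_b": "TTGACCGTAGCATGCAA",
}
index = build_index(references, k=5)
print(classify_read("GCAACGGTC", index, k=5))
```

    Each lookup is a constant-time exact match, which gives a rough intuition for why k-mer approaches can be much faster than alignment; reads with no matching k-mer are left unclassified.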

  16. Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations

    PubMed Central

    García-López, Rodrigo; Vázquez-Castellanos, Jorge Francisco; Moya, Andrés

    2015-01-01

    Metagenomic libraries consist of DNA fragments from diverse species, with varying genome size and abundance. High-throughput sequencing platforms produce large volumes of reads from these libraries, which may be assembled into contigs, ideally resembling the original larger genomic sequences. The uneven species distribution, along with the stochasticity in sample processing and sequencing bias, impacts the success of accurate sequence assembly. Several assemblers enable the processing of viral metagenomic data de novo, generally using overlap layout consensus or de Bruijn graph approaches for contig assembly. The success of viral genomic reconstruction in these datasets is limited by the degree of fragmentation of each genome in the sample, which is dependent on the sequencing effort and the genome length. Depending on ecological, biological, or procedural biases, some fragments have a higher prevalence, or coverage, in the assembly. However, assemblers must face challenges, such as the formation of chimerical structures and intra-species variability. Diversity calculation relies on the classification of the sequences that comprise a metagenomic dataset. Whenever the corresponding genomic and taxonomic information is available, contigs matching the same species can be classified accordingly and the coverage of its genome can be calculated for that species. This may be used to compare populations by estimating abundance and assessing species distribution from this data. Nevertheless, the coverage does not take into account the degree of fragmentation, or else genome completeness, and is not necessarily representative of actual species distribution in the samples. Furthermore, undetermined sequences are abundant in viral metagenomic datasets, resulting in several independent contigs that cannot be assigned by homology or genomic information. These may only be classified as different operational taxonomic units (OTUs), sometimes remaining inadvisably unrelated. 

  17. Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations.

    PubMed

    García-López, Rodrigo; Vázquez-Castellanos, Jorge Francisco; Moya, Andrés

    2015-01-01

    Metagenomic libraries consist of DNA fragments from diverse species, with varying genome size and abundance. High-throughput sequencing platforms produce large volumes of reads from these libraries, which may be assembled into contigs, ideally resembling the original larger genomic sequences. The uneven species distribution, along with the stochasticity in sample processing and sequencing bias, impacts the success of accurate sequence assembly. Several assemblers enable the processing of viral metagenomic data de novo, generally using overlap layout consensus or de Bruijn graph approaches for contig assembly. The success of viral genomic reconstruction in these datasets is limited by the degree of fragmentation of each genome in the sample, which is dependent on the sequencing effort and the genome length. Depending on ecological, biological, or procedural biases, some fragments have a higher prevalence, or coverage, in the assembly. However, assemblers must face challenges, such as the formation of chimerical structures and intra-species variability. Diversity calculation relies on the classification of the sequences that comprise a metagenomic dataset. Whenever the corresponding genomic and taxonomic information is available, contigs matching the same species can be classified accordingly and the coverage of its genome can be calculated for that species. This may be used to compare populations by estimating abundance and assessing species distribution from this data. Nevertheless, the coverage does not take into account the degree of fragmentation, or else genome completeness, and is not necessarily representative of actual species distribution in the samples. Furthermore, undetermined sequences are abundant in viral metagenomic datasets, resulting in several independent contigs that cannot be assigned by homology or genomic information. These may only be classified as different operational taxonomic units (OTUs), sometimes remaining inadvisably unrelated. 

  18. What's in the mix: phylogenetic classification of metagenome sequence samples.

    PubMed

    McHardy, Alice C; Rigoutsos, Isidore

    2007-10-01

    Metagenomics is a novel field which deals with the sequencing and study of microbial organisms or viruses isolated directly from a particular environment. This has already provided a wealth of information and new insights for the inhabitants of various environmental niches. For a given sample, one would like to determine the phylogenetic provenance of the obtained fragments, the relative abundance of its different members, their metabolic capabilities, and the functional properties of the community as a whole. To this end, computational analyses are becoming increasingly indispensable tools. In this review, we focus on the problem of determining the phylogenetic identity of the sample fragments, a procedure known as 'binning'. This step is essential for the reconstruction of the metabolic capabilities of individual organisms or phylogenetic clades of a community, and the study of their interactions.

  19. Phylogeny, classification and metagenomic bioprospecting of microbial acetyl xylan esterases.

    PubMed

    Adesioye, Fiyinfoluwa A; Makhalanyane, Thulani P; Biely, Peter; Cowan, Don A

    2016-11-01

    Acetyl xylan esterases (AcXEs), also termed xylan deacetylases, are broad specificity Carbohydrate-Active Enzymes (CAZymes) that hydrolyse ester bonds to liberate acetic acid from acetylated hemicellulose (typically polymeric xylan and xylooligosaccharides). They belong to eight families within the Carbohydrate Esterase (CE) class of the CAZy database. AcXE classification is largely based on sequence-dependent phylogenetic relationships, supported in some instances with substrate specificity data. However, some sequence-based predictions of AcXE-encoding gene identity have proved to be functionally incorrect. Such ambiguities can lead to mis-assignment of genes and enzymes during sequence data-mining, reinforcing the necessity for the experimental confirmation of the functional properties of putative AcXE-encoding gene products. Although one-third of all characterized CEs within CAZy families 1-7 and 16 are AcXEs, there is a need to expand the sequence database in order to strengthen the link between AcXE gene sequence and specificity. Currently, most AcXEs are derived from a limited range of (mostly microbial) sources and have been identified via culture-based bioprospecting methods, restricting current knowledge of AcXEs to data from relatively few microbial species. More recently, the successful identification of AcXEs via genome and metagenome mining has emphasised the huge potential of culture-independent bioprospecting strategies. We note, however, that the functional metagenomics approach is still hampered by screening bottlenecks. The most relevant recent reviews of AcXEs have focused primarily on the biochemical and functional properties of these enzymes. In this review, we focus on AcXE phylogeny, classification and the future of metagenomic bioprospecting for novel AcXEs.

  20. Centrifuge: rapid and sensitive classification of metagenomic sequences.

    PubMed

    Kim, Daehwan; Song, Li; Breitwieser, Florian P; Salzberg, Steven L

    2016-12-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. © 2016 Kim et al.; Published by Cold Spring Harbor Laboratory Press.
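
    For readers unfamiliar with the indexing scheme named here, the following sketch builds a Burrows-Wheeler transform and counts exact pattern occurrences with FM-index backward search. It is a textbook illustration under simplifying assumptions (naive rotation sort, uncompressed occurrence table) and says nothing about Centrifuge's own optimized index; all function names are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for short strings)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_string):
    """Build the two FM-index tables from a BWT string.

    first[c]  : number of characters in the text that sort before c
    occ[c][i] : number of occurrences of c in bwt_string[:i]
    """
    alphabet = sorted(set(bwt_string))
    first, running_total = {}, 0
    for c in alphabet:
        first[c] = running_total
        running_total += bwt_string.count(c)
    occ = {c: [0] * (len(bwt_string) + 1) for c in alphabet}
    for i, ch in enumerate(bwt_string):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return first, occ


def count_occurrences(pattern, first, occ, length):
    """Count exact matches of pattern using backward search."""
    low, high = 0, length  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        low = first[ch] + occ[ch][low]
        high = first[ch] + occ[ch][high]
        if low >= high:
            return 0
    return high - low


reference = "ACGTACGTTACG"  # toy reference, invented for illustration
transformed = bwt(reference)
first, occ = build_fm_index(transformed)
print(count_occurrences("ACG", first, occ, len(transformed)))
```

    Backward search takes one step per pattern character regardless of reference length. Production indexes reach the small footprints quoted in the abstract by sampling and compressing the occurrence table, which this sketch does not attempt.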

  1. Centrifuge: rapid and sensitive classification of metagenomic sequences

    PubMed Central

    Song, Li; Breitwieser, Florian P.

    2016-01-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. PMID:27852649

  2. Large-scale machine learning for metagenomics sequence classification.

    PubMed

    Vervier, Kévin; Mahé, Pierre; Tournoud, Maud; Veyrieras, Jean-Baptiste; Vert, Jean-Philippe

    2016-04-01

    Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genome to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and more sensitive to sequencing errors. We finally show that the new method outperforms the state-of-the-art in its ability to classify reads from species of lineage absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2-17 times with respect to the BWA-MEM short read mapper, depending on the number of
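
    As a rough illustration of the compositional representation this abstract describes, the sketch below turns a read into a normalized k-mer profile. It is an assumption-laden toy, not the authors' pipeline: the names `kmer_profile` and `feature_space_size` are hypothetical, k-mers containing ambiguous bases are simply skipped, and no classifier is trained.

```python
from collections import Counter


def feature_space_size(k):
    """Number of possible DNA k-mers over the alphabet A, C, G, T."""
    return 4 ** k


def kmer_profile(read, k):
    """Map each k-mer in `read` to its relative frequency (sparse dict)."""
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    # keep only k-mers made of unambiguous bases (assumption of this sketch)
    counts = {kmer: n for kmer, n in counts.items() if set(kmer) <= set("ACGT")}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}
```

    For k = 12 there are 4^12 = 16,777,216 possible k-mers, which is on the order of the 10^7 dimensions mentioned in the abstract; a sparse dictionary avoids materializing that full vector for each read.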

  3. Large-scale machine learning for metagenomics sequence classification

    PubMed Central

    Vervier, Kévin; Mahé, Pierre; Tournoud, Maud; Veyrieras, Jean-Baptiste; Vert, Jean-Philippe

    2016-01-01

    Motivation: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. Results: We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genome to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and more sensitive to sequencing errors. We finally show that the new method outperforms the state-of-the-art in its ability to classify reads from species of lineage absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2–17 times with respect to the BWA-MEM short read mapper, depending

  4. Metagenomics.

    PubMed

    Gilbert, Jack A; Laverock, Bonnie; Temperton, Ben; Thomas, Simon; Muhling, Martin; Hughes, Margaret

    2011-01-01

    Metagenomics has evolved over the last 3 decades from the analysis of single genes and their apparent diversity in an ecosystem to the provision of complex genetic information relating to whole ecosystems. Metagenomics is a vast subject area in terms of methodology, which encompasses a suite of molecular technologies employed to investigate genomic information from all members of a microbial community. However, the relatively recent developments in high-throughput sequencing platforms have meant that metagenomics can be performed simply by extracting DNA and sequencing it. Here, we outline explicit methodologies for the extraction of metagenomic DNA from marine and sediments/soil environmental samples, the subsequent production and sequencing of large-insert metagenomic libraries, and also shotgun pyrosequencing considerations. We also provide relevant advice on bioinformatic analyses of the complex metagenomic datasets. We hope that the information provided here will be useful to establish the techniques in most reasonably equipped molecular biology laboratories.

  5. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    PubMed

    Koslicki, David; Foucart, Simon; Rosen, Gail

    2014-01-01

    With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.

  6. Information-theoretic approaches to SVM feature selection for metagenome read classification.

    PubMed

    Garbarine, Elaine; DePasquale, Joseph; Gadia, Vinay; Polikar, Robi; Rosen, Gail

    2011-06-01

    Analysis of DNA sequences isolated directly from the environment, known as metagenomics, produces a large quantity of genome fragments that need to be classified into specific taxa. Most composition-based classification methods use all features instead of a subset of features that may maximize classifier accuracy. We show that feature selection methods can boost performance of taxonomic classifiers. This work proposes three different filter-based feature selection methods that stem from information theory: (1) a technique that combines Kullback-Leibler, Mutual Information, and distance information, (2) a text mining technique, TF-IDF, and (3) minimum redundancy-maximum-relevance (mRMR). The feature selection methods are compared by how well they improve support vector machine classification of genomic reads. Overall, the 6mer mRMR method performs well, especially on the phyla-level. If the number of total features is very large, feature selection becomes difficult because a small subset of features that captures a majority of the data variance is less likely to exist. Therefore, we conclude that there is a trade-off between feature set size and feature selection method to optimize classification performance. For larger feature set sizes, TF-IDF works better for finer-resolutions while mRMR performs the best out of any method for N=6 for all taxonomic levels.

  7. Fragment recruitment on metabolic pathways: comparative metabolic profiling of metagenomes and metatranscriptomes.

    PubMed

    Desai, Dhwani K; Schunck, Harald; Löser, Johannes W; Laroche, Julie

    2013-03-15

    The sheer scale of the metagenomic and metatranscriptomic datasets that are now available warrants the development of automated protocols for organizing, annotating and comparing the samples in terms of their metabolic profiles. We describe a user-friendly java program FROMP (Fragment Recruitment on Metabolic Pathways) for mapping and visualizing enzyme annotations onto the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways or custom-made pathways and comparing the samples in terms of their Pathway Completeness Scores, their relative Activity Scores or enzyme enrichment odds ratios. This program along with our fully configurable PERL-based annotation organization pipeline Meta2Pro (METAbolic PROfiling of META-omic data) offers a quick and accurate standalone solution for metabolic profiling of environmental samples or cultures from different treatments. Apart from pictorial comparisons, FROMP can also generate score matrices for multiple meta-omics samples, which can be used directly by other statistical programs.

  8. Classification and quantification of bacteriophage taxa in human gut metagenomes

    PubMed Central

    Waller, Alison S; Yamada, Takuji; Kristensen, David M; Kultima, Jens Roat; Sunagawa, Shinichi; Koonin, Eugene V; Bork, Peer

    2014-01-01

    Bacteriophages have key roles in microbial communities, to a large extent shaping the taxonomic and functional composition of the microbiome, but data on the connections between phage diversity and the composition of communities are scarce. Using taxon-specific marker genes, we identified and monitored 20 viral taxa in 252 human gut metagenomic samples, mostly at the level of genera. On average, five phage taxa were identified in each sample, with up to three of these being highly abundant. The abundances of most phage taxa vary by up to four orders of magnitude between the samples, and several taxa that are highly abundant in some samples are absent in others. Significant correlations exist between the abundances of some phage taxa and human host metadata: for example, 'Group 936 lactococcal phages' are more prevalent and abundant in Danish samples than in samples from Spain or the United States of America. Quantification of phages that exist as integrated prophages revealed that the abundance profiles of prophages are highly individual-specific and remain unique to an individual over a 1-year time period, and prediction of prophage lysis across the samples identified hundreds of prophages that are apparently active in the gut and vary across the samples, in terms of presence and lytic state. Finally, a prophage–host network of the human gut was established and includes numerous novel host–phage associations. PMID:24621522

  9. Scalable metagenomics alignment research tool (SMART): a scalable, rapid, and complete search heuristic for the classification of metagenomic sequences from complex sequence populations.

    PubMed

    Lee, Aaron Y; Lee, Cecilia S; Van Gelder, Russell N

    2016-07-28

    Next generation sequencing technology has enabled characterization of metagenomics through massively parallel genomic DNA sequencing. The complexity and diversity of environmental samples such as the human gut microflora, combined with the sustained exponential growth in sequencing capacity, has led to the challenge of identifying microbial organisms by DNA sequence. We sought to validate a Scalable Metagenomics Alignment Research Tool (SMART), a novel searching heuristic for shotgun metagenomics sequencing results. After retrieving all genomic DNA sequences from the NCBI GenBank, over 1 × 10^11 base pairs of 3.3 × 10^6 sequences from 9.25 × 10^5 species were indexed using 4 base pair hashtable shards. A MapReduce searching strategy was used to distribute the search workload in a computing cluster environment. In addition, a one base pair permutation algorithm was used to account for single nucleotide polymorphisms and sequencing errors. Simulated datasets used to evaluate Kraken, a similar metagenomics classification tool, were used to measure and compare precision and accuracy. Finally, using the same set of training sequences, we compared Kraken, CLARK, and SMART within the same computing environment. Utilizing 12 computational nodes, we completed the classification of all datasets in under 10 min each using exact matching with an average throughput of over 1.95 × 10^6 reads classified per minute. With permutation matching, we achieved sensitivity greater than 83% and precision greater than 94% with simulated datasets at the species classification level. We demonstrated the application of this technique applied to conjunctival and gut microbiome metagenomics sequencing results. In our head-to-head comparison, SMART and CLARK had similar accuracy gains over Kraken at the species classification level, but SMART required approximately half the amount of RAM of CLARK. SMART is the first scalable, efficient, and rapid metagenomics classification algorithm
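
    The one base pair permutation matching mentioned in this abstract can be sketched as a hash lookup over every single-substitution variant of a query. This is only an illustration of the general idea, not SMART's code: the function names and the dictionary-of-sets index layout are hypothetical, and sharding and MapReduce distribution are omitted.

```python
def one_mismatch_variants(kmer, alphabet="ACGT"):
    """Yield `kmer` itself and every sequence differing from it at one position."""
    yield kmer
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def lookup(kmer, index):
    """Return the taxa whose indexed k-mers match `kmer` within one substitution.

    `index` is a hypothetical dict mapping k-mer -> set of taxon labels.
    """
    hits = set()
    for variant in one_mismatch_variants(kmer):
        hits |= index.get(variant, set())
    return hits
```

    A query of length L produces 1 + 3L lookups, so tolerance to a single substitution costs a linear factor in query length while each lookup remains a constant-time hash probe.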

  10. Accurate phylogenetic classification of DNA fragments based on sequence composition

    SciTech Connect

    McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis; Hugenholtz, Philip; Rigoutsos, Isidore

    2006-05-01

    Metagenome studies have retrieved vast amounts of sequence out of a variety of environments, leading to novel discoveries and great insights into the uncultured microbial world. Except for very simple communities, diversity makes sequence assembly and analysis a very challenging problem. To understand the structure and function of microbial communities, a taxonomic characterization of the obtained sequence fragments is highly desirable, yet currently limited mostly to those sequences that contain phylogenetic marker genes. We show that for clades at the rank of domain down to genus, sequence composition allows the very accurate phylogenetic characterization of genomic sequence. We developed a composition-based classifier, PhyloPythia, for de novo phylogenetic sequence characterization and have trained it on a data set of 340 genomes. By extensive evaluation experiments we show that the method is accurate across all taxonomic ranks considered, even for sequences that originate from novel organisms and are as short as 1 kb. Application to two metagenome datasets obtained from samples of phosphorus-removing sludge showed that the method allows the accurate classification at genus level of most sequence fragments from the dominant populations, while at the same time correctly characterizing even larger parts of the samples at higher taxonomic levels.

  11. Characterization of Uncultured Genome Fragment from Soil Metagenomic Library Exposed Rare Mismatch of Internal Tetranucleotide Frequency

    PubMed Central

    Liu, Yunpeng; Yang, Dongqing; Zhang, Nan; Chen, Lin; Cui, Zhongli; Shen, Qirong; Zhang, Ruifu

    2016-01-01

    Exploring the genomic information of a specific uncultured soil bacterium is vital to understand its function in the ecosystem but is still a challenge due to the lack of culture techniques. To examine the genomes of uncultured bacteria, a metagenomic bacterial artificial chromosome library derived from a soil sample was screened for 16S rDNA-containing clones. Five clones (4C6, 5E7, 5G4, 5G12, and 5H7) containing genome fragments of uncultured soil bacteria (with low 16S rDNA similarity to isolated bacteria) were selected for sequencing. Clones 5E7 and 5G4 showed only 82 and 83% 16S rDNA identity to known sequences, respectively. Phylogenetic analysis of 16S rDNA indicated that 5E7 and 5G4 were potentially from a new class of Chloroflexi. Only one-third of the 5G4 open reading frames have significant hits against HMMER. Internal tetranucleotide frequency analysis indicated that the unknown region of 5G4 was poorly correlated with other parts of the clone, indicating that this section might have been obtained through lateral transfer. It was suggested that this region, rich in unknown genes, is under fast evolution. PMID:28066395
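
    The internal tetranucleotide frequency comparison described here can be approximated by correlating the 256-dimensional tetranucleotide profiles of two regions of the same clone. The sketch below is a simplification under stated assumptions: it uses raw relative frequencies and a plain Pearson correlation, whereas published tetranucleotide analyses often use expectation-corrected z-scores, and the names `tetra_freq` and `pearson` are hypothetical.

```python
from itertools import product
from math import sqrt

# all 4^4 = 256 tetranucleotides in a fixed order
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq):
    """Relative frequency of each tetranucleotide in `seq`, in TETRAMERS order."""
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # windows with ambiguous bases are skipped
            counts[word] += 1
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0
```

    A region whose profile correlates poorly with the rest of the clone, as reported for the unknown region of 5G4, would show a noticeably lower `pearson` value than comparisons among the remaining regions.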

  12. Methods for virus classification and the challenge of incorporating metagenomic sequence data.

    PubMed

    Simmonds, Peter

    2015-06-01

    The division of viruses into orders, families, genera and species provides a classification framework that seeks to organize and make sense of the diversity of viruses infecting animals, plants and bacteria. Classifications are based on similarities in genome structure and organization, the presence of homologous genes and sequence motifs and at lower levels such as species, host range, nucleotide and antigenic relatedness and epidemiology. Classification below the level of family must also be consistent with phylogeny and virus evolutionary histories. Recently developed methods such as PASC, DEMaRC and NVR offer alternative strategies for genus and species assignments that are based purely on degrees of divergence between genome sequences. They offer the possibility of automating classification of the vast number of novel virus sequences being generated by next-generation metagenomic sequencing. However, distance-based methods struggle to deal with the complex evolutionary history of virus genomes that are shuffled by recombination and reassortment, and where taxonomic lineages evolve at different rates. In biological terms, classifications based on sequence distances alone are also arbitrary whereas the current system of virus taxonomy is of utility precisely because it is primarily based upon phenotypic characteristics. However, a separate system is clearly needed by which virus variants that lack biological information might be incorporated into the ICTV classification even if based solely on sequence relationships to existing taxa. For these, simplified taxonomic proposals and naming conventions represent a practical way to expand the existing virus classification and catalogue our rapidly increasing knowledge of virus diversity.

  13. Rapid phylogenetic and functional classification of short genomic fragments with signature peptides

    PubMed Central

    2012-01-01

    Background: Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers. Results: At even the largest phylogenetic distances, thousands of 10-mer peptide exact matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., read) at the most specific node on the reference tree that is consistent with the phylogeny of observed signature peptides it contains. Using both synthetic data from four newly-sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database. Conclusions: Classification by exact matching against a precomputed list of signature peptides provides comparable

  14. Evaluation of shotgun metagenomics sequence classification methods using in silico and in vitro simulated communities.

    PubMed

    Peabody, Michael A; Van Rossum, Thea; Lo, Raymond; Brinkman, Fiona S L

    2015-11-04

    The field of metagenomics (study of genetic material recovered directly from an environment) has grown rapidly, with many bioinformatics analysis methods being developed. To ensure appropriate use of such methods, robust comparative evaluation of their accuracy and features is needed. For taxonomic classification of sequence reads, such evaluation should include use of clade exclusion, which better evaluates a method's accuracy when identical sequences are not present in any reference database, as is common in metagenomic analysis. To date, relatively small evaluations have been performed, with evaluation approaches like clade exclusion limited to assessment of new methods by the authors of the given method. What is needed is a rigorous, independent comparison between multiple major methods, using the same in silico and in vitro test datasets, with and without approaches like clade exclusion, to better characterize accuracy under different conditions. An overview of the features of 38 bioinformatics methods is provided, evaluating accuracy with a focus on 11 programs that have reference databases that can be modified and therefore most robustly evaluated with clade exclusion. Taxonomic classification of sequence reads was evaluated using both in silico and in vitro mock bacterial communities. Clade exclusion was used at taxonomic levels from species to class, identifying how well methods perform in progressively more difficult scenarios. A wide range of variability was found in the sensitivity, precision, overall accuracy, and computational demand for the programs evaluated. In experiments where distilled water was spiked with only 11 bacterial species, frequently dozens to hundreds of species were falsely predicted by the most popular programs. The different features of each method (forces predictions or not, etc.) are summarized, and additional analysis considerations discussed. The accuracy of shotgun metagenomics classification methods varies widely. No one

  15. IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

    EPA Science Inventory

    Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...

  17. Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data.

    PubMed

    Liu, Zhenqiu; Hsiao, William; Cantarel, Brandi L; Drábek, Elliott Franco; Fraser-Liggett, Claire

    2011-12-01

    Direct sequencing of microbes in human ecosystems (the human microbiome) has complemented single genome cultivation and sequencing to understand and explore the impact of commensal microbes on human health. As sequencing technologies improve and costs decline, the sophistication of data has outgrown available computational methods. While several existing machine learning methods have been adapted for analyzing microbiome data recently, there is not yet an efficient and dedicated algorithm available for multiclass classification of human microbiota. By combining instance-based and model-based learning, we propose a novel sparse distance-based learning method for simultaneous class prediction and feature (variable or taxa, which is used interchangeably) selection from multiple treatment populations on the basis of 16S rRNA sequence count data. Our proposed method simultaneously minimizes the intraclass distance and maximizes the interclass distance with many fewer estimated parameters than other methods. It is very efficient for problems with small sample sizes and unbalanced classes, which are common in metagenomic studies. We implemented this method in a MATLAB toolbox called MetaDistance. We also propose several approaches for data normalization and variance stabilization transformation in MetaDistance. We validate this method on several real and simulated 16S rRNA datasets to show that it outperforms existing methods for classifying metagenomic data. This article is the first to address simultaneous multifeature selection and class prediction with metagenomic count data. The MATLAB toolbox is freely available online at http://metadistance.igs.umaryland.edu/. Contact: zliu@umm.edu. Supplementary data are available at Bioinformatics online.

  18. Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies.

    PubMed

    Wang, Yong; Qian, Pei-Yuan

    2009-10-09

    Bacterial 16S ribosomal DNA (rDNA) amplicons have been widely used in the classification of uncultured bacteria inhabiting environmental niches. Primers targeting conservative regions of the rDNAs are used to generate amplicons of variant regions that are informative in taxonomic assignment. One problem is that the percentage coverage and application scope of the primers used in previous studies are largely unknown. In this study, conservative fragments of available rDNA sequences were first mined and then used to search for candidate primers within the fragments by measuring the coverage rate, defined as the percentage of bacterial sequences containing the target. Thirty predicted primers with a high coverage rate (>90%) were identified, which were located in essentially the same conservative regions as known primers in previous reports, whereas 30% of the known primers were associated with a coverage rate of <90%. The application scope of the primers was also examined by calculating the percentages of failed detections in bacterial phyla. Primers A519-539, E969-983, E1063-1081, U515 and E517 are highly recommended because of their high coverage in almost all phyla. As expected, the three predominant phyla, Firmicutes, Gemmatimonadetes and Proteobacteria, are best covered by the predicted primers. The primers recommended in this report shall facilitate a comprehensive and reliable survey of bacterial diversity in metagenomic studies.
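
    The coverage rate defined in this abstract, the percentage of bacterial sequences containing the target, can be written down directly. The sketch below is a minimal reading of that definition and makes assumptions the abstract does not state: matching is an exact substring test, degenerate bases and reverse complements are ignored, and the function name `coverage_rate` is hypothetical.

```python
def coverage_rate(primer, sequences):
    """Percentage of `sequences` that contain `primer` as an exact substring."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if primer in seq)
    return 100.0 * hits / len(sequences)


def high_coverage(primers, sequences, threshold=90.0):
    """Return the primers whose coverage rate exceeds `threshold` percent."""
    return [p for p in primers if coverage_rate(p, sequences) > threshold]
```

    With a threshold of 90, `high_coverage` mirrors the >90% cut-off quoted in the abstract; computing the same rate within each phylum separately would give the per-phylum failure percentages the authors mention.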

  19. Composition-based classification of short metagenomic sequences elucidates the landscapes of taxonomic and functional enrichment of microorganisms.

    PubMed

    Liu, Jiemeng; Wang, Haifeng; Yang, Hongxing; Zhang, Yizhe; Wang, Jinfeng; Zhao, Fangqing; Qi, Ji

    2013-01-07

    Compared with traditional algorithms for long metagenomic sequence classification, characterizing microorganisms' taxonomic and functional abundance based on tens of millions of very short reads is much more challenging. We describe an efficient composition and phylogeny-based algorithm [Metagenome Composition Vector (MetaCV)] to classify very short metagenomic reads (75-100 bp) into specific taxonomic and functional groups. We applied MetaCV to the Meta-HIT data (371-Gb 75-bp reads of 109 human gut metagenomes), and this single-read-based, instead of assembly-based, classification has a high resolution to characterize the composition and structure of human gut microbiota, especially for low abundance species. Most strikingly, it only took MetaCV 10 days to do all the computation work on a server with five 24-core nodes. To our knowledge, MetaCV, benefiting from the strategy of composition comparison, is the first algorithm that can classify millions of very short reads within affordable time.

  20. Composition-based classification of short metagenomic sequences elucidates the landscapes of taxonomic and functional enrichment of microorganisms

    PubMed Central

    Liu, Jiemeng; Wang, Haifeng; Yang, Hongxing; Zhang, Yizhe; Wang, Jinfeng; Zhao, Fangqing; Qi, Ji

    2013-01-01

    Compared with traditional algorithms for long metagenomic sequence classification, characterizing microorganisms’ taxonomic and functional abundance based on tens of millions of very short reads is much more challenging. We describe an efficient composition and phylogeny-based algorithm [Metagenome Composition Vector (MetaCV)] to classify very short metagenomic reads (75–100 bp) into specific taxonomic and functional groups. We applied MetaCV to the Meta-HIT data (371-Gb 75-bp reads of 109 human gut metagenomes), and this single-read-based, instead of assembly-based, classification has a high resolution to characterize the composition and structure of human gut microbiota, especially for low abundance species. Most strikingly, it only took MetaCV 10 days to do all the computation work on a server with five 24-core nodes. To our knowledge, MetaCV, benefiting from the strategy of composition comparison, is the first algorithm that can classify millions of very short reads within affordable time. PMID:22941634

  1. Metagenomic Classification and Characterization Marine Actinobacteria from the Gulf of Maine without Representative Genomes

    NASA Astrophysics Data System (ADS)

    Sachdeva, R.; Heidelberg, J.

    2012-12-01

    Actinobacteria represent one of the largest and most diverse bacterial phyla and unlike most marine prokaryotes are gram-positive. This phylum encompasses a broad range of physiologies, morphologies, and metabolic properties with a broad array of lifestyles. The marine actinobacterial assemblage is dominated by the orders Actinomycetales and Acidimicrobiales (also known as the marine Actinobacteria clade). The Acidimicrobiales bacteria typically outnumber the Actinomycetales bacteria and are mostly represented by the OCS155 group. Although bacteria of the order Acidimicrobiales make up ~7.6% of the 16S matches from the Global Ocean Survey shotgun metagenomic libraries, very little is known about their potential function and role in biogeochemical cycling. Samples were collected from surface seawater in the Gulf of Maine (GOM) in the summer and winter of 2006. Sanger sequences were generated from the 0.1-0.8 μm fractions using paired-end medium insert shotgun libraries. The resulting 2.2 Gb were assembled using the Celera Assembler package into 280 Mb of non-redundant scaffolds. Putative actinobacterial assemblies were identified using (1) ribosomal RNA genes (16S and 23S), (2) phylogenetically informative non-ribosomal core genes thought to be resistant to horizontal gene transfer (e.g. RecA and RpoB) and (3) compositional binning using oligonucleotide frequency pattern based hierarchical clustering. Binning resulted in 3.6 Mb (4.2X coverage) of actinobacterial scaffolds that were comprised of 15.1 Mb of unassembled reads. Putative actinobacterial assemblies included both summer and winter reads, demonstrating that the Actinobacteria are abundant year round. Classification reveals that all of the sampled Actinobacteria are from the orders Acidimicrobiales and Actinomycetales and are similar to those found in the global ocean. The GOM Actinobacteria show a broad range of G+C % content (32-66%), indicating a high level of genomic diversity. Those assemblies

  2. RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

    PubMed

    Scheuch, Matthias; Höper, Dirk; Beer, Martin

    2015-03-03

    Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.

  3. Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

    PubMed

    Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

    2016-03-01

    Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequencing. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken accumulates the information of both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences.
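
    The abstract above describes mapping short sequences to a lowest ancestor and weighting taxon nodes. A minimal sketch of that idea follows. The toy taxonomy, the reference sequences, the value of k, and all function names are illustrative assumptions; this is not Kraken's actual code or database format.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); not a real taxonomy database.
PARENT = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia": "Proteobacteria",
    "Salmonella": "Proteobacteria",
}

def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path

def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("taxa share no ancestor")

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_index(references, k):
    """Map each k-mer to the LCA of every reference taxon that contains it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in set(kmers(seq, k)):
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index

def classify(read, index, k):
    """Weight taxon nodes by k-mer hits and return the end of the heaviest
    root-to-node path, or None when no k-mer of the read is in the index."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None

    def path_score(taxon):
        # Sum the hits on the taxon and on all of its ancestors.
        return sum(hits.get(node, 0) for node in lineage(taxon))

    # Highest path score wins; ties go to the deeper taxon.
    return max(hits, key=lambda t: (path_score(t), len(lineage(t))))

# Invented reference sequences, for illustration only.
references = {"Escherichia": "ACGTACGTTT", "Salmonella": "ACGTACGAAA"}
index = build_index(references, k=4)
specific = classify("TACGTTT", index, k=4)   # contains Escherichia-only k-mers
ambiguous = classify("ACGTACG", index, k=4)  # only k-mers shared by both genera
```

    In this toy setup a read made only of k-mers shared by both genera stops at their common ancestor, while a read that also carries genus-specific k-mers is pushed down to that genus.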

  4. Beyond classification: gene-family phylogenies from shotgun metagenomic reads enable accurate community analysis

    PubMed Central

    2013-01-01

    Background: Sequence-based phylogenetic trees are a well-established tool for characterizing diversity of both macroorganisms and microorganisms. Phylogenetic methods have recently been applied to shotgun metagenomic data from microbial communities, particularly with the aim of classifying reads. But the accuracy of gene-family phylogenies that characterize evolutionary relationships among short, non-overlapping sequencing reads has not been thoroughly evaluated. Results: To quantify errors in metagenomic read trees, we developed MetaPASSAGE, a software pipeline to generate in silico bacterial communities, simulate a sample of shotgun reads from a gene family represented in the community, orient or translate reads, and produce a profile-based alignment of the reads from which a gene-family phylogenetic tree can be built. We applied MetaPASSAGE to a variety of RNA and protein-coding gene families, built trees using a range of different phylogenetic methods, and compared the resulting trees using topological and branch-length error metrics. We identified read length as one of the major sources of error. Because phylogenetic methods use a reference database of full-length sequences from the gene family to guide construction of alignments and trees, we found that error can also be substantially reduced through increasing the size and diversity of the reference database. Finally, UniFrac analysis, which compares metagenomic samples based on a summary statistic computed over all branches in a read tree, is very robust to the level of error we observe. Conclusions: Bacterial community diversity can be quantified using phylogenetic approaches applied to shotgun metagenomic data. As sequencing reads get longer and more genomes across the bacterial tree of life are sequenced, the accuracy of this approach will continue to improve, opening the door to more applications. PMID:23799973

  5. Beyond classification: gene-family phylogenies from shotgun metagenomic reads enable accurate community analysis.

    PubMed

    Riesenfeld, Samantha J; Pollard, Katherine S

    2013-06-22

    Sequence-based phylogenetic trees are a well-established tool for characterizing diversity of both macroorganisms and microorganisms. Phylogenetic methods have recently been applied to shotgun metagenomic data from microbial communities, particularly with the aim of classifying reads. But the accuracy of gene-family phylogenies that characterize evolutionary relationships among short, non-overlapping sequencing reads has not been thoroughly evaluated. To quantify errors in metagenomic read trees, we developed MetaPASSAGE, a software pipeline to generate in silico bacterial communities, simulate a sample of shotgun reads from a gene family represented in the community, orient or translate reads, and produce a profile-based alignment of the reads from which a gene-family phylogenetic tree can be built. We applied MetaPASSAGE to a variety of RNA and protein-coding gene families, built trees using a range of different phylogenetic methods, and compared the resulting trees using topological and branch-length error metrics. We identified read length as one of the major sources of error. Because phylogenetic methods use a reference database of full-length sequences from the gene family to guide construction of alignments and trees, we found that error can also be substantially reduced through increasing the size and diversity of the reference database. Finally, UniFrac analysis, which compares metagenomic samples based on a summary statistic computed over all branches in a read tree, is very robust to the level of error we observe. Bacterial community diversity can be quantified using phylogenetic approaches applied to shotgun metagenomic data. As sequencing reads get longer and more genomes across the bacterial tree of life are sequenced, the accuracy of this approach will continue to improve, opening the door to more applications.
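
    The abstract describes UniFrac as a summary statistic computed over all branches in a read tree. A minimal sketch of the unweighted form is given below, assuming the tree has already been reduced to a list of branches, each with a length and the set of reads beneath it. The branch representation, read names, and function name are illustrative assumptions and are not taken from MetaPASSAGE.

```python
def unweighted_unifrac(branches, sample_a, sample_b):
    """Fraction of observed branch length that leads to reads from only
    one of the two samples.

    branches: iterable of (length, set_of_descendant_reads)
    sample_a, sample_b: sets of read names belonging to each sample
    """
    unique = 0.0
    shared = 0.0
    for length, leaves in branches:
        in_a = bool(leaves & sample_a)
        in_b = bool(leaves & sample_b)
        if in_a and in_b:
            shared += length
        elif in_a or in_b:
            unique += length
        # Branches leading to neither sample are ignored.
    total = unique + shared
    return unique / total if total else 0.0

# Toy tree ((r1:1, r2:1):1, (r3:1, r4:1):1) written as branches (assumed data).
branches = [
    (1.0, {"r1"}), (1.0, {"r2"}), (1.0, {"r1", "r2"}),
    (1.0, {"r3"}), (1.0, {"r4"}), (1.0, {"r3", "r4"}),
]
distance = unweighted_unifrac(branches, {"r1", "r3"}, {"r2", "r3"})
```

    Because the statistic sums over every branch, an error in the placement of a single read changes only the few branches around it, which is one intuition for the robustness the abstract reports.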

  6. Signal Processing for Metagenomics: Extracting Information from the Soup

    PubMed Central

    Rosen, Gail L.; Sokhansanj, Bahrad A.; Polikar, Robi; Bruns, Mary Ann; Russell, Jacob; Garbarine, Elaine; Essinger, Steve; Yok, Non

    2009-01-01

    Traditionally, studies in microbial genomics have focused on single-genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. Current tools and techniques are reviewed in this paper which address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology. PMID:20436876

  7. Signal processing for metagenomics: extracting information from the soup.

    PubMed

    Rosen, Gail L; Sokhansanj, Bahrad A; Polikar, Robi; Bruns, Mary Ann; Russell, Jacob; Garbarine, Elaine; Essinger, Steve; Yok, Non

    2009-11-01

    Traditionally, studies in microbial genomics have focused on single-genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. Current tools and techniques are reviewed in this paper which address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology.

  8. 16S Classifier: A Tool for Fast and Accurate Taxonomic Classification of 16S rRNA Hypervariable Regions in Metagenomic Datasets

    PubMed Central

    Chaudhary, Nikhil; Sharma, Ashok K.; Agarwal, Piyush; Gupta, Ankit; Sharma, Vineet K.

    2015-01-01

    The diversity of microbial species in a metagenomic study is commonly assessed using 16S rRNA gene sequencing. With the rapid developments in genome sequencing technologies, the focus has shifted towards the sequencing of hypervariable regions of 16S rRNA gene instead of full length gene sequencing. Therefore, 16S Classifier is developed using a machine learning method, Random Forest, for faster and accurate taxonomic classification of short hypervariable regions of 16S rRNA sequence. It displayed precision values of up to 0.91 on training datasets and the precision values of up to 0.98 on the test dataset. On real metagenomic datasets, it showed up to 99.7% accuracy at the phylum level and up to 99.0% accuracy at the genus level. 16S Classifier is available freely at http://metagenomics.iiserb.ac.in/16Sclassifier and http://metabiosys.iiserb.ac.in/16Sclassifier. PMID:25646627

  9. File Fragment Classification - The Case for Specialized Approaches

    DTIC Science & Technology

    2009-05-01

    advances in file carving, memory analysis and network forensics requires the ability to identify the underlying type of a file given only a file fragment...will be necessary to make progress in this area. 1. Introduction Most forensic practitioners can look at a piece of data, such as a disk block or...a network packet, and readily identify what kind of data it carries. This skill is important in many forensic tasks, from diagnosing break-ins

  10. DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences

    PubMed Central

    2010-01-01

    Background: In metagenomic sequence data, the majority of sequences/reads originate from new or partially characterized genomes, the corresponding sequences of which are absent in existing reference databases. Since taxonomic assignment of reads is based on their similarity to sequences from known organisms, the presence of reads originating from new organisms poses a major challenge to taxonomic binning methods. The recently published SOrt-ITEMS algorithm uses an elaborate work-flow to assign reads originating from hitherto unknown genomes with significant accuracy and specificity. Nevertheless, a significant proportion of reads still get misclassified. Besides, the use of an alignment-based orthology step (for improving the specificity of assignments) increases the total binning time of SOrt-ITEMS. Results: In this paper, we introduce a rapid binning approach called DiScRIBinATE (Distance Score Ratio for Improved Binning And Taxonomic Estimation). DiScRIBinATE replaces the orthology approach of SOrt-ITEMS with a quicker 'alignment-free' approach. We demonstrate that incorporating this approach reduces binning time by half without any loss in the specificity and accuracy of assignments. Besides, a novel reclassification strategy incorporated in DiScRIBinATE results in reducing the overall misclassification rate to around 3 - 7%. This misclassification rate is 1.5 - 3 times lower as compared to that by SOrt-ITEMS, and 3 - 30 times lower as compared to that by MEGAN. Conclusions: A significant reduction in binning time, coupled with a superior assignment accuracy (as compared to existing binning methods), indicates the immense applicability of the proposed algorithm in rapidly mapping the taxonomic diversity of large metagenomic samples with high accuracy and specificity. Availability: The program is available on request from the authors. PMID:21106121

  11. Novel organic solvent-tolerant esterase isolated by metagenomics: insights into the lipase/esterase classification.

    PubMed

    Berlemont, Renaud; Spee, Olivier; Delsaute, Maud; Lara, Yannick; Schuldes, Jörg; Simon, Carola; Power, Pablo; Daniel, Rolf; Galleni, Moreno

    2013-01-01

    In order to isolate novel organic solvent-tolerant (OST) lipases, a metagenomic library was built using DNA derived from a temperate forest soil sample. A two-step activity-based screening allowed the isolation of a lipolytic clone active in the presence of organic solvents. Sequencing of the plasmid pRBest recovered from the positive clone revealed the presence of a putative lipase/esterase encoding gene. The deduced amino acid sequence (RBest1) contains the conserved lipolytic enzyme signature and is related to the previously described OST lipase from Lysinibacillus sphaericus 205y, which is the sole studied prokaryotic enzyme belonging to the 4.4 α/β hydrolase subgroup (abH04.04). Both in vivo and in vitro studies of the substrate specificity of RBest1, using triacylglycerols or nitrophenyl-esters, respectively, revealed that the enzyme is highly specific for butyrate (C4) compounds, behaving as an esterase rather than a lipase. The RBest1 esterase was purified and biochemically characterized. The optimal esterase activity was observed at pH 6.5 and at temperatures ranging from 38 to 45 °C. Enzymatic activity, determined by hydrolysis of p-nitrophenyl esters, was found to be affected by the presence of different miscible and non-miscible organic solvents, and salts. Notably, RBest1 remains significantly active at high ionic strength. These findings suggest that RBest1 combines the molecular adaptation of OST enzymes to organic compounds with the salt resistance of halophilic proteins.

  12. Motif-Based Text Mining of Microbial Metagenome Redundancy Profiling Data for Disease Classification.

    PubMed

    Wang, Yin; Li, Rudong; Zhou, Yuhua; Ling, Zongxin; Guo, Xiaokui; Xie, Lu; Liu, Lei

    2016-01-01

    Text data of 16S rRNA are informative for classifications of microbiota-associated diseases. However, the raw text data need to be systematically processed so that features for classification can be defined/extracted; moreover, the high-dimension feature spaces generated by the text data also pose an additional difficulty. Here we present a Phylogenetic Tree-Based Motif Finding algorithm (PMF) to analyze 16S rRNA text data. By integrating phylogenetic rules and other statistical indexes for classification, we can effectively reduce the dimension of the large feature spaces generated by the text datasets. Using the retrieved motifs in combination with common classification methods, we can discriminate different samples of both pneumonia and dental caries better than other existing methods. We extend the phylogenetic approaches to perform supervised learning on microbiota text data to discriminate the pathological states for pneumonia and dental caries. The results have shown that PMF may enhance the efficiency and reliability in analyzing high-dimension text data.

  13. MetaSAMS--a novel software platform for taxonomic classification, functional annotation and comparative analysis of metagenome datasets.

    PubMed

    Zakrzewski, Martha; Bekel, Thomas; Ander, Christina; Pühler, Alfred; Rupp, Oliver; Stoye, Jens; Schlüter, Andreas; Goesmann, Alexander

    2013-08-20

    Metagenomics aims at exploring microbial communities concerning their composition and functioning. Application of high-throughput sequencing technologies for the analysis of environmental DNA-preparations can generate large sets of metagenome sequence data which have to be analyzed by means of bioinformatics tools to unveil the taxonomic composition of the analyzed community as well as the repertoire of genes and gene functions. A bioinformatics software platform is required that allows the automated taxonomic and functional analysis and interpretation of metagenome datasets without manual effort. To address current demands in metagenome data analyses, the novel platform MetaSAMS was developed. MetaSAMS automatically accomplishes the tasks necessary for analyzing the composition and functional repertoire of a given microbial community from metagenome sequence data by implementing two software pipelines: (i) the first pipeline consists of three different classifiers performing the taxonomic profiling of metagenome sequences and (ii) the second functional pipeline accomplishes region predictions on assembled contigs and assigns functional information to predicted coding sequences. Moreover, MetaSAMS provides tools for statistical and comparative analyses based on the taxonomic and functional annotations. The capabilities of MetaSAMS are demonstrated for two metagenome datasets obtained from a biogas-producing microbial community of a production-scale biogas plant. The MetaSAMS web interface is available at https://metasams.cebitec.uni-bielefeld.de.

  14. Anatomy and classification of the posterior tibial fragment in ankle fractures.

    PubMed

    Bartoníček, Jan; Rammelt, Stefan; Kostlivý, Karel; Vaněček, Václav; Klika, Daniel; Trešl, Ivo

    2015-04-01

    The aim of this study was to analyze the pathoanatomy of the posterior fragment on the basis of a comprehensive CT examination, including 3D reconstructions, in a large patient cohort. One hundred and forty-one consecutive individuals with an ankle fracture or fracture-dislocation of types Weber B or Weber C and evidence of a posterior tibial fragment in standard radiographs were included in the study. The mean patient age was 49 years (range 19-83 years). The exclusion criteria were patients below 18 years of age, inability to provide written consent, fractures of the tibial pilon, posttraumatic arthritis and pre-existing deformities. In all patients, post-injury radiographs were obtained in anteroposterior, mortise and lateral views. All patients underwent CT scanning in transverse, sagittal and frontal planes. 3D CT reconstruction was performed in 91 patients. We were able to classify 137 cases into one of the following four types with constant pathoanatomic features: type 1: extraincisural fragment with an intact fibular notch, type 2: posterolateral fragment extending into the fibular notch, type 3: posteromedial two-part fragment involving the medial malleolus, type 4: large posterolateral triangular fragment. In the remaining 4 cases it was not possible to classify the type of the posterior tibial fragment. These were collectively termed type 5 (irregular, osteoporotic fragments). It is impossible to assess the shape and size of the posterior malleolar fragment, involvement of the fibular notch, or the medial malleolus, on the basis of plain radiographs. The system that we propose for classification of fractures of the posterior malleolus is based on CT examination and takes into account the size, shape and location of the fragment, stability of the tibio-talar joint and the integrity of the fibular notch. It may be a useful indication for surgery and defining the most useful approach to these injuries.

  15. Genomic characterization of Defluviitoga tunisiensis L3, a key hydrolytic bacterium in a thermophilic biogas plant and its abundance as determined by metagenome fragment recruitment.

    PubMed

    Maus, Irena; Cibis, Katharina Gabriela; Bremges, Andreas; Stolze, Yvonne; Wibberg, Daniel; Tomazetto, Geizecler; Blom, Jochen; Sczyrba, Alexander; König, Helmut; Pühler, Alfred; Schlüter, Andreas

    2016-08-20

    The genome sequence of Defluviitoga tunisiensis L3 originating from a thermophilic biogas-production plant was established and recently published as a Genome Announcement by our group. The circular chromosome of D. tunisiensis L3 has a size of 2,053,097 bp and a mean GC content of 31.38%. To analyze the D. tunisiensis L3 genome sequence in more detail, a phylogenetic analysis of completely sequenced Thermotogae strains based on shared core genes was performed. It appeared that Petrotoga mobilis DSM 10674(T), originally isolated from a North Sea oil-production well, is the closest relative of D. tunisiensis L3. Comparative genome analyses of P. mobilis DSM 10674(T) and D. tunisiensis L3 showed moderate similarities regarding occurrence of orthologous genes. Both genomes share a common set of 1351 core genes. Reconstruction of metabolic pathways important for the biogas production process revealed that the D. tunisiensis L3 genome encodes a large set of genes predicted to facilitate utilization of a variety of complex polysaccharides including cellulose, chitin and xylan. Ethanol, acetate, hydrogen (H2) and carbon dioxide (CO2) were found as possible end-products of the fermentation process. The latter three metabolites are considered to represent substrates for methanogenic Archaea, the key organisms in the final step of the anaerobic digestion process. To determine the degree of relatedness between D. tunisiensis L3 and dominant biogas community members within the thermophilic biogas-production plant, metagenome sequences obtained from the corresponding microbial community were mapped onto the L3 genome sequence. This fragment recruitment revealed that the D. tunisiensis L3 genome is almost completely covered with metagenome sequences featuring high matching accuracy. This result indicates that strains highly related or even identical to the reference strain D. tunisiensis L3 play a dominant role within the community of the thermophilic biogas-production plant.
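
    The abstract reports that fragment recruitment covered the reference genome almost completely. A minimal sketch of how such a covered fraction could be computed from mapped-read intervals follows. The interval convention (0-based, half-open), the function name, and the example coordinates are assumptions for illustration, not the authors' pipeline.

```python
def covered_fraction(genome_length, alignments):
    """Fraction of reference positions covered by at least one alignment.

    alignments: iterable of (start, end) pairs, 0-based and half-open,
    assumed to lie within [0, genome_length).
    """
    covered = 0
    current_start = current_end = None
    for start, end in sorted(alignments):
        if current_end is None or start > current_end:
            # Close the previous merged interval and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current one.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length

# Invented coordinates on a 100 bp toy reference.
fraction = covered_fraction(100, [(0, 30), (20, 50), (70, 80)])
```

    Overlapping reads are merged before counting, so deep coverage of one region does not inflate the result.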

  16. Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification.

    PubMed

    Yi, Chucai; Tian, Yingli

    2012-09-01

    In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and cluster edge pixels based on color pairs and spatial positions into boundary layers. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms to combine the structural analysis of text stroke with color assignment and filter out background interferences. Further, we design a robust string fragment classification based on Gabor-based text features. The features are obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.

  17. Metagenome from a Spirulina digesting biogas reactor: analysis via binning of contigs and classification of short reads.

    PubMed

    Nolla-Ardèvol, Vimac; Peces, Miriam; Strous, Marc; Tegetmeyer, Halina E

    2015-12-17

    Anaerobic digestion is a biological process in which a consortium of microorganisms transforms a complex substrate into methane and carbon dioxide. A good understanding of the interactions between the populations that form this consortium can contribute to a successful anaerobic digestion of the substrate. In this study we combine the analysis of the biogas production in a laboratory anaerobic digester fed with the microalgae Spirulina, a protein rich substrate, with the analysis of the metagenome of the consortium responsible for digestion, obtained by high-throughput DNA sequencing. The obtained metagenome was also compared with a metagenome from a full scale biogas plant fed with cellulose rich material. The optimal organic loading rate for the anaerobic digestion of Spirulina was determined to be 4.0 g Spirulina L⁻¹ day⁻¹ with a specific biogas production of 350 mL biogas (g Spirulina)⁻¹ with a methane content of 68 %. Firmicutes dominated the microbial consortium at 38 % abundance followed by Bacteroidetes, Chloroflexi and Thermotogae. Euryarchaeota represented 3.5 % of the total abundance. The most abundant organism (14.9 %) was related to Tissierella, a bacterium known to use proteinaceous substrates for growth. Methanomicrobiales and Methanosarcinales dominated the archaeal community. Compared to the full scale cellulose-fed digesters, Pfam domains related to protein degradation were more frequently detected and Pfam domains related to cellulose degradation were less frequent in our sample. The results presented in this study suggest that Spirulina is a suitable substrate for the production of biogas. The proteinaceous substrate appeared to have a selective impact on the bacterial community that performed anaerobic digestion. A direct influence of the substrate on the selection of specific methanogenic populations was not observed.

  18. MPI-blastn and NCBI-TaxCollector: improving metagenomic analysis with high performance classification and wide taxonomic attachment.

    PubMed

    Dias, R; Xavier, M G; Rossi, F D; Neves, M V; Lange, T A P; Giongo, A; De Rose, C A F; Triplett, E W

    2014-06-01

    Metagenomic sequencing technologies are advancing rapidly and the size of output data from high-throughput genetic sequencing has increased substantially over the years. This brings us to a scenario where advanced computational optimizations are required to perform a metagenomic analysis. In this paper, we describe a new parallel implementation of nucleotide BLAST (MPI-blastn) and a new tool for taxonomic attachment of Basic Local Alignment Search Tool (BLAST) results that supports the NCBI taxonomy (NCBI-TaxCollector). MPI-blastn obtained a high performance when compared to the mpiBLAST and ScalaBLAST. In our best case, MPI-blastn was able to run 408 times faster in 384 cores. Our evaluations demonstrated that NCBI-TaxCollector is able to perform taxonomic attachments 125 times faster and needs 120 times less RAM than the previous TaxCollector. Through our optimizations, a multiple sequence search that currently takes 37 hours can be performed in less than 6 min and a post processing with NCBI taxonomic data attachment, which takes 48 hours, now is able to run in 23 min.
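
    The timings quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below only restates the abstract's own figures; the function names are illustrative, and "less than 6 min" is treated as exactly 6 min, so the search speedup is a lower bound.

```python
def speedup(before_minutes, after_minutes):
    """Ratio of the original runtime to the optimized runtime."""
    return before_minutes / after_minutes

def parallel_efficiency(speedup_factor, cores):
    """Speedup per core; values above 1.0 indicate superlinear scaling."""
    return speedup_factor / cores

# 37 hours reduced to (at most) 6 minutes for the sequence search.
search_speedup = speedup(37 * 60, 6)
# 48 hours reduced to 23 minutes for the taxonomic attachment step.
attachment_speedup = speedup(48 * 60, 23)
# Best case reported in the abstract: 408x on 384 cores.
best_case_efficiency = parallel_efficiency(408, 384)
```

    Under these assumptions the search speedup is at least 370-fold, the attachment step works out to roughly 125-fold, in line with the stated figure, and the best-case efficiency is slightly above 1, i.e. superlinear.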

  19. MetaShot: an accurate workflow for taxon classification of host-associated microbiome from shotgun metagenomic data.

    PubMed

    Fosso, B; Santamaria, M; D'Antonio, M; Lovero, D; Corrado, G; Vizza, E; Passaro, N; Garbuglia, A R; Capobianchi, M R; Crescenzi, M; Valiente, G; Pesole, G

    2017-06-01

    Shotgun metagenomics by high-throughput sequencing may allow deep and accurate characterization of host-associated total microbiomes, including bacteria, viruses, protists and fungi. However, the analysis of such sequencing data is still extremely challenging in terms of both overall accuracy and computational efficiency, and current methodologies show substantial variability in misclassification rate and resolution at lower taxonomic ranks or are limited to specific life domains (e.g. only bacteria). We present here MetaShot, a workflow for assessing the total microbiome composition from host-associated shotgun sequence data, and show its overall optimal accuracy performance by analyzing both simulated and real datasets. Availability and Implementation: https://github.com/bfosso/MetaShot. Contact: graziano.pesole@uniba.it. Supplementary information: Supplementary data are available at Bioinformatics online.

  20. MetaShot: an accurate workflow for taxon classification of host-associated microbiome from shotgun metagenomic data

    PubMed Central

    Fosso, B.; Santamaria, M.; D’Antonio, M.; Lovero, D.; Corrado, G.; Vizza, E.; Passaro, N.; Garbuglia, A.R.; Capobianchi, M.R.; Crescenzi, M.; Valiente, G.

    2017-01-01

    Summary: Shotgun metagenomics by high-throughput sequencing may allow deep and accurate characterization of host-associated total microbiomes, including bacteria, viruses, protists and fungi. However, the analysis of such sequencing data is still extremely challenging in terms of both overall accuracy and computational efficiency, and current methodologies show substantial variability in misclassification rate and resolution at lower taxonomic ranks or are limited to specific life domains (e.g. only bacteria). We present here MetaShot, a workflow for assessing the total microbiome composition from host-associated shotgun sequence data, and show its overall optimal accuracy performance by analyzing both simulated and real datasets. Availability and Implementation: https://github.com/bfosso/MetaShot Contact: graziano.pesole@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28130230

  1. The Phylogenetic Diversity of Metagenomes

    PubMed Central

    Kembel, Steven W.; Eisen, Jonathan A.; Pollard, Katherine S.; Green, Jessica L.

    2011-01-01

    Phylogenetic diversity—patterns of phylogenetic relatedness among organisms in ecological communities—provides important insights into the mechanisms underlying community assembly. Studies that measure phylogenetic diversity in microbial communities have primarily been limited to a single marker gene approach, using the small subunit of the rRNA gene (SSU-rRNA) to quantify phylogenetic relationships among microbial taxa. In this study, we present an approach for inferring phylogenetic relationships among microorganisms based on the random metagenomic sequencing of DNA fragments. To overcome challenges caused by the fragmentary nature of metagenomic data, we leveraged fully sequenced bacterial genomes as a scaffold to enable inference of phylogenetic relationships among metagenomic sequences from multiple phylogenetic marker gene families. The resulting metagenomic phylogeny can be used to quantify the phylogenetic diversity of microbial communities based on metagenomic data sets. We applied this method to understand patterns of microbial phylogenetic diversity and community assembly along an oceanic depth gradient, and compared our findings to previous studies of this gradient using SSU-rRNA gene and metagenomic analyses. Bacterial phylogenetic diversity was highest at intermediate depths beneath the ocean surface, whereas taxonomic diversity (diversity measured by binning sequences into taxonomically similar groups) showed no relationship with depth. Phylogenetic diversity estimates based on the SSU-rRNA gene and the multi-gene metagenomic phylogeny were broadly concordant, suggesting that our approach will be applicable to other metagenomic data sets for which corresponding SSU-rRNA gene sequences are unavailable. Our approach opens up the possibility of using metagenomic data to study microbial diversity in a phylogenetic context. PMID:21912589

  2. The phylogenetic diversity of metagenomes.

    PubMed

    Kembel, Steven W; Eisen, Jonathan A; Pollard, Katherine S; Green, Jessica L

    2011-01-01

    Phylogenetic diversity--patterns of phylogenetic relatedness among organisms in ecological communities--provides important insights into the mechanisms underlying community assembly. Studies that measure phylogenetic diversity in microbial communities have primarily been limited to a single marker gene approach, using the small subunit of the rRNA gene (SSU-rRNA) to quantify phylogenetic relationships among microbial taxa. In this study, we present an approach for inferring phylogenetic relationships among microorganisms based on the random metagenomic sequencing of DNA fragments. To overcome challenges caused by the fragmentary nature of metagenomic data, we leveraged fully sequenced bacterial genomes as a scaffold to enable inference of phylogenetic relationships among metagenomic sequences from multiple phylogenetic marker gene families. The resulting metagenomic phylogeny can be used to quantify the phylogenetic diversity of microbial communities based on metagenomic data sets. We applied this method to understand patterns of microbial phylogenetic diversity and community assembly along an oceanic depth gradient, and compared our findings to previous studies of this gradient using SSU-rRNA gene and metagenomic analyses. Bacterial phylogenetic diversity was highest at intermediate depths beneath the ocean surface, whereas taxonomic diversity (diversity measured by binning sequences into taxonomically similar groups) showed no relationship with depth. Phylogenetic diversity estimates based on the SSU-rRNA gene and the multi-gene metagenomic phylogeny were broadly concordant, suggesting that our approach will be applicable to other metagenomic data sets for which corresponding SSU-rRNA gene sequences are unavailable. Our approach opens up the possibility of using metagenomic data to study microbial diversity in a phylogenetic context.

  3. The Metagenomic Telescope

    PubMed Central

    Szalkai, Balázs; Scheer, Ildikó; Nagy, Kinga; Vértessy, Beáta G.; Grolmusz, Vince

    2014-01-01

    Next generation sequencing technologies led to the discovery of numerous new microbe species in diverse environmental samples. Some of the new species contain genes never encountered before. Some of these genes encode proteins with novel functions, and some of these genes encode proteins that perform some well-known function in a novel way. A tool, named the Metagenomic Telescope, is described here that applies artificial intelligence methods, and seems to be capable of identifying new protein functions even in the well-studied model organisms. As a proof-of-principle demonstration of the Metagenomic Telescope, we considered DNA repair enzymes in the present work. First we identified proteins in DNA repair in well-known organisms (i.e., proteins in base excision repair, nucleotide excision repair, mismatch repair and DNA break repair); next we applied multiple alignments and then built hidden Markov profiles for each protein separately, across well-researched organisms; next, using public depositories of metagenomes, originating from extreme environments, we identified DNA repair genes in the samples. While the phylogenetic classification of the metagenomic samples is not typically available, we hypothesized that some very special DNA repair strategies need to be applied in bacteria and Archaea living in those extreme circumstances. It is a difficult task to evaluate the results obtained from mostly unknown species; therefore we again applied hidden Markov profiling: for the identified DNA repair genes in the extreme metagenomes, we prepared new hidden Markov profiles (for each gene separately, subsequent to a cluster analysis); and we searched for similarities to those profiles in model organisms. We have found well-known DNA repair proteins, numerous proteins with unknown functions, and also proteins with known, but different functions in the model organisms. PMID:25054802

  4. The metagenomic telescope.

    PubMed

    Szalkai, Balázs; Scheer, Ildikó; Nagy, Kinga; Vértessy, Beáta G; Grolmusz, Vince

    2014-01-01

    Next generation sequencing technologies led to the discovery of numerous new microbe species in diverse environmental samples. Some of the new species contain genes never encountered before. Some of these genes encode proteins with novel functions, and some of these genes encode proteins that perform some well-known function in a novel way. A tool, named the Metagenomic Telescope, is described here that applies artificial intelligence methods, and seems to be capable of identifying new protein functions even in the well-studied model organisms. As a proof-of-principle demonstration of the Metagenomic Telescope, we considered DNA repair enzymes in the present work. First we identified proteins in DNA repair in well-known organisms (i.e., proteins in base excision repair, nucleotide excision repair, mismatch repair and DNA break repair); next we applied multiple alignments and then built hidden Markov profiles for each protein separately, across well-researched organisms; next, using public depositories of metagenomes, originating from extreme environments, we identified DNA repair genes in the samples. While the phylogenetic classification of the metagenomic samples is not typically available, we hypothesized that some very special DNA repair strategies need to be applied in bacteria and Archaea living in those extreme circumstances. It is a difficult task to evaluate the results obtained from mostly unknown species; therefore we again applied hidden Markov profiling: for the identified DNA repair genes in the extreme metagenomes, we prepared new hidden Markov profiles (for each gene separately, subsequent to a cluster analysis); and we searched for similarities to those profiles in model organisms. We have found well-known DNA repair proteins, numerous proteins with unknown functions, and also proteins with known, but different functions in the model organisms.

  5. Enhanced acylcarnitine annotation in high-resolution mass spectrometry data: fragmentation analysis for the classification and annotation of acylcarnitines.

    PubMed

    van der Hooft, Justin J J; Ridder, Lars; Barrett, Michael P; Burgess, Karl E V

    2015-01-01

    Metabolite annotation and identification are primary challenges in untargeted metabolomics experiments. Rigorous workflows for reliable annotation of mass features with chemical structures or compound classes are needed to enhance the power of untargeted mass spectrometry. High-resolution mass spectrometry considerably improves the confidence in assigning elemental formulas to mass features in comparison to nominal mass spectrometry, and embedding of fragmentation methods enables more reliable metabolite annotations and facilitates metabolite classification. However, the analysis of mass fragmentation spectra can be a time-consuming step and requires expert knowledge. This study demonstrates how characteristic fragmentations, specific to compound classes, can be used to systematically analyze their presence in complex biological extracts like urine that have undergone untargeted mass spectrometry combined with data dependent or targeted fragmentation. Human urine extracts were analyzed using normal phase liquid chromatography (hydrophilic interaction chromatography) coupled to an Ion Trap-Orbitrap hybrid instrument. Subsequently, mass chromatograms and collision-induced dissociation and higher-energy collisional dissociation (HCD) fragments were annotated using the freely available MAGMa software. Acylcarnitines play a central role in energy metabolism by transporting fatty acids into the mitochondrial matrix. By filtering on a combination of a mass fragment and neutral loss designed based on the MAGMa fragment annotations, we were able to classify and annotate 50 acylcarnitines in human urine extracts, based on high-resolution mass spectrometry HCD fragmentation spectra at different energies for all of them. Of these annotated acylcarnitines, 31 are not described in HMDB yet and for only 4 annotated acylcarnitines the fragmentation spectra could be matched to reference spectra. 
Therefore, we conclude that the use of mass fragmentation filters within the context of

  6. Enhanced Acylcarnitine Annotation in High-Resolution Mass Spectrometry Data: Fragmentation Analysis for the Classification and Annotation of Acylcarnitines

    PubMed Central

    van der Hooft, Justin J. J.; Ridder, Lars; Barrett, Michael P.; Burgess, Karl E. V.

    2015-01-01

    Metabolite annotation and identification are primary challenges in untargeted metabolomics experiments. Rigorous workflows for reliable annotation of mass features with chemical structures or compound classes are needed to enhance the power of untargeted mass spectrometry. High-resolution mass spectrometry considerably improves the confidence in assigning elemental formulas to mass features in comparison to nominal mass spectrometry, and embedding of fragmentation methods enables more reliable metabolite annotations and facilitates metabolite classification. However, the analysis of mass fragmentation spectra can be a time-consuming step and requires expert knowledge. This study demonstrates how characteristic fragmentations, specific to compound classes, can be used to systematically analyze their presence in complex biological extracts like urine that have undergone untargeted mass spectrometry combined with data dependent or targeted fragmentation. Human urine extracts were analyzed using normal phase liquid chromatography (hydrophilic interaction chromatography) coupled to an Ion Trap-Orbitrap hybrid instrument. Subsequently, mass chromatograms and collision-induced dissociation and higher-energy collisional dissociation (HCD) fragments were annotated using the freely available MAGMa software. Acylcarnitines play a central role in energy metabolism by transporting fatty acids into the mitochondrial matrix. By filtering on a combination of a mass fragment and neutral loss designed based on the MAGMa fragment annotations, we were able to classify and annotate 50 acylcarnitines in human urine extracts, based on high-resolution mass spectrometry HCD fragmentation spectra at different energies for all of them. Of these annotated acylcarnitines, 31 are not described in HMDB yet and for only 4 annotated acylcarnitines the fragmentation spectra could be matched to reference spectra. 
Therefore, we conclude that the use of mass fragmentation filters within the context

  7. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    SciTech Connect

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  8. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.

    PubMed

    Reddy, T B K; Thomas, Alex D; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A; Kyrpides, Nikos C

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  9. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface supports several new features including the implementation of a four level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  10. Non-linear classification for on-the-fly fractional mass filtering and targeted precursor fragmentation in mass spectrometry experiments.

    PubMed

    Kirchner, Marc; Timm, Wiebke; Fong, Peying; Wangemann, Philine; Steen, Hanno

    2010-03-15

    Mass spectrometry (MS) has become the method of choice for protein/peptide sequence and modification analysis. The technology employs a two-step approach: ionized peptide precursor masses are detected, selected for fragmentation, and the fragment mass spectra are collected for computational analysis. Current precursor selection schemes are based on data- or information-dependent acquisition (DDA/IDA), where fragmentation mass candidates are selected by intensity and are subsequently included in a dynamic exclusion list to avoid constant refragmentation of highly abundant species. DDA/IDA methods do not exploit valuable information that is contained in the fractional mass of high-accuracy precursor mass measurements delivered by current instrumentation. We extend previous contributions that suggest that fractional mass information allows targeted fragmentation of analytes of interest. We introduce a non-linear Random Forest classification and a discrete mapping approach, which can be trained to discriminate among arbitrary fractional mass patterns for an arbitrary number of classes of analytes. These methods can be used to increase fragmentation efficiency for specific subsets of analytes or to select suitable fragmentation technologies on-the-fly. We show that theoretical generalization error estimates transfer into practical application, and that their quality depends on the accuracy of prior distribution estimate of the analyte classes. The methods are applied to two real-world proteomics datasets. All software used in this study is available from http://software.steenlab.org/fmf. Contact: hanno.steen@childrens.harvard.edu. Supplementary data are available at Bioinformatics online.

  11. Exploration of Noncoding Sequences in Metagenomes

    PubMed Central

    Tobar-Tosse, Fabián; Rodríguez, Adrián C.; Vélez, Patricia E.; Zambrano, María M.; Moreno, Pedro A.

    2013-01-01

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and their associated processes are related to specific environments. Identification of ORFs and their functional categories are the most common methods for association between functional and environmental features. However, this analysis based on finding ORFs misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded in common similarity-based methods in metagenomics, and the kind of relevant information present in those. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences have relevant information to describe metagenomes that could be considered in a whole metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment. PMID:23536879
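
    Two of the sequence patterns named above, (G+C) content and trinucleotide usage, are simple to compute for any sequence, coding or noncoding. The sketch below is illustrative only: it assumes overlapping 3-mers and ignores windows containing ambiguous bases, neither of which is stated in the abstract, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict

UNAMBIGUOUS = set("ACGT")


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def trinucleotide_usage(seq: str) -> Dict[str, float]:
    """Relative frequency of every overlapping 3-mer made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 3]
        for i in range(len(seq) - 2)
        if set(seq[i:i + 3]) <= UNAMBIGUOUS
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tri: n / total for tri, n in counts.items()}


# A short CAG repeat: the repeat unit and its rotations dominate the profile.
usage = trinucleotide_usage("CAGCAGCAG")
```

    A profile dominated by one trinucleotide and its rotations, as in the toy repeat above, is the kind of signal a trinucleotide repeat would leave in such a table.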

  12. Does semi-automatic bone-fragment segmentation improve the reproducibility of the Letournel acetabular fracture classification?

    PubMed

    Boudissa, M; Orfeuvre, B; Chabanas, M; Tonetti, J

    2017-09-01

    The Letournel classification of acetabular fracture shows poor reproducibility in inexperienced observers, despite the introduction of 3D imaging. We therefore developed a method of semi-automatic segmentation based on CT data. The present prospective study aimed to assess: (1) whether semi-automatic bone-fragment segmentation increased the rate of correct classification; (2) if so, in which fracture types; and (3) feasibility using the open-source itksnap 3.0 software package without incurring extra cost for users. Semi-automatic segmentation of acetabular fractures significantly increases the rate of correct classification by orthopedic surgery residents. Twelve orthopedic surgery residents classified 23 acetabular fractures. Six used conventional 3D reconstructions provided by the center's radiology department (conventional group) and 6 others used reconstructions obtained by semi-automatic segmentation using the open-source itksnap 3.0 software package (segmentation group). Bone fragments were identified by specific colors. Correct classification rates were compared between groups using the chi-squared test. Assessment was repeated 2 weeks later, to determine intra-observer reproducibility. Correct classification rates were significantly higher in the "segmentation" group: 114/138 (83%) versus 71/138 (52%); P<0.0001. The difference was greater for simple (36/36 (100%) versus 17/36 (47%); P<0.0001) than complex fractures (79/102 (77%) versus 54/102 (53%); P=0.0004). Mean segmentation time per fracture was 27±3 min [range, 21-35 min]. The segmentation group showed excellent intra-observer correlation coefficients, overall (ICC=0.88), and for simple (ICC=0.92) and complex fractures (ICC=0.84). Semi-automatic segmentation, identifying the various bone fragments, was effective in increasing the rate of correct acetabular fracture classification on the Letournel system by orthopedic surgery residents. It may be considered for routine use in education and training. Level of evidence: III.
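
    The overall comparison (114/138 versus 71/138 correct classifications) can be checked with a plain Pearson chi-squared test on the 2x2 table. The sketch below assumes no continuity correction, which the abstract does not specify; the helper names are hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Counts quoted in the abstract: (correct, incorrect) classifications per group.
segmentation = (114, 138 - 114)
conventional = (71, 138 - 71)

statistic = chi_square_2x2(*segmentation, *conventional)
p_value = p_value_df1(statistic)
print(f"chi2 = {statistic:.2f}, p = {p_value:.2e}")
```

    Under these assumptions the statistic comes out at roughly 30 on one degree of freedom, which is consistent with the reported P<0.0001.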

  13. METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data.

    PubMed

    Bengtsson-Palme, Johan; Hartmann, Martin; Eriksson, Karl Martin; Pal, Chandan; Thorell, Kaisa; Larsson, Dan Göran Joakim; Nilsson, Rolf Henrik

    2015-11-01

    The ribosomal rRNA genes are widely used as genetic markers for taxonomic identification of microbes. Particularly the small subunit (SSU; 16S/18S) rRNA gene is frequently used for species- or genus-level identification, but also the large subunit (LSU; 23S/28S) rRNA gene is employed in taxonomic assignment. The METAXA software tool is a popular utility for extracting partial rRNA sequences from large sequencing data sets and assigning them to an archaeal, bacterial, nuclear eukaryote, mitochondrial or chloroplast origin. This study describes a comprehensive update to METAXA - METAXA2 - that extends the capabilities of the tool, introducing support for the LSU rRNA gene, a greatly improved classifier allowing classification down to genus or species level, as well as enhanced support for short-read (100 bp) and paired-end sequences, among other changes. The performance of METAXA2 was compared to other commonly used taxonomic classifiers, showing that METAXA2 often outperforms previous methods in terms of making correct predictions while maintaining a low misclassification rate. METAXA2 is freely available from http://microbiology.se/software/metaxa2/.

  14. Non-linear classification for on-the-fly fractional mass filtering and targeted precursor fragmentation in mass spectrometry experiments

    PubMed Central

    Kirchner, Marc; Timm, Wiebke; Fong, Peying; Wangemann, Philine; Steen, Hanno

    2010-01-01

    Motivation: Mass spectrometry (MS) has become the method of choice for protein/peptide sequence and modification analysis. The technology employs a two-step approach: ionized peptide precursor masses are detected, selected for fragmentation, and the fragment mass spectra are collected for computational analysis. Current precursor selection schemes are based on data- or information-dependent acquisition (DDA/IDA), where fragmentation mass candidates are selected by intensity and are subsequently included in a dynamic exclusion list to avoid constant refragmentation of highly abundant species. DDA/IDA methods do not exploit valuable information that is contained in the fractional mass of high-accuracy precursor mass measurements delivered by current instrumentation. Results: We extend previous contributions that suggest that fractional mass information allows targeted fragmentation of analytes of interest. We introduce a non-linear Random Forest classification and a discrete mapping approach, which can be trained to discriminate among arbitrary fractional mass patterns for an arbitrary number of classes of analytes. These methods can be used to increase fragmentation efficiency for specific subsets of analytes or to select suitable fragmentation technologies on-the-fly. We show that theoretical generalization error estimates transfer into practical application, and that their quality depends on the accuracy of prior distribution estimate of the analyte classes. The methods are applied to two real-world proteomics datasets. Availability: All software used in this study is available from http://software.steenlab.org/fmf Contact: hanno.steen@childrens.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20134030
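
    The fractional mass of a precursor is the part of its measured mass after the decimal point. As a rough illustration of a discrete mapping from (nominal mass, fractional mass) to an analyte class, the sketch below bins training masses on a grid and stores the majority label per cell. It is not the authors' implementation: the bin widths, the class labels, the example masses, and all function names are hypothetical, and the Random Forest variant is not shown.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Bin = Tuple[int, int]


def fractional_mass(mass: float) -> float:
    """Part of the mass after the decimal point, in [0, 1)."""
    return mass - math.floor(mass)


def to_bin(mass: float, nominal_width: int = 100, frac_bins: int = 10) -> Bin:
    """Map a mass to a (nominal-mass bin, fractional-mass bin) grid cell."""
    nominal_bin = int(math.floor(mass)) // nominal_width
    frac_bin = min(int(fractional_mass(mass) * frac_bins), frac_bins - 1)
    return nominal_bin, frac_bin


def train_mapping(examples: Iterable[Tuple[float, str]]) -> Dict[Bin, str]:
    """Store the majority class label seen in each grid cell."""
    votes: Dict[Bin, Counter] = defaultdict(Counter)
    for mass, label in examples:
        votes[to_bin(mass)][label] += 1
    return {cell: counter.most_common(1)[0][0] for cell, counter in votes.items()}


def classify(mapping: Dict[Bin, str], mass: float, default: str = "unknown") -> str:
    """Look up the class for a precursor mass; unseen cells fall back to `default`."""
    return mapping.get(to_bin(mass), default)


# Toy training data with made-up masses and labels.
training = [(1000.52, "peptide"), (1000.55, "peptide"), (1000.95, "other")]
mapping = train_mapping(training)
decision = classify(mapping, 1000.51)
```

    A lookup of this kind is cheap enough to run between scans, which is the practical requirement for any on-the-fly precursor selection rule.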

  15. Distribution and Classification of Serine β-Lactamases in Brazilian Hospital Sewage and Other Environmental Metagenomes Deposited in Public Databases

    PubMed Central

    Fróes, Adriana M.; da Mota, Fábio F.; Cuadrat, Rafael R. C.; Dávila, Alberto M. R.

    2016-01-01

    serine β-lactamases, indicating the specificity and high sensitivity of this approach in large datasets, contributing for the identification and classification of a large number of homologous genes, comprising possible new ones. Phylogenetic analysis revealed the potential reservoir of β-lactam resistance genes in the environment, contributing to understanding the evolution and dissemination of these genes. PMID:27895627

  16. Distribution and Classification of Serine β-Lactamases in Brazilian Hospital Sewage and Other Environmental Metagenomes Deposited in Public Databases.

    PubMed

    Fróes, Adriana M; da Mota, Fábio F; Cuadrat, Rafael R C; Dávila, Alberto M R

    2016-01-01

    serine β-lactamases, indicating the specificity and high sensitivity of this approach in large datasets, contributing for the identification and classification of a large number of homologous genes, comprising possible new ones. Phylogenetic analysis revealed the potential reservoir of β-lactam resistance genes in the environment, contributing to understanding the evolution and dissemination of these genes.

  17. Rapid identification and classification of bacteria by 16S rDNA restriction fragment melting curve analyses (RFMCA).

    PubMed

    Rudi, Knut; Kleiberg, Gro H; Heiberg, Ragnhild; Rosnes, Jan T

    2007-08-01

    The aim of this work was to evaluate restriction fragment melting curve analyses (RFMCA) as a novel approach for rapid classification of bacteria during food production. RFMCA was evaluated for bacteria isolated from sous vide food products, and raw materials used for sous vide production. We identified four major bacterial groups in the material analysed (cluster I-Streptococcus, cluster II-Carnobacterium/Bacillus, cluster III-Staphylococcus and cluster IV-Actinomycetales). The accuracy of RFMCA was evaluated by comparison with 16S rDNA sequencing. The strains satisfying the RFMCA quality filtering criteria (73%, n=57), with both 16S rDNA sequence information and RFMCA data (n=45) gave identical group assignments with the two methods. RFMCA enabled rapid and accurate classification of bacteria that is database compatible. Potential application of RFMCA in the food or pharmaceutical industry will include development of classification models for the bacteria expected in a given product, and then to build an RFMCA database as a part of the product quality control.

  18. Metagenomic Analysis of Silage

    PubMed Central

    Tennant, Richard K.; Sambles, Christine M.; Diffey, Georgina E.; Moore, Karen A.; Love, John

    2017-01-01

    Metagenomics is defined as the direct analysis of deoxyribonucleic acid (DNA) purified from environmental samples and enables taxonomic identification of the microbial communities present within them. Two main metagenomic approaches exist: sequencing the 16S rRNA gene coding region, which exhibits sufficient variation between taxa for identification, and shotgun sequencing, in which genomes of the organisms that are present in the sample are analyzed and ascribed to "operational taxonomic units" (species, genera or families, depending on the extent of sequencing coverage). In this study, shotgun sequencing was used to analyze the microbial community present in cattle silage and, coupled with a range of bioinformatics tools to quality check and filter the DNA sequence reads, perform taxonomic classification of the microbial populations present within the sampled silage, and achieve functional annotation of the sequences. These methods were employed to identify potentially harmful bacteria that existed within the silage, an indication of silage spoilage. If spoiled silage is not remediated, then upon ingestion it could be potentially fatal to the livestock. PMID:28117801

  19. Swine Fecal Metagenomics

    EPA Science Inventory

    Metagenomic approaches are providing rapid and more robust means to investigate the composition and functional genetic potential of complex microbial communities. In this study, we utilized a metagenomic approach to further understand the functional diversity of the swine gut. To...

  1. Fragment-based Analysis of Ligand Dockings Improves Classification of Actives

    PubMed Central

    Forli, Stefano; Goodsell, David; O’Donnell, T. J.; Olson, Arthur

    2016-01-01

    We describe ADChemCast, a method for using results from virtual screening to create a richer representation of a target binding site, which may be used to improve ranking of compounds and characterize the determinants of ligand-receptor specificity. ADChemCast clusters docked conformations of ligands based on shared pairwise receptor-ligand interactions within chemically similar structural fragments, building a set of attributes characteristic of binders and non-binders. Machine learning is then used to build rules from the most informational attributes for use in reranking of compounds. In this report, we use ADChemCast to improve the ranking of compounds in 11 diverse proteins from the Database of Useful Decoys-Enhanced (DUD-E), and demonstrate the utility of the method for characterizing relevant binding attributes in HIV reverse transcriptase. PMID:27384036

  2. The effect of sequencing errors on metagenomic gene prediction.

    PubMed

    Hoff, Katharina J

    2009-11-12

    Gene prediction is an essential step in the annotation of metagenomic sequencing reads. Since most metagenomic reads cannot be assembled into long contigs, specialized statistical gene prediction tools have been developed for short and anonymous DNA fragments, e.g. MetaGeneAnnotator and Orphelia. While conventional gene prediction methods have been subject to a benchmark study on real sequencing reads with typical errors, such a comparison has not yet been conducted for specialized tools. Their gene prediction accuracy was mostly measured on error-free DNA fragments. In this study, Sanger and pyrosequencing reads were simulated on the basis of models that take all types of sequencing errors into account. All metagenomic gene prediction tools showed decreasing accuracy with increasing sequencing error rates. Performance results on an established metagenomic benchmark dataset are also reported. In addition, we demonstrate that ESTScan, a tool for sequencing error compensation in eukaryotic expressed sequence tags, outperforms some metagenomic gene prediction tools on reads with high error rates although it was not designed for the task at hand. This study fills an important gap in metagenomic gene prediction research. Specialized methods are evaluated and compared with respect to sequencing error robustness. Results indicate that the integration of error-compensating methods into metagenomic gene prediction tools would be beneficial to improve metagenome annotation quality.
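
    To make the idea of simulated read errors concrete, the sketch below applies substitutions, insertions and deletions at a single uniform per-base rate. This is a deliberately crude stand-in: the Sanger and pyrosequencing error models used in the study are not described in the abstract, and the function name, the equal weighting of the three error types, and the example read are all assumptions.

```python
import random

BASES = "ACGT"


def add_errors(read: str, error_rate: float, rng: random.Random) -> str:
    """Apply substitutions, insertions and deletions at a uniform per-base rate."""
    out = []
    for base in read:
        if rng.random() >= error_rate:
            out.append(base)  # base is read correctly
            continue
        kind = rng.choice(("substitution", "insertion", "deletion"))
        if kind == "substitution":
            # Replace the base with one of the other three.
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "insertion":
            # Keep the base and add a random extra base after it.
            out.append(base)
            out.append(rng.choice(BASES))
        # deletion: emit nothing for this base
    return "".join(out)


rng = random.Random(0)  # fixed seed so a run can be repeated
clean = "ATGGCGTACGTTAGCCTGA"
noisy = add_errors(clean, 0.05, rng)
```

    Insertions and deletions shift the reading frame downstream of the error, which is one reason gene predictors that rely on codon statistics degrade as the error rate rises.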

  3. A highly optimized grid deployment: the metagenomic analysis example.

    PubMed

    Aparicio, Gabriel; Blanquer, Ignacio; Hernández, Vicente

    2008-01-01

    Computational resources and computationally expensive processes are two topics that are not growing at the same rate. The availability of large amounts of computing resources in Grid infrastructures does not mean that efficiency is not an important issue. It is necessary to analyze the whole process to improve partitioning and submission schemas, especially in the most critical experiments. This is the case of metagenomic analysis, and this text shows the work done in order to optimize a Grid deployment, which has led to a reduction of the response time and the failure rates. Metagenomic studies aim at processing samples of multiple specimens to extract the genes and proteins that belong to the different species. In many cases, the sequencing of the DNA of many microorganisms is hindered by the impossibility of growing significant samples of isolated specimens. Many bacteria cannot survive alone, and require the interaction with other organisms. In such cases, the available DNA information belongs to different kinds of organisms. One important stage in metagenomic analysis consists of the extraction of fragments, followed by the comparison and analysis of their function. By comparison with existing chains, whose function is well known, fragments can be classified. This process is computationally intensive and requires several iterations of alignment and phylogeny classification steps. Source samples reach several millions of sequences, which could reach up to thousands of nucleotides each. These sequences are compared to a selected part of the "Non-redundant" database which includes only the information from eukaryotic species. From this first analysis, a refining process is performed and alignment analysis is restarted from the results. This process requires several CPU years. The article describes and analyzes the difficulties of fragmenting, automating and checking the above operations in current Grid production environments. This environment has been

  4. Evolutionary dynamics of clustered irregularly interspaced short palindromic repeat systems in the ocean metagenome.

    PubMed

    Sorokin, Valery A; Gelfand, Mikhail S; Artamonova, Irena I

    2010-04-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) form a recently characterized type of prokaryotic antiphage defense system. The phage-host interactions involving CRISPRs have been studied in experiments with selected bacterial or archaeal species and, computationally, in completely sequenced genomes. However, these studies do not allow one to take prokaryotic population diversity and phage-host interaction dynamics into account. This gap can be filled by using metagenomic data: in particular, the largest existing data set, generated from the Sorcerer II Global Ocean Sampling expedition. The application of three publicly available CRISPR recognition programs to the Global Ocean metagenome produced a large proportion of false-positive results. To address this problem, a filtering procedure was designed. It resulted in about 200 reliable CRISPR cassettes, which were then studied in detail. The repeat consensuses were clustered into several stable classes that differed from the existing classification. Short fragments of DNA similar to the cassette spacers were more frequently present in the same geographical location than in other locations (P < 0.0001). We developed a catalogue of elementary CRISPR-forming events and reconstructed the likely evolutionary history of cassettes that had common spacers. Metagenomic collections allow for relatively unbiased analysis of phage-host interactions and CRISPR evolution. The results of this study demonstrate that CRISPR cassettes retain the memory of the local virus population at a particular ocean location. CRISPR evolution may be described using a limited vocabulary of elementary events that have a natural biological interpretation.

  5. MGC: a metagenomic gene caller.

    PubMed

    El Allali, Achraf; Rose, John R

    2013-01-01

    Computational gene finding algorithms have proven their robustness in identifying genes in complete genomes. However, metagenomic sequencing has presented new challenges due to the incomplete and fragmented nature of the data. During the last few years, attempts have been made to extract complete and incomplete open reading frames (ORFs) directly from short reads and identify the coding ORFs, bypassing other challenging tasks such as the assembly of the metagenome. In this paper we introduce a metagenomics gene caller (MGC) which is an improvement over the state-of-the-art prediction algorithm Orphelia. Orphelia uses a two-stage machine learning approach and computes a model that classifies extracted ORFs from fragmented sequences. We hypothesise and demonstrate evidence that sequences need separate models based on their local GC-content in order to avoid the noise introduced to a single model computed with sequences from the entire GC spectrum. We have also added two amino-acid features based on the benefit of amino-acid usage shown in our previous research. Our algorithm is able to predict genes and translation initiation sites (TIS) more accurately than Orphelia which uses a single model. Learning separate models for several pre-defined GC-content regions as opposed to a single model approach improves the performance of the neural network as demonstrated by the experimental results presented in this paper. The inclusion of amino-acid usage features also helps improve the overall accuracy of our algorithm. MGC's improvement sets the ground for further investigation into the use of GC-content to separate data for training models in machine learning based gene finders.
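
    The idea of routing each fragment to a model trained for its GC-content range can be sketched as follows. This is only an illustration of the selection step under stated assumptions: the bin boundaries in `GC_BINS` and the function names are hypothetical, since the abstract does not give the ranges MGC uses, and the per-range neural network models themselves are not shown.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous A/C/G/T bases of `seq`."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0

# Hypothetical GC ranges as (low, high) pairs; the real boundaries are not
# stated in the abstract.
GC_BINS = [(0.0, 0.35), (0.35, 0.50), (0.50, 0.65), (0.65, 1.0)]

def select_model_index(seq, bins=GC_BINS):
    """Return the index of the GC range (and hence the model) for `seq`."""
    gc = gc_content(seq)
    for i, (low, high) in enumerate(bins):
        if low <= gc < high:
            return i
    return len(bins) - 1  # gc == 1.0 falls into the last range
```

    A caller would keep one trained model per entry of `GC_BINS` and score the ORFs of a fragment only with the model at the returned index, instead of using a single model for the whole GC spectrum.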

  6. Interactive metagenomic visualization in a Web browser

    PubMed Central

    2011-01-01

    Background A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Results Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Conclusions Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net. PMID:21961884

  7. Interactive metagenomic visualization in a Web browser.

    PubMed

    Ondov, Brian D; Bergman, Nicholas H; Phillippy, Adam M

    2011-09-30

    A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.

  8. Megraft: A software package to graft ribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes

    USDA-ARS's Scientific Manuscript database

    Metagenomic libraries represent subsamples of the total DNA found at a study site and offer unprecedented opportunities to study ecological and functional aspects of microbial communities. To examine the depth of the sequencing effort, rarefaction analysis of the ribosomal small sub-unit (SSU/16S/18...

  9. Classification

    ERIC Educational Resources Information Center

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  10. Classification

    ERIC Educational Resources Information Center

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  11. A Primer on Metagenomics

    PubMed Central

    Wooley, John C.; Godzik, Adam; Friedberg, Iddo

    2010-01-01

    Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics. PMID:20195499

  12. Consensus statement: Virus taxonomy in the age of metagenomics.

    PubMed

    Simmonds, Peter; Adams, Mike J; Benkő, Mária; Breitbart, Mya; Brister, J Rodney; Carstens, Eric B; Davison, Andrew J; Delwart, Eric; Gorbalenya, Alexander E; Harrach, Balázs; Hull, Roger; King, Andrew M Q; Koonin, Eugene V; Krupovic, Mart; Kuhn, Jens H; Lefkowitz, Elliot J; Nibert, Max L; Orton, Richard; Roossinck, Marilyn J; Sabanadzovic, Sead; Sullivan, Matthew B; Suttle, Curtis A; Tesh, Robert B; van der Vlugt, René A; Varsani, Arvind; Zerbini, F Murilo

    2017-03-01

    The number and diversity of viral sequences that are identified in metagenomic data far exceeds that of experimentally characterized virus isolates. In a recent workshop, a panel of experts discussed the proposal that, with appropriate quality control, viruses that are known only from metagenomic data can, and should be, incorporated into the official classification scheme of the International Committee on Taxonomy of Viruses (ICTV). Although a taxonomy that is based on metagenomic sequence data alone represents a substantial departure from the traditional reliance on phenotypic properties, the development of a robust framework for sequence-based virus taxonomy is indispensable for the comprehensive characterization of the global virome. In this Consensus Statement article, we consider the rationale for why metagenomic sequence data should, and how it can, be incorporated into the ICTV taxonomy, and present proposals that have been endorsed by the Executive Committee of the ICTV.

  13. QRS Fragmentation Patterns Representing Myocardial Scar Need to Be Separated from Benign Normal Variants: Hypotheses and Proposal for Morphology based Classification.

    PubMed

    Haukilahti, M Anette E; Eranti, Antti; Kenttä, Tuomas; Huikuri, Heikki V

    2016-01-01

    The presence of a fragmented QRS complex (fQRS) in two contiguous leads of a standard 12-lead electrocardiogram (ECG) has been shown to be an indicator of myocardial scar in multiple different populations of cardiac patients. QRS fragmentation is also a predictor of adverse prognosis in acute myocardial infarction, coronary artery disease, and ischemic cardiomyopathy and a prognostic tool in structural heart diseases. An increased risk of sudden cardiac death associated with fQRS has been documented in patients with ischemic cardiomyopathy and hypertrophic cardiomyopathy. However, fQRS is also frequently observed in apparently healthy subjects. Thus, a more detailed classification of different QRS fragmentations is needed to identify the pathological fragmentation patterns and refine the role of fQRS as a risk marker of adverse cardiac events and sudden cardiac death. In most studies fQRS has been defined by the presence of an additional R wave (R'), or notching in the nadir of the S wave, or the presence of >1 R' in two contiguous leads corresponding to a major coronary territory. However, this approach does not discriminate between minor and major fragmentations and the location of the fQRS is also neglected. In addition to this, the method is susceptible to large interobserver variability. We suppose that some fQRS subtypes result from conduction delays in the His-Purkinje system, which is a benign finding and thus can weaken the prognostic values of fQRS. The classification of fQRSs to subtypes with unambiguous definitions is needed to overcome the interobserver variability related issues and to separate fQRSs caused by myocardial scarring from benign normal variants. In this paper, we review the anatomic correlates of fQRS and the current knowledge of prognostic significance of fQRS. We also propose a detailed fQRS classification for research purposes which can later be simplified after the truly pathological morphologies have been identified. The research

  14. QRS Fragmentation Patterns Representing Myocardial Scar Need to Be Separated from Benign Normal Variants: Hypotheses and Proposal for Morphology based Classification

    PubMed Central

    Haukilahti, M. Anette E.; Eranti, Antti; Kenttä, Tuomas; Huikuri, Heikki V.

    2016-01-01

    The presence of a fragmented QRS complex (fQRS) in two contiguous leads of a standard 12-lead electrocardiogram (ECG) has been shown to be an indicator of myocardial scar in multiple different populations of cardiac patients. QRS fragmentation is also a predictor of adverse prognosis in acute myocardial infarction, coronary artery disease, and ischemic cardiomyopathy and a prognostic tool in structural heart diseases. An increased risk of sudden cardiac death associated with fQRS has been documented in patients with ischemic cardiomyopathy and hypertrophic cardiomyopathy. However, fQRS is also frequently observed in apparently healthy subjects. Thus, a more detailed classification of different QRS fragmentations is needed to identify the pathological fragmentation patterns and refine the role of fQRS as a risk marker of adverse cardiac events and sudden cardiac death. In most studies fQRS has been defined by the presence of an additional R wave (R′), or notching in the nadir of the S wave, or the presence of >1 R′ in two contiguous leads corresponding to a major coronary territory. However, this approach does not discriminate between minor and major fragmentations and the location of the fQRS is also neglected. In addition to this, the method is susceptible to large interobserver variability. We suppose that some fQRS subtypes result from conduction delays in the His-Purkinje system, which is a benign finding and thus can weaken the prognostic values of fQRS. The classification of fQRSs to subtypes with unambiguous definitions is needed to overcome the interobserver variability related issues and to separate fQRSs caused by myocardial scarring from benign normal variants. In this paper, we review the anatomic correlates of fQRS and the current knowledge of prognostic significance of fQRS. We also propose a detailed fQRS classification for research purposes which can later be simplified after the truly pathological morphologies have been identified. The research

  15. Structural and Functional Insights from the Metagenome of an Acidic Hot Spring Microbial Planktonic Community in the Colombian Andes

    PubMed Central

    Jiménez, Diego Javier; Andreote, Fernando Dini; Chaves, Diego; Montaña, José Salvador; Osorio-Forero, Cesar; Junca, Howard; Zambrano, María Mercedes; Baena, Sandra

    2012-01-01

    A taxonomic and annotated functional description of microbial life was deduced from 53 Mb of metagenomic sequence retrieved from a planktonic fraction of the Neotropical high Andean (3,973 meters above sea level) acidic hot spring El Coquito (EC). A classification of unassembled metagenomic reads using different databases showed a high proportion of Gammaproteobacteria and Alphaproteobacteria (in total read affiliation), and through taxonomic affiliation of 16S rRNA gene fragments we observed the presence of Proteobacteria, micro-algae chloroplast and Firmicutes. Reads mapped against the genomes Acidiphilium cryptum JF-5, Legionella pneumophila str. Corby and Acidithiobacillus caldus revealed the presence of transposase-like sequences, potentially involved in horizontal gene transfer. Functional annotation and hierarchical comparison with different datasets obtained by pyrosequencing in different ecosystems showed that the microbial community also contained extensive DNA repair systems, possibly to cope with ultraviolet radiation at such high altitudes. Analysis of genes involved in the nitrogen cycle indicated the presence of dissimilatory nitrate reduction to N2 (narGHI, nirS, norBCDQ and nosZ), associated with Proteobacteria-like sequences. Genes involved in the sulfur cycle (cysDN, cysNC and aprA) indicated adenylsulfate and sulfite production that were affiliated to several bacterial species. In summary, metagenomic sequence data provided insight regarding the structure and possible functions of this hot spring microbial community, describing some groups potentially involved in the nitrogen and sulfur cycling in this environment. PMID:23251687

  16. Random Whole Metagenomic Sequencing for Forensic Discrimination of Soils

    PubMed Central

    Khodakova, Anastasia S.; Smith, Renee J.; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations. PMID:25111003

  17. Random whole metagenomic sequencing for forensic discrimination of soils.

    PubMed

    Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.

  18. Reconstruction of novel cyanobacterial siphovirus genomes from Mediterranean metagenomic fosmids.

    PubMed

    Mizuno, Carolina Megumi; Rodriguez-Valera, Francisco; Garcia-Heredia, Inmaculada; Martin-Cuadrado, Ana-Belen; Ghai, Rohit

    2013-01-01

    Cellular metagenomes are primarily used for investigating microbial community structure and function. However, cloned fosmids from such metagenomes capture phage genome fragments that can be used as a source of phage genomes. We show that fosmid cloning from cellular metagenomes and sequencing at a high coverage is a credible alternative to constructing metaviriomes and allows capturing and assembling novel, complete phage genomes. It is likely that phages recovered from cellular metagenomes are those replicating within cells during sample collection and represent "active" phages, naturally amplifying their genomic DNA and increasing chances for cloning. We describe five sets of siphoviral contigs (MEDS1, MEDS2, MEDS3, MEDS4, and MEDS5), obtained by sequencing fosmids from the cellular metagenome of the deep chlorophyll maximum in the Mediterranean. Three of these represent complete siphoviral genomes and two represent partial ones. This is the first set of phage genomes assembled directly from cellular metagenomic fosmid libraries. They exhibit low sequence similarities to one another and to known siphoviruses but are remarkably similar in overall genome architecture. We present evidence suggesting they infect picocyanobacteria, likely Synechococcus. Four of these sets also define a novel branch in the phylogenetic tree of phage large subunit terminases. Moreover, some of these siphoviral groups are globally distributed and abundant in the oceans, comparable to some known myoviruses and podoviruses. This suggests that, as more siphoviral genomes become available, we will be better able to assess the abundance and influence of this diverse and polyphyletic group in the marine habitat.

  19. Classification.

    PubMed

    Tuxhorn, Ingrid; Kotagal, Prakash

    2008-07-01

    In this article, we review the practical approach and diagnostic relevance of current seizure and epilepsy classification concepts and principles as a basic framework for good management of patients with epileptic seizures and epilepsy. Inaccurate generalizations about terminology, diagnosis, and treatment may be the single most important factor, next to an inadequately obtained history, that determines the misdiagnosis and mismanagement of patients with epilepsy. A stepwise signs and symptoms approach for diagnosis, evaluation, and management along the guidelines of the International League Against Epilepsy and definitions of epileptic seizures and epilepsy syndromes offers a state-of-the-art clinical approach to managing patients with epilepsy.

  20. Classification

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2011-01-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. In supervised learning, a set of training examples---examples with known output values---is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. This chapter discusses methods to perform machine learning, with examples involving astronomy.
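
    The train-then-predict workflow described in this abstract can be illustrated with a minimal sketch. It uses a nearest-centroid classifier, chosen here only because it is short; the abstract does not name a specific algorithm. The feature vectors and the labels "simple" and "complex" are made-up placeholders, not real sunspot measurements or classes.

```python
import math
from collections import defaultdict

def train(examples):
    """Build a model from (feature_vector, label) training examples.

    The model is a dict mapping each label to the mean (centroid) of the
    feature vectors seen with that label.
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}

def predict(model, features):
    """Classify an unseen input by the closest class centroid."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical training set: two measurements per candidate, with known labels.
training = [
    ([1.0, 1.0], "simple"),
    ([2.0, 1.0], "simple"),
    ([8.0, 9.0], "complex"),
    ([9.0, 8.0], "complex"),
]
model = train(training)
```

    Calling `predict(model, [2.0, 2.0])` then assigns a previously unseen candidate to the class whose centroid is nearest, which is the "predicted output for inputs that have not been seen before" in the abstract's terms.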

  1. The Viral MetaGenome Annotation Pipeline (VMGAP): an automated tool for the functional annotation of viral Metagenomic shotgun sequencing data

    PubMed Central

    Lorenzi, Hernan A.; Hoover, Jeff; Inman, Jason; Safford, Todd; Murphy, Sean; Kagan, Leonid; Williamson, Shannon J.

    2011-01-01

    In the past few years, the field of metagenomics has been growing at an accelerated pace, particularly in response to advancements in new sequencing technologies. The large volume of sequence data from novel organisms generated by metagenomic projects has triggered the development of specialized databases and tools focused on particular groups of organisms or data types. Here we describe a pipeline for the functional annotation of viral metagenomic sequence data. The Viral MetaGenome Annotation Pipeline (VMGAP) takes advantage of a number of specialized databases, such as collections of mobile genetic elements and environmental metagenomes, to improve the classification and functional prediction of viral gene products. The pipeline assigns a functional term to each predicted protein sequence following a suite of comprehensive analyses whose results are ranked according to a priority rules hierarchy. Additional annotation is provided in the form of enzyme commission (EC) numbers, GO/MeGO terms and Hidden Markov Models together with supporting evidence. PMID:21886867

  2. METAGENassist: a comprehensive web server for comparative metagenomics.

    PubMed

    Arndt, David; Xia, Jianguo; Liu, Yifeng; Zhou, You; Guo, An Chi; Cruz, Joseph A; Sinelnikov, Igor; Budwill, Karen; Nesbø, Camilla L; Wishart, David S

    2012-07-01

    With recent improvements in DNA sequencing and sample extraction techniques, the quantity and quality of metagenomic data are now growing exponentially. This abundance of richly annotated metagenomic data and bacterial census information has spawned a new branch of microbiology called comparative metagenomics. Comparative metagenomics involves the comparison of bacterial populations between different environmental samples, different culture conditions or different microbial hosts. However, in order to do comparative metagenomics, one typically requires a sophisticated knowledge of multivariate statistics and/or advanced software programming skills. To make comparative metagenomics more accessible to microbiologists, we have developed a freely accessible, easy-to-use web server for comparative metagenomic analysis called METAGENassist. Users can upload their bacterial census data from a wide variety of common formats, using either amplified 16S rRNA data or shotgun metagenomic data. Metadata concerning environmental, culture, or host conditions can also be uploaded. During the data upload process, METAGENassist also performs an automated taxonomic-to-phenotypic mapping. Phenotypic information covering nearly 20 functional categories such as GC content, genome size, oxygen requirements, energy sources and preferred temperature range is automatically generated from the taxonomic input data. Using this phenotypically enriched data, users can then perform a variety of multivariate and univariate data analyses including fold change analysis, t-tests, PCA, PLS-DA, clustering and classification. To facilitate data processing, users are guided through a step-by-step analysis workflow using a variety of menus, information hyperlinks and check boxes. METAGENassist also generates colorful, publication quality tables and graphs that can be downloaded and used directly in the preparation of scientific papers. METAGENassist is available at http://www.metagenassist.ca.

  3. The new intra-articular calcaneal fracture classification system in term of sustentacular fragment configurations and incorporation of posterior calcaneal facet fractures with fracture components of the calcaneal body.

    PubMed

    Harnroongroj, Thossart; Harnroongroj, Thos; Suntharapa, Thongchai; Arunakul, Marut

    2016-10-01

    The aim of this study was to develop a new calcaneal fracture classification system that considers the sustentacular fragment configuration and the relation of the posterior calcaneal facet to the calcaneal body. The new classification system used the sustentacular fragment configuration and the relation of the posterior calcaneal facet fracture to the fracture components of the calcaneal body as the key aspects of main types and subtypes. Between 2000 and 2014, 126 intra-articular calcaneal fractures were classified according to the new classification system by using computed tomography images. The new classification system was studied in terms of reliability and correlation to choices of treatment, implant fixation and quality of fracture reduction. Types of sustentacular fragment comprised type A, B and C. The type A sustentacular fragment included the sustentaculum tali containing the middle calcaneal facet. In type B and C fractures, the sustentacular fragment included the medial aspect and the entire posterior calcaneal facet as a single unit, respectively. The fractures with type A, B and C sustentacular fragments were classified as main type A, B and C intra-articular calcaneal fractures. Main types A and B comprised 4 subtypes. Subtypes A1, A3, B1, and B3 were associated with avulsion and bending fragments of the calcaneal body. Subtypes A2, B2, and B4 were associated with a burst calcaneal body. Subtype B4 was not found in the study. Main type C had no subtypes and was associated with a burst calcaneal body. The data showed good reliability. The study showed that our new intra-articular calcaneal fracture classification system correlates with choices of treatment, implant fixation and quality of fracture reduction. Level IV, Study of Diagnostic Test. Copyright © 2016 Turkish Association of Orthopaedics and Traumatology. Production and hosting by Elsevier B.V. All rights reserved.

  4. MetAnnotate: function-specific taxonomic profiling and comparison of metagenomes.

    PubMed

    Petrenko, Pavel; Lobb, Briallen; Kurtz, Daniel A; Neufeld, Josh D; Doxey, Andrew C

    2015-11-05

    Metagenomes provide access to the taxonomic composition and functional capabilities of microbial communities. Although metagenomic analysis methods exist for estimating overall community composition or metabolic potential, identifying specific taxa that encode specific functions or pathways of interest can be more challenging. Here we present MetAnnotate, which addresses the common question: "which organisms perform my function of interest within my metagenome(s) of interest?" MetAnnotate uses profile hidden Markov models to analyze shotgun metagenomes for genes and pathways of interest, classifies retrieved sequences either through a phylogenetic placement or best hit approach, and enables comparison of these profiles between metagenomes. Based on a simulated metagenome dataset, the tool achieves high taxonomic classification accuracy for a broad range of genes, including both markers of community abundance and specific biological pathways. Lastly, we demonstrate MetAnnotate by analyzing for cobalamin (vitamin B12) synthesis genes across hundreds of aquatic metagenomes in a fraction of the time required by the commonly used Basic Local Alignment Search Tool top hit approach. MetAnnotate is multi-threaded and installable as a local web application or command-line tool on Linux systems. MetAnnotate is a useful framework for general and/or function-specific taxonomic profiling and comparison of metagenomes.

  5. Recent progresses in metagenomics

    USDA-ARS's Scientific Manuscript database

    Metagenomics addresses the collective genetic structure and functional composition of a microbial community at its native habitat. This approach has emerged as a powerful tool to study the structure and function of the microbiota for the past few years and is revolutionizing studies of microbial ec...

  6. Bambus 2: scaffolding metagenomes

    PubMed Central

    Koren, Sergey; Treangen, Todd J.; Pop, Mihai

    2011-01-01

    Motivation: Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources. Results: We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly. Availability: Bambus 2 is open source and available from http://amos.sf.net. Contact: mpop@umiacs.umd.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21926123

  7. Deciphering diversity and ecological function from marine metagenomes.

    PubMed

    Bik, Holly M

    2014-10-01

    Metagenomic sequencing now represents a common, powerful approach for investigating diversity and functional relationships in marine ecosystems. High-throughput datasets generated from random fragments of environmental DNA can provide a less biased view of organismal abundance (versus PCR-based amplicon sequencing) and enable novel exploration of microbial genomes by recovering genome assemblies from uncultured species, identifying ecological functions, and reconstructing metabolic pathways. This review highlights the current state of knowledge in marine metagenomics, focusing on biological insights gained from recent environmental studies and detailing commonly employed methods for data collection and analysis. © 2014 Marine Biological Laboratory.

  8. Beyond biodiversity: fish metagenomes.

    PubMed

    Ardura, Alba; Planes, Serge; Garcia-Vazquez, Eva

    2011-01-01

    Biodiversity and intra-specific genetic diversity are interrelated and determine the potential of a community to survive and evolve. Both are considered together in Prokaryote communities treated as metagenomes or ensembles of functional variants beyond species limits. Many factors alter biodiversity in higher Eukaryote communities, and human exploitation can be one of the most important for some groups of plants and animals. For example, fisheries can modify both biodiversity and genetic diversity (intra-specific). Intra-specific diversity can be drastically altered by overfishing. Intense fishing pressure on one stock may imply extinction of some genetic variants and subsequent loss of intra-specific diversity. The objective of this study was to apply a metagenome approach to fish communities and explore its value for rapid evaluation of biodiversity and genetic diversity at community level. Here we have applied the metagenome approach employing the barcoding target gene coi as a model sequence in catch from four very different fish assemblages exploited by fisheries: freshwater communities from the Amazon River and northern Spanish rivers, and marine communities from the Cantabric and Mediterranean seas. Treating all sequences obtained from each regional catch as a biological unit (exploited community) we found that metagenomic diversity indices of the Amazonian catch sample here examined were lower than expected. Reduced diversity could be explained, at least partially, by overexploitation of the fish community that had been independently estimated by other methods. We propose using a metagenome approach for estimating diversity in Eukaryote communities and for early evaluation of genetic variation losses at the multi-species level.

  9. Recovering full-length viral genomes from metagenomes

    PubMed Central

    Smits, Saskia L.; Bodewes, Rogier; Ruiz-González, Aritz; Baumgärtner, Wolfgang; Koopmans, Marion P.; Osterhaus, Albert D. M. E.; Schürch, Anita C.

    2015-01-01

    Infectious disease metagenomics is driven by the question: “what is causing the disease?” in contrast to classical metagenome studies which are guided by “what is out there?” In case of a novel virus, a first step to eventually establishing etiology can be to recover a full-length viral genome from a metagenomic sample. However, retrieval of a full-length genome of a divergent virus is technically challenging and can be time-consuming and costly. Here we discuss different assembly and fragment linkage strategies such as iterative assembly, motif searches, k-mer frequency profiling, coverage profile binning, and other strategies used to recover genomes of potential viral pathogens in a timely and cost-effective manner. PMID:26483782
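
    As an illustration of one strategy named in the abstract above, k-mer frequency profiling, the following is a minimal sketch and not the authors' implementation. The function name `kmer_profile`, the default `k=4`, and the choice to skip k-mers containing non-ACGT characters are all assumptions made for this example. Contigs with similar profiles are often treated as candidates for belonging to the same genome.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=4):
    """Return normalized k-mer frequencies of seq over the ACGT alphabet.

    Hypothetical helper for illustration only. Windows containing
    characters other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed, ordered list of all 4**k possible k-mers.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return {km: 0.0 for km in kmers}
    return {km: counts[km] / total for km in kmers}
```

    Because every profile has the same 4**k keys, profiles from different fragments can be compared directly, for example with a Euclidean or correlation distance.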

  10. Metagenomic small molecule discovery methods

    PubMed Central

    Charlop-Powers, Zachary; Milshteyn, Aleksandr; Brady, Sean F.

    2014-01-01

    Metagenomic approaches to natural product discovery provide the means of harvesting bioactive small molecules synthesized by environmental bacteria without the requirement of first culturing these organisms. Advances in sequencing technologies and general metagenomic methods are beginning to provide the tools necessary to unlock the unexplored biosynthetic potential encoded by the genomes of uncultured environmental bacteria. Here, we highlight recent advances in sequence- and functional- based metagenomic approaches that promise to facilitate antibiotic discovery from diverse environmental microbiomes. PMID:25000402

  11. Accessing the Soil Metagenome for Studies of Microbial Diversity

    PubMed Central

    Delmont, Tom O.; Robe, Patrick; Cecillon, Sébastien; Clark, Ian M.; Constancias, Florentin; Simonet, Pascal; Hirsch, Penny R.; Vogel, Timothy M.

    2011-01-01

    Soil microbial communities contain the highest level of prokaryotic diversity of any environment, and metagenomic approaches involving the extraction of DNA from soil can improve our access to these communities. Most analyses of soil biodiversity and function assume that the DNA extracted represents the microbial community in the soil, but subsequent interpretations are limited by the DNA recovered from the soil. Unfortunately, extraction methods do not provide a uniform and unbiased subsample of metagenomic DNA, and as a consequence, accurate species distributions cannot be determined. Moreover, any bias will propagate errors in estimations of overall microbial diversity and may exclude some microbial classes from study and exploitation. To improve metagenomic approaches, investigate DNA extraction biases, and provide tools for assessing the relative abundances of different groups, we explored the biodiversity of the accessible community DNA by fractioning the metagenomic DNA as a function of (i) vertical soil sampling, (ii) density gradients (cell separation), (iii) cell lysis stringency, and (iv) DNA fragment size distribution. Each fraction had a unique genetic diversity, with different predominant and rare species (based on ribosomal intergenic spacer analysis [RISA] fingerprinting and phylochips). All fractions contributed to the number of bacterial groups uncovered in the metagenome, thus increasing the DNA pool for further applications. Indeed, we were able to access a more genetically diverse proportion of the metagenome (a gain of more than 80% compared to the best single extraction method), limit the predominance of a few genomes, and increase the species richness per sequencing effort. This work stresses the difference between extracted DNA pools and the currently inaccessible complete soil metagenome. PMID:21183646

  12. Metagenomics and CAZyme Discovery.

    PubMed

    Kunath, Benoit J; Bremges, Andreas; Weimann, Aaron; McHardy, Alice C; Pope, Phillip B

    2017-01-01

    Microorganisms play a primary role in regulating biogeochemical cycles and are a valuable source of enzymes that have biotechnological applications, such as carbohydrate-active enzymes (CAZymes). However, the inability to culture the majority of microorganisms that exist in natural ecosystems using common culture-dependent techniques restricts access to potentially novel cellulolytic bacteria and beneficial enzymes. The development of molecular-based culture-independent methods such as metagenomics enables researchers to study microbial communities directly from environmental samples, and presents a platform from which enzymes of interest can be sourced. We outline key methodological stages that are required as well as describe specific protocols that are currently used for metagenomic projects dedicated to CAZyme discovery.

  13. Hot Spring Metagenomics

    PubMed Central

    López-López, Olalla; Cerdán, María Esperanza; González-Siso, María Isabel

    2013-01-01

    Hot springs have been investigated since the XIX century, but isolation and examination of their thermophilic microbial inhabitants did not start until the 1950s. Many thermophilic microorganisms and their viruses have since been discovered, although the real complexity of thermal communities was envisaged when research based on PCR amplification of the 16S rRNA genes arose. Thereafter, the possibility of cloning and sequencing the total environmental DNA, defined as metagenome, and the study of the genes rescued in the metagenomic libraries and assemblies made it possible to gain a more comprehensive understanding of microbial communities—their diversity, structure, the interactions existing between their components, and the factors shaping the nature of these communities. In the last decade, hot springs have been a source of thermophilic enzymes of industrial interest, encouraging further study of the poorly understood diversity of microbial life in these habitats. PMID:25369743

  14. Metagenomic Analysis of Bacterial Communities of Antarctic Surface Snow

    PubMed Central

    Lopatina, Anna; Medvedeva, Sofia; Shmakov, Sergey; Logacheva, Maria D.; Krylenkov, Vjacheslav; Severinov, Konstantin

    2016-01-01

    The diversity of bacteria present in surface snow around four Russian stations in Eastern Antarctica was studied by high throughput sequencing of amplified 16S rRNA gene fragments and shotgun metagenomic sequencing. Considerable class- and genus-level variation between the samples was revealed, indicating the presence of inter-site diversity of bacteria in Antarctic snow. Flavobacterium was a major genus in one sampling site and was also detected in other sites. The diversity of flavobacterial type II-C CRISPR spacers in the samples was investigated by metagenome sequencing. Thousands of unique spacers were revealed with less than 35% overlap between the sampling sites, indicating an enormous natural variety of flavobacterial CRISPR spacers and, by extension, a high level of adaptive activity of the corresponding CRISPR-Cas system. None of the spacers matched known spacers of flavobacterial isolates from the Northern hemisphere. Moreover, the percentage of spacers with matches to Antarctic metagenomic sequences obtained in this work was significantly higher than the percentage with matches to sequences from a much larger publicly available environmental metagenomic database. The results indicate that despite the overall very high level of diversity, Antarctic Flavobacteria comprise a separate pool that experiences pressures from mobile genetic elements different from those present in other parts of the world. The results also establish analysis of metagenomic CRISPR spacer content as a powerful tool to study bacterial population diversity. PMID:27064693

  15. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics.

    PubMed

    Tang, Haixu; Li, Sujun; Ye, Yuzhen

    2016-12-01

    Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro.
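
    The de Bruijn graph structure mentioned in the abstract above can be sketched as follows. This is a toy construction under stated assumptions, not the Graph2Pro code: the function name `de_bruijn` is hypothetical, nodes are (k-1)-mers, each k-mer found in a read contributes one edge, and reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer node to the set of (k-1)-mer nodes that follow it.

    Hypothetical helper for illustration only; real assemblers also track
    coverage, reverse complements, and error correction.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph
```

    Paths through such a graph spell out longer sequences than any single read or predicted gene fragment, which is the property a graph-based protein database can exploit.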

  16. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

    PubMed Central

    Tang, Haixu; Li, Sujun; Ye, Yuzhen

    2016-01-01

    Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro. PMID:27918579

  17. A novel genome signature based on inter-nucleotide distances profiles for visualization of metagenomic data

    NASA Astrophysics Data System (ADS)

    Xie, Xian-Hua; Yu, Zu-Guo; Ma, Yuan-Lin; Han, Guo-Sheng; Anh, Vo

    2017-09-01

    There has been a growing interest in visualization of metagenomic data. The present study focuses on the visualization of metagenomic data using inter-nucleotide distances profiles. We first convert the fragment sequences into inter-nucleotide distances profiles. Then we analyze these profiles by principal component analysis. Finally, the principal components are used to obtain a 2-D scatter plot in which fragments are marked according to their source species. We name our method the inter-nucleotide distances profiles (INP) method. Our method is evaluated on three benchmark data sets used in previously published papers. Our results demonstrate that the INP method is an effective and efficient alternative for visualization of metagenomic data.
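
    The abstract above does not define the profile precisely, so the following is only a sketch under an explicit assumption: an inter-nucleotide distance is taken to be the gap between consecutive occurrences of the same nucleotide, and the profile is the normalized histogram of those gaps for each of A, C, G, T. The function name `inp_profile` and the cutoff `max_dist` are hypothetical.

```python
def inp_profile(seq, max_dist=10):
    """Return a 4 * max_dist vector of inter-nucleotide distance frequencies.

    Assumed definition for illustration only: for each base, histogram the
    distances (1..max_dist) between consecutive occurrences of that base,
    then normalize each base's histogram to sum to 1.
    """
    seq = seq.upper()
    profile = []
    for base in "ACGT":
        positions = [i for i, c in enumerate(seq) if c == base]
        hist = [0] * max_dist
        for a, b in zip(positions, positions[1:]):
            d = b - a
            if d <= max_dist:  # longer gaps are dropped in this sketch
                hist[d - 1] += 1
        n = sum(hist)
        profile.extend(h / n if n else 0.0 for h in hist)
    return profile
```

    Stacking one such vector per fragment gives a matrix whose rows could then be passed to a principal component analysis to obtain 2-D coordinates for plotting.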

  18. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    PubMed

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.

  19. Databases of the marine metagenomics.

    PubMed

    Mineta, Katsuhiko; Gojobori, Takashi

    2016-02-01

    The metagenomic data obtained from marine environments are highly useful for understanding marine microbial communities. In comparison with the conventional amplicon-based approach to metagenomics, the recent shotgun sequencing-based approach has become a powerful tool that provides an efficient way of grasping the diversity of the entire microbial community at a sampling point in the sea. However, this approach accelerates the accumulation of metagenome data as well as the increase in data complexity. Moreover, when the metagenomic approach is used for monitoring changes over time in marine environments at multiple seawater locations, metagenomic data will accumulate in tremendous volume and at enormous speed. Because this situation has started to become a reality at many marine research institutions and stations all over the world, it seems obvious that data management and analysis will be confronted by so-called Big Data issues, such as how a database can be constructed efficiently and how useful knowledge should be extracted from a vast amount of data. In this review, we outline all the major marine metagenome databases that are currently publicly available, noting that no database is devoted exclusively to marine metagenomes and that the number of metagenome databases including marine metagenome data is six, which is unexpectedly small. We also extend our explanation to what we call reference databases, which will be useful for constructing a marine metagenome database as well as for complementing it with important information. Finally, we point out a number of challenges to be overcome in constructing a marine metagenome database.

  20. A statistical toolbox for metagenomics: assessing functional diversity in microbial communities

    PubMed Central

    Schloss, Patrick D; Handelsman, Jo

    2008-01-01

    Background: The 99% of bacteria in the environment that are recalcitrant to culturing have spurred the development of metagenomics, a culture-independent approach to sample and characterize microbial genomes. Massive datasets of metagenomic sequences have been accumulated, but analysis of these sequences has focused primarily on the descriptive comparison of the relative abundance of proteins that belong to specific functional categories. More robust statistical methods are needed to make inferences from metagenomic data. In this study, we developed and applied a suite of tools to describe and compare the richness, membership, and structure of microbial communities using peptide fragment sequences extracted from metagenomic sequence data. Results: Application of these tools to acid mine drainage, soil, and whale fall metagenomic sequence collections revealed groups of peptide fragments with a relatively high abundance and no known function. When combined with analysis of 16S rRNA gene fragments from the same communities these tools enabled us to demonstrate that although there was no overlap in the types of 16S rRNA gene sequence observed, there was a core collection of operational protein families that was shared among the three environments. Conclusion: The results of comparisons between the three habitats were surprising considering the relatively low overlap of membership and the distinctively different characteristics of the three habitats. These tools will facilitate the use of metagenomics to pursue statistically sound genome-based ecological analyses. PMID:18215273

  1. Microbial Metagenomics: Beyond the Genome

    NASA Astrophysics Data System (ADS)

    Gilbert, Jack A.; Dupont, Christopher L.

    2011-01-01

    Metagenomics literally means “beyond the genome.” Marine microbial metagenomic databases presently comprise ~400 billion base pairs of DNA, only ~3% of that found in 1 ml of seawater. Very soon a trillion-base-pair sequence run will be feasible, so it is time to reflect on what we have learned from metagenomics. We review the impact of metagenomics on our understanding of marine microbial communities. We consider the studies facilitated by data generated through the Global Ocean Sampling expedition, as well as the revolution wrought at the individual laboratory level through next generation sequencing technologies. We review recent studies and discoveries since 2008, provide a discussion of bioinformatic analyses, including conceptual pipelines and sequence annotation and predict the future of metagenomics, with suggestions of collaborative community studies tailored toward answering some of the fundamental questions in marine microbial ecology.

  2. Precision Metagenomics: Rapid Metagenomic Analyses for Infectious Disease Diagnostics and Public Health Surveillance.

    PubMed

    Afshinnekoo, Ebrahim; Chou, Chou; Alexander, Noah; Ahsanuddin, Sofia; Schuetz, Audrey N; Mason, Christopher E

    2017-04-01

    Next-generation sequencing (NGS) technologies have ushered in the era of precision medicine, transforming the way we treat cancer patients and diagnose disease. Concomitantly, the advent of these technologies has created a surge of microbiome and metagenomic studies over the last decade, many of which are focused on investigating the host-gene-microbial interactions responsible for the development and spread of infectious diseases, as well as delineating their key role in maintaining health. As we continue to discover more information about the etiology of infectious diseases, the translational potential of metagenomic NGS methods for treatment and rapid diagnosis is becoming abundantly clear. Here, we present a robust protocol for the implementation and application of "precision metagenomics" across various sequencing platforms for clinical samples. Such a pipeline integrates DNA/RNA extraction, library preparation, sequencing, and bioinformatics analyses for taxonomic classification, antimicrobial resistance (AMR) marker screening, and functional analysis (biochemical and metabolic pathway abundance). Moreover, the pipeline has 3 tracks: STAT for results within 24 h; Comprehensive that affords a more in-depth analysis and takes between 5 and 7 d, but offers antimicrobial resistance information; and Targeted, which also requires 5-7 d, but with more sensitive analysis for specific pathogens. Finally, we discuss the challenges that need to be addressed before full integration in the clinical setting.

  3. Use of Substrate-Induced Gene Expression in Metagenomic Analysis of an Aromatic Hydrocarbon-Contaminated Soil

    PubMed Central

    Meier, Matthew J.; Paterson, E. Suzanne

    2015-01-01

    Metagenomics allows the study of genes related to xenobiotic degradation in a culture-independent manner, but many of these studies are limited by the lack of genomic context for metagenomic sequences. This study combined a phenotypic screen known as substrate-induced gene expression (SIGEX) with whole-metagenome shotgun sequencing. SIGEX is a high-throughput promoter-trap method that relies on transcriptional activation of a green fluorescent protein (GFP) reporter gene in response to an inducing compound and subsequent fluorescence-activated cell sorting to isolate individual inducible clones from a metagenomic DNA library. We describe a SIGEX procedure with improved library construction from fragmented metagenomic DNA and improved flow cytometry sorting procedures. We used SIGEX to interrogate an aromatic hydrocarbon (AH)-contaminated soil metagenome. The recovered clones contained sequences with various degrees of similarity to genes (or partial genes) involved in aromatic metabolism, for example, nahG (salicylate oxygenase) family genes and their respective upstream nahR regulators. To obtain a broader context for the recovered fragments, clones were mapped to contigs derived from de novo assembly of shotgun-sequenced metagenomic DNA which, in most cases, contained complete operons involved in aromatic metabolism, providing greater insight into the origin of the metagenomic fragments. A comparable set of contigs was generated using a significantly less computationally intensive procedure in which assembly of shotgun-sequenced metagenomic DNA was directed by the SIGEX-recovered sequences. This methodology may have broad applicability in identifying biologically relevant subsets of metagenomes (including both novel and known sequences) that can be targeted computationally by in silico assembly and prediction tools. PMID:26590287

  4. Use of Substrate-Induced Gene Expression in Metagenomic Analysis of an Aromatic Hydrocarbon-Contaminated Soil.

    PubMed

    Meier, Matthew J; Paterson, E Suzanne; Lambert, Iain B

    2015-11-20

    Metagenomics allows the study of genes related to xenobiotic degradation in a culture-independent manner, but many of these studies are limited by the lack of genomic context for metagenomic sequences. This study combined a phenotypic screen known as substrate-induced gene expression (SIGEX) with whole-metagenome shotgun sequencing. SIGEX is a high-throughput promoter-trap method that relies on transcriptional activation of a green fluorescent protein (GFP) reporter gene in response to an inducing compound and subsequent fluorescence-activated cell sorting to isolate individual inducible clones from a metagenomic DNA library. We describe a SIGEX procedure with improved library construction from fragmented metagenomic DNA and improved flow cytometry sorting procedures. We used SIGEX to interrogate an aromatic hydrocarbon (AH)-contaminated soil metagenome. The recovered clones contained sequences with various degrees of similarity to genes (or partial genes) involved in aromatic metabolism, for example, nahG (salicylate oxygenase) family genes and their respective upstream nahR regulators. To obtain a broader context for the recovered fragments, clones were mapped to contigs derived from de novo assembly of shotgun-sequenced metagenomic DNA which, in most cases, contained complete operons involved in aromatic metabolism, providing greater insight into the origin of the metagenomic fragments. A comparable set of contigs was generated using a significantly less computationally intensive procedure in which assembly of shotgun-sequenced metagenomic DNA was directed by the SIGEX-recovered sequences. This methodology may have broad applicability in identifying biologically relevant subsets of metagenomes (including both novel and known sequences) that can be targeted computationally by in silico assembly and prediction tools.

  5. Captured metagenomics: large-scale targeting of genes based on 'sequence capture' reveals functional diversity in soils.

    PubMed

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K; Hedlund, Katarina; Ahrén, Dag

    2015-12-01

    Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach at large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  6. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    PubMed Central

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach at large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729

  7. Reconstructing the genomic content of microbiome taxa through shotgun metagenomic deconvolution.

    PubMed

    Carr, Rogan; Shen-Orr, Shai S; Borenstein, Elhanan

    2013-01-01

    Metagenomics has transformed our understanding of the microbial world, allowing researchers to bypass the need to isolate and culture individual taxa and to directly characterize both the taxonomic and gene compositions of environmental samples. However, associating the genes found in a metagenomic sample with the specific taxa of origin remains a critical challenge. Existing binning methods, based on nucleotide composition or alignment to reference genomes allow only a coarse-grained classification and rely heavily on the availability of sequenced genomes from closely related taxa. Here, we introduce a novel computational framework, integrating variation in gene abundances across multiple samples with taxonomic abundance data to deconvolve metagenomic samples into taxa-specific gene profiles and to reconstruct the genomic content of community members. This assembly-free method is not bounded by various factors limiting previously described methods of metagenomic binning or metagenomic assembly and represents a fundamentally different approach to metagenomic-based genome reconstruction. An implementation of this framework is available at http://elbo.gs.washington.edu/software.html. We first describe the mathematical foundations of our framework and discuss considerations for implementing its various components. We demonstrate the ability of this framework to accurately deconvolve a set of metagenomic samples and to recover the gene content of individual taxa using synthetic metagenomic samples. We specifically characterize determinants of prediction accuracy and examine the impact of annotation errors on the reconstructed genomes. We finally apply metagenomic deconvolution to samples from the Human Microbiome Project, successfully reconstructing genus-level genomic content of various microbial genera, based solely on variation in gene count. These reconstructed genera are shown to correctly capture genus-specific properties. With the accumulation of metagenomic
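
    The deconvolution idea in this abstract can be illustrated with a small sketch. It assumes a simple linear model, which the abstract does not spell out: the abundance of a gene in a sample is the sum, over taxa, of each taxon's abundance times that taxon's copy number of the gene. Under that assumption, the per-taxon copy numbers can be estimated by least squares across samples. The function names (`solve_linear`, `deconvolve_gene`) and the toy numbers are hypothetical and are not taken from the authors' implementation.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("taxon abundances are not linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def deconvolve_gene(taxon_abundance, gene_abundance):
    """Least-squares estimate of the per-taxon copy number of one gene.

    taxon_abundance[i][k]: abundance of taxon k in sample i (assumed known)
    gene_abundance[i]:     total abundance of the gene in sample i
    """
    n_taxa = len(taxon_abundance[0])
    # Normal equations (A^T A) c = A^T e for the assumed linear model e = A c.
    AtA = [[sum(row[j] * row[k] for row in taxon_abundance) for k in range(n_taxa)]
           for j in range(n_taxa)]
    Ate = [sum(row[j] * e for row, e in zip(taxon_abundance, gene_abundance))
           for j in range(n_taxa)]
    return solve_linear(AtA, Ate)


# Toy data: three samples, two taxa; the gene is present in taxon 0 only (2 copies).
abundance = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
true_copies = [2.0, 0.0]
gene = [sum(a * c for a, c in zip(row, true_copies)) for row in abundance]
estimate = deconvolve_gene(abundance, gene)
print(estimate)
```

    With noise-free toy data the estimate matches the copy numbers used to generate it, up to floating-point error. The sketch needs at least as many samples as taxa, and the taxon abundances must vary between samples.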

  8. Binning of shallowly sampled metagenomic sequence fragments reveals that low abundance bacteria play important roles in sulfur cycling and degradation of complex organic polymers in an acid mine drainage community

    NASA Astrophysics Data System (ADS)

    Dick, G. J.; Andersson, A.; Banfield, J. F.

    2007-12-01

    Our understanding of environmental microbiology has been greatly enhanced by community genome sequencing of DNA recovered directly from the environment. Community genomics provides insights into the diversity, community structure, metabolic function, and evolution of natural populations of uncultivated microbes, thereby revealing dynamics of how microorganisms interact with each other and their environment. Recent studies have demonstrated the potential for reconstructing near-complete genomes from natural environments while highlighting the challenges of analyzing community genomic sequence, especially from diverse environments. A major challenge of shotgun community genome sequencing is identification of DNA fragments from minor community members for which only low coverage of genomic sequence is present. We analyzed community genome sequence retrieved from biofilms in an acid mine drainage (AMD) system in the Richmond Mine at Iron Mountain, CA, with an emphasis on identification and assembly of DNA fragments from low-abundance community members. The Richmond Mine hosts an extensive, relatively low diversity subterranean chemolithoautotrophic community that is sustained entirely by oxidative dissolution of pyrite. The activity of these microorganisms greatly accelerates the generation of AMD. Previous and ongoing work in our laboratory has focused on reconstructing genomes of dominant community members, including several bacteria and archaea. We binned contigs from several samples (including one new sample and two that had been previously analyzed) by tetranucleotide frequency with clustering by Self-Organizing Maps (SOM). The binning, evaluated by comparison with information from the manually curated assembly of the dominant organisms, was found to be very effective: fragments were correctly assigned with 95% accuracy. Improperly assigned fragments often contained sequences that are either evolutionarily constrained (e.g. 16S rRNA genes) or mobile elements that are

  9. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing

    PubMed Central

    2012-01-01

    Background: Variation of microorganism communities in the rumen of cattle (Bos taurus) is of great interest because of possible links to economically or environmentally important traits, such as feed conversion efficiency or methane emission levels. The resolution of studies investigating this variation may be improved by utilizing untargeted massively parallel sequencing (MPS), that is, sequencing without targeted amplification of genes. The objective of this study was to develop a method which used MPS to generate “rumen metagenome profiles”, and to investigate if these profiles were repeatable among samples taken from the same cow. Given that faecal samples are much easier to obtain than rumen fluid samples, we also investigated whether rumen metagenome profiles were predictive of faecal metagenome profiles. Results: Rather than focusing on individual organisms within the rumen, our method used MPS data to generate quantitative rumen microbiome profiles, regardless of taxonomic classifications. The method requires a previously assembled reference metagenome. A number of such reference metagenomes were considered, including two rumen derived metagenomes, a human faecal microflora metagenome and a reference metagenome made up of publicly available prokaryote sequences. Sequence reads from each test sample were aligned to these references. The “rumen metagenome profile” was generated from the number of the reads that aligned to each contig in the database. We used this method to test the hypothesis that rumen fluid microbial community profiles vary more between cows than within multiple samples from the same cow. Rumen fluid samples were taken from three cows, at three locations within the rumen. DNA from the samples was sequenced on the Illumina GAIIx. When the reads were aligned to a rumen metagenome reference, the rumen metagenome profiles were repeatable (P < 0.00001) by cow regardless of location of sampling rumen fluid. The repeatability was estimated

  10. New Hydrocarbon Degradation Pathways in the Microbial Metagenome from Brazilian Petroleum Reservoirs

    PubMed Central

    Sierra-García, Isabel Natalia; Correa Alvarez, Javier; Pantaroto de Vasconcellos, Suzan; Pereira de Souza, Anete; dos Santos Neto, Eugenio Vaz; de Oliveira, Valéria Maia

    2014-01-01

    Current knowledge of the microbial diversity and metabolic pathways involved in hydrocarbon degradation in petroleum reservoirs is still limited, mostly due to the difficulty in recovering the complex community from such an extreme environment. Metagenomics is a valuable tool to investigate the genetic and functional diversity of previously uncultured microorganisms in natural environments. Using a function-driven metagenomic approach, we investigated the metabolic abilities of microbial communities in oil reservoirs. Here, we describe novel functional metabolic pathways involved in the biodegradation of aromatic compounds in a metagenomic library obtained from an oil reservoir. Although many of the deduced proteins shared homology with known enzymes of different well-described aerobic and anaerobic catabolic pathways, the metagenomic fragments did not contain the complete clusters known to be involved in hydrocarbon degradation. Instead, the metagenomic fragments comprised genes belonging to different pathways, showing novel gene arrangements. These results reinforce the potential of the metagenomic approach for the identification and elucidation of new genes and pathways in poorly studied environments and contribute to a broader perspective on the hydrocarbon degradation processes in petroleum reservoirs. PMID:24587220

  11. New hydrocarbon degradation pathways in the microbial metagenome from Brazilian petroleum reservoirs.

    PubMed

    Sierra-García, Isabel Natalia; Correa Alvarez, Javier; de Vasconcellos, Suzan Pantaroto; Pereira de Souza, Anete; dos Santos Neto, Eugenio Vaz; de Oliveira, Valéria Maia

    2014-01-01

    Current knowledge of the microbial diversity and metabolic pathways involved in hydrocarbon degradation in petroleum reservoirs is still limited, mostly due to the difficulty in recovering the complex community from such an extreme environment. Metagenomics is a valuable tool to investigate the genetic and functional diversity of previously uncultured microorganisms in natural environments. Using a function-driven metagenomic approach, we investigated the metabolic abilities of microbial communities in oil reservoirs. Here, we describe novel functional metabolic pathways involved in the biodegradation of aromatic compounds in a metagenomic library obtained from an oil reservoir. Although many of the deduced proteins shared homology with known enzymes of different well-described aerobic and anaerobic catabolic pathways, the metagenomic fragments did not contain the complete clusters known to be involved in hydrocarbon degradation. Instead, the metagenomic fragments comprised genes belonging to different pathways, showing novel gene arrangements. These results reinforce the potential of the metagenomic approach for the identification and elucidation of new genes and pathways in poorly studied environments and contribute to a broader perspective on the hydrocarbon degradation processes in petroleum reservoirs.

  12. A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories.

    PubMed

    Hasan, Mehedi; Kotov, Alexander; Idalski Carcone, April; Dong, Ming; Naar, Sylvie; Brogan Hartlieb, Kathryn

    2016-08-01

    This study examines the effectiveness of state-of-the-art supervised machine learning methods in conjunction with different feature types for the task of automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard, and experimented with state-of-the-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN) in conjunction with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found that, when the number of classes is large, the performance of CNN and CRF is inferior to SVM. When only lexical features were used, interview transcripts were automatically annotated by SVM with the highest classification accuracy among all classifiers of 70.8%, 61% and 53.7% based on the codebooks consisting of 17, 20 and 41 codes, respectively. Using contextual and semantic features, as well as their combination, in addition to lexical ones, improved the accuracy of SVM for annotation of utterances in motivational interview transcripts with a codebook consisting of 17 classes to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy.

  13. Bacillus subtilis as a tool for screening soil metagenomic libraries for antimicrobial activities.

    PubMed

    Biver, Sophie; Steels, Sébastien; Portetelle, Daniel; Vandenbol, Micheline

    2013-06-28

    Finding new antimicrobial activities by functional metagenomics has been shown to depend on the heterologous host used to express the foreign DNA. Therefore, efforts are devoted to developing new tools for constructing metagenomic libraries in shuttle vectors replicatable in phylogenetically distinct hosts. Here we evaluated the use of the Escherichia coli-Bacillus subtilis shuttle vector pHT01 to construct a forest-soil metagenomic library. This library was screened in both hosts for antimicrobial activities against four opportunistic bacteria: Proteus vulgaris, Bacillus cereus, Staphylococcus epidermidis, and Micrococcus luteus. A new antibacterial activity against B. cereus was found upon screening in B. subtilis. The new antimicrobial agent, sensitive to proteinase K, was not active when the corresponding DNA fragment was expressed in E. coli. Our results validate the use of pHT01 as a shuttle vector and B. subtilis as a host to isolate new activities by functional metagenomics.

  14. IMG/M 4 version of the integrated metagenome comparative analysis system.

    PubMed

    Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Szeto, Ernest; Palaniappan, Krishna; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Pagani, Ioanna; Tringe, Susannah; Huntemann, Marcel; Billis, Konstantinos; Varghese, Neha; Tennessen, Kristin; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2014-01-01

    IMG/M (http://img.jgi.doe.gov/m) provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in the context of a comprehensive set of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG/M's data content and analytical tools have expanded continuously since its first version was released in 2007. Since the last report published in the 2012 NAR Database Issue, IMG/M's database architecture, annotation and data integration pipelines and analysis tools have been extended to cope with the rapid growth in the number and size of metagenome data sets handled by the system. IMG/M data marts provide support for the analysis of publicly available genomes, expert review of metagenome annotations (IMG/M ER: http://img.jgi.doe.gov/mer) and Human Microbiome Project (HMP)-specific metagenome samples (IMG/M HMP: http://img.jgi.doe.gov/imgm_hmp).

  15. Crass: identification and reconstruction of CRISPR from unassembled metagenomic data.

    PubMed

    Skennerton, Connor T; Imelfort, Michael; Tyson, Gene W

    2013-05-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.

  16. Crass: identification and reconstruction of CRISPR from unassembled metagenomic data

    PubMed Central

    Skennerton, Connor T.; Imelfort, Michael; Tyson, Gene W.

    2013-01-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities. PMID:23511966

  17. METAGENOMICS AND PERSONALIZED MEDICINE

    PubMed Central

    Virgin, Herbert W.; Todd, John A.

    2015-01-01

    The microbiome is a complex community of Bacteria, Archaea, Eukarya and viruses that infect humans and live in our tissues. It contributes the majority of genetic information to our metagenome, and consequently, to our resistance and susceptibility to diseases, especially common inflammatory diseases, such as type 1 diabetes, ulcerative colitis, and Crohn's disease. Here we discuss how host-gene-microbial interactions are major determinants for the development of these multifactorial chronic disorders and thus, for the relationship between genotype and phenotype. We also explore how genome-wide association studies (GWAS) on autoimmune and inflammatory diseases are uncovering mechanism-based sub-types for these disorders. Applying these emerging concepts will permit a more complete understanding of the etiologies of complex diseases and underpin the development of both next generation animal models and new therapeutic strategies for targeting personalized disease phenotypes. PMID:21962506

  18. A Bioinformatician's Guide to Metagenomics

    SciTech Connect

    Kunin, Victor; Copeland, Alex; Lapidus, Alla; Mavromatis, Konstantinos; Hugenholtz, Philip

    2008-08-01

    As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe step-by-step the chain of decisions accompanying a metagenomic project from the viewpoint of a bioinformatician. We guide the reader through a standard workflow for a metagenomic project beginning with pre-sequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic datasets by contrast to genome projects. Different types of data analyses particular to metagenomes are then presented including binning, dominant population analysis and gene-centric analysis. Finally data management systems and issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.

  19. Open resource metagenomics: a model for sharing metagenomic libraries.

    PubMed

    Neufeld, J D; Engel, K; Cheng, J; Moreno-Hagelsieb, G; Rose, D R; Charles, T C

    2011-11-30

    Both sequence-based and activity-based exploitation of environmental DNA have provided unprecedented access to the genomic content of cultivated and uncultivated microorganisms. Although researchers deposit microbial strains in culture collections and DNA sequences in databases, activity-based metagenomic studies typically only publish sequences from the hits retrieved from specific screens. Physical metagenomic libraries, conceptually similar to entire sequence datasets, are usually not straightforward to obtain by interested parties subsequent to publication. In order to facilitate unrestricted distribution of metagenomic libraries, we propose the adoption of open resource metagenomics, in line with the trend towards open access publishing, and similar to culture- and mutant-strain collections that have been the backbone of traditional microbiology and microbial genetics. The concept of open resource metagenomics includes preparation of physical DNA libraries, preferably in versatile vectors that facilitate screening in a diversity of host organisms, and pooling of clones so that single aliquots containing complete libraries can be easily distributed upon request. Database deposition of associated metadata and sequence data for each library provides researchers with information to select the most appropriate libraries for further research projects. As a starting point, we have established the Canadian MetaMicroBiome Library (CM(2)BL [1]). The CM(2)BL is a publicly accessible collection of cosmid libraries containing environmental DNA from soils collected from across Canada, spanning multiple biomes. The libraries were constructed such that the cloned DNA can be easily transferred to Gateway® compliant vectors, facilitating functional screening in virtually any surrogate microbial host for which there are available plasmid vectors. The libraries, which we are placing in the public domain, will be distributed upon request without restriction to members of both the

  20. Open resource metagenomics: a model for sharing metagenomic libraries

    PubMed Central

    Neufeld, J.D.; Engel, K.; Cheng, J.; Moreno-Hagelsieb, G.; Rose, D.R.; Charles, T.C.

    2011-01-01

    Both sequence-based and activity-based exploitation of environmental DNA have provided unprecedented access to the genomic content of cultivated and uncultivated microorganisms. Although researchers deposit microbial strains in culture collections and DNA sequences in databases, activity-based metagenomic studies typically only publish sequences from the hits retrieved from specific screens. Physical metagenomic libraries, conceptually similar to entire sequence datasets, are usually not straightforward to obtain by interested parties subsequent to publication. In order to facilitate unrestricted distribution of metagenomic libraries, we propose the adoption of open resource metagenomics, in line with the trend towards open access publishing, and similar to culture- and mutant-strain collections that have been the backbone of traditional microbiology and microbial genetics. The concept of open resource metagenomics includes preparation of physical DNA libraries, preferably in versatile vectors that facilitate screening in a diversity of host organisms, and pooling of clones so that single aliquots containing complete libraries can be easily distributed upon request. Database deposition of associated metadata and sequence data for each library provides researchers with information to select the most appropriate libraries for further research projects. As a starting point, we have established the Canadian MetaMicroBiome Library (CM2BL [1]). The CM2BL is a publicly accessible collection of cosmid libraries containing environmental DNA from soils collected from across Canada, spanning multiple biomes. The libraries were constructed such that the cloned DNA can be easily transferred to Gateway® compliant vectors, facilitating functional screening in virtually any surrogate microbial host for which there are available plasmid vectors. The libraries, which we are placing in the public domain, will be distributed upon request without restriction to members of both the

  1. CLaMS: Classifier for Metagenomic Sequences

    SciTech Connect

    Pati, Amrita

    2010-12-01

    CLaMS ("Classifier for Metagenomic Sequences") is a Java application for binning assembled metagenomes using user-specified training sequence sets and other user-specified initial parameters. Since CLaMS analyzes and matches sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; CLaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. CLaMS is meant to be a desktop application for biologists and can be run on any machine under any operating system on which the Java Runtime Environment is enabled. CLaMS is freely available in both GUI-based and command-line based forms.
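
    As a rough illustration of composition-based binning with user-supplied training sequences, the sketch below assigns a contig to the training set whose k-mer frequency signature is nearest. It is not CLaMS itself: the choice of dinucleotides, the Euclidean distance, and the names `signature` and `assign_bin` are assumptions made for this example.

```python
from collections import Counter
from itertools import product
from math import sqrt

K = 2  # dinucleotides keep the toy example small (an assumption, not a CLaMS setting)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def signature(seq):
    """Return the normalized K-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[m] for m in KMERS)  # ignores k-mers with ambiguous bases
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[m] / total for m in KMERS]


def distance(u, v):
    """Euclidean distance between two signatures."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign_bin(contig, training):
    """Return the name of the training set whose signature is closest to the contig's.

    training: dict mapping a bin name to a representative training sequence.
    """
    sig = signature(contig)
    return min(training, key=lambda name: distance(sig, signature(training[name])))


# Hypothetical training sets and contigs.
training = {
    "AT_rich": "ATATTTAAATATTAAATTTATATAAT",
    "GC_rich": "GCGGCCGCGCCGGGCGCCGCGGCCGC",
}
print(assign_bin("TTATAAATATTTAATA", training))  # AT_rich
print(assign_bin("CCGCGGGCGCCGCG", training))    # GC_rich
```

    Because only k-mer counting and vector comparison are involved, no alignment is needed, which is consistent with the speed advantage the abstract describes.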

  2. MetaBAT: Metagenome Binning based on Abundance and Tetranucleotide Frequency

    SciTech Connect

    Kang, Dongwan; Froula, Jeff; Egan, Rob; Wang, Zhong

    2014-03-21

    Grouping large fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Here we developed automated metagenome binning software, called MetaBAT, which integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency. On synthetic datasets MetaBAT on average achieves 98% precision and 90% recall at the strain level with 281 near complete unique genomes. Applying MetaBAT to a human gut microbiome data set we recovered 176 genome bins with 92% precision and 80% recall. Further analyses suggest MetaBAT is able to recover genome fragments missed in reference genomes up to 19%, while 53 genome bins are novel. In summary, we believe MetaBAT is a powerful tool to facilitate comprehensive understanding of complex microbial communities.
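
    The precision and recall figures quoted above can be made concrete with a small sketch. The abstract does not state how the metrics were computed, so the definitions here are assumptions: precision is the fraction of contigs in a bin that come from the bin's dominant genome, and recall is the fraction of that genome's contigs that ended up in the bin. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def bin_precision_recall(bin_labels, genome_sizes):
    """Score one genome bin against known contig origins.

    bin_labels:   true genome label of each contig placed in the bin
    genome_sizes: dict mapping a genome label to its total number of contigs
    Returns (dominant genome, precision, recall).
    """
    counts = Counter(bin_labels)
    genome, hits = counts.most_common(1)[0]
    precision = hits / len(bin_labels)    # purity of the bin
    recall = hits / genome_sizes[genome]  # completeness of the dominant genome
    return genome, precision, recall


# Toy bin: 9 contigs from genome gA and 1 stray contig from gB.
genome, p, r = bin_precision_recall(["gA"] * 9 + ["gB"], {"gA": 12, "gB": 30})
print(genome, round(p, 2), round(r, 2))  # gA 0.9 0.75
```

    Counting contigs keeps the example short; weighting each contig by its length in base pairs is a common alternative that changes only the two ratios.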

  3. Functional metagenomics to decipher food-microbe-host crosstalk.

    PubMed

    Larraufie, Pierre; de Wouters, Tomas; Potocki-Veronese, Gabrielle; Blottière, Hervé M; Doré, Joël

    2015-02-01

    The recent developments of metagenomics permit an extremely high-resolution molecular scan of the intestinal microbiota giving new insights and opening perspectives for clinical applications. Beyond the unprecedented vision of the intestinal microbiota given by large-scale quantitative metagenomics studies, such as the EU MetaHIT project, functional metagenomics tools allow the exploration of fine interactions between food constituents, microbiota and host, leading to the identification of signals and intimate mechanisms of crosstalk, especially between bacteria and human cells. Cloning of large genome fragments, either from complex intestinal communities or from selected bacteria, allows the screening of these biological resources for bioactivity towards complex plant polymers or functional food such as prebiotics. This permitted identification of novel carbohydrate-active enzyme families involved in dietary fibre and host glycan breakdown, and highlighted unsuspected bacterial players at the top of the intestinal microbial food chain. Similarly, exposure of fractions from genomic and metagenomic clones onto human cells engineered with reporter systems to track modulation of immune response, cell proliferation or cell metabolism has allowed the identification of bioactive clones modulating key cell signalling pathways or the induction of specific genes. This opens the possibility to decipher mechanisms by which commensal bacteria or candidate probiotics can modulate the activity of cells in the intestinal epithelium or even in distal organs such as the liver, adipose tissue or the brain. Hence, in spite of our inability to culture many of the dominant microbes of the human intestine, functional metagenomics open a new window for the exploration of food-microbe-host crosstalk.

  4. Challenges of the Unknown: Clinical Application of Microbial Metagenomics.

    PubMed

    Rose, Graham; Wooldridge, David J; Anscombe, Catherine; Mee, Edward T; Misra, Raju V; Gharbia, Saheer

    2015-01-01

    Availability of fast, high throughput and low cost whole genome sequencing holds great promise within public health microbiology, with applications ranging from outbreak detection and tracking transmission events to understanding the role played by microbial communities in health and disease. Within clinical metagenomics, identifying microorganisms from a complex and host enriched background remains a central computational challenge. As proof of principle, we sequenced two metagenomic samples, a known viral mixture of 25 human pathogens and an unknown complex biological model using benchtop technology. The datasets were then analysed using a bioinformatic pipeline developed around recent fast classification methods. A targeted approach was able to detect 20 of the viruses against a background of host contamination from multiple sources and bacterial contamination. An alternative untargeted identification method was highly correlated with these classifications, and over 1,600 species were identified when applied to the complex biological model, including several species captured at over 50% genome coverage. In summary, this study demonstrates the great potential of applying metagenomics within the clinical laboratory setting and that this can be achieved using infrastructure available to nondedicated sequencing centres.

  5. Challenges of the Unknown: Clinical Application of Microbial Metagenomics

    PubMed Central

    Rose, Graham; Wooldridge, David J.; Anscombe, Catherine; Mee, Edward T.; Misra, Raju V.; Gharbia, Saheer

    2015-01-01

    Availability of fast, high throughput and low cost whole genome sequencing holds great promise within public health microbiology, with applications ranging from outbreak detection and tracking transmission events to understanding the role played by microbial communities in health and disease. Within clinical metagenomics, identifying microorganisms from a complex and host enriched background remains a central computational challenge. As proof of principle, we sequenced two metagenomic samples, a known viral mixture of 25 human pathogens and an unknown complex biological model using benchtop technology. The datasets were then analysed using a bioinformatic pipeline developed around recent fast classification methods. A targeted approach was able to detect 20 of the viruses against a background of host contamination from multiple sources and bacterial contamination. An alternative untargeted identification method was highly correlated with these classifications, and over 1,600 species were identified when applied to the complex biological model, including several species captured at over 50% genome coverage. In summary, this study demonstrates the great potential of applying metagenomics within the clinical laboratory setting and that this can be achieved using infrastructure available to nondedicated sequencing centres. PMID:26451363

  6. Web Resources for Metagenomics Studies

    PubMed Central

    Dudhagara, Pravin; Bhavsar, Sunil; Bhagat, Chintan; Ghelani, Anjana; Bhatt, Shreyas; Patel, Rajesh

    2015-01-01

    The development of next-generation sequencing (NGS) platforms spawned an enormous volume of data. This explosion in data has unearthed new scalability challenges for existing bioinformatics tools. The analysis of metagenomic sequences using bioinformatics pipelines is complicated by the substantial complexity of these data. In this article, we review several commonly-used online tools for metagenomics data analysis with respect to their quality and detail of analysis using simulated metagenomics data. There are at least a dozen such software tools presently available in the public domain. Among them, MGRAST, IMG/M, and METAVIR are the most well-known tools according to the number of citations by peer-reviewed scientific media up to mid-2015. Here, we describe 12 online tools with respect to their web link, annotation pipelines, clustering methods, online user support, and availability of data storage. We also rated each tool to identify the most promising and preferable ones, and evaluated the five best tools using a synthetic metagenome. The article comprehensively deals with the contemporary problems and the prospects of metagenomics from a bioinformatics viewpoint. PMID:26602607

  7. Web Resources for Metagenomics Studies.

    PubMed

    Dudhagara, Pravin; Bhavsar, Sunil; Bhagat, Chintan; Ghelani, Anjana; Bhatt, Shreyas; Patel, Rajesh

    2015-10-01

    The development of next-generation sequencing (NGS) platforms spawned an enormous volume of data. This explosion in data has unearthed new scalability challenges for existing bioinformatics tools. The analysis of metagenomic sequences using bioinformatics pipelines is complicated by the substantial complexity of these data. In this article, we review several commonly-used online tools for metagenomics data analysis with respect to their quality and detail of analysis using simulated metagenomics data. There are at least a dozen such software tools presently available in the public domain. Among them, MGRAST, IMG/M, and METAVIR are the most well-known tools according to the number of citations by peer-reviewed scientific media up to mid-2015. Here, we describe 12 online tools with respect to their web link, annotation pipelines, clustering methods, online user support, and availability of data storage. We also rated each tool to identify the most promising and preferable ones, and evaluated the five best tools using a synthetic metagenome. The article comprehensively deals with the contemporary problems and the prospects of metagenomics from a bioinformatics viewpoint. Copyright © 2015. Production and hosting by Elsevier Ltd.

  8. Metagenomics and the niche concept.

    PubMed

    Marco, Diana

    2008-08-01

    The metagenomics approach has revolutionised the fields of bacterial diversity, ecology and evolution, as well as derived applications like bioremediation and obtaining bioproducts. A further associated conceptual change has also occurred, since in the metagenomics methodology the species is no longer the unit of study, but rather partial genome arrangements or even isolated genes. In spite of this, concepts coming from ecological and evolutionary fields traditionally centred on the species, like the concept of niche, are still being applied without further revision. A reformulation of the niche concept is necessary to deal with the new operative and epistemological challenges posed by the metagenomics approach. To contribute to this end, I review past and present uses of the niche concept in ecology and in microbiological studies, showing that a new, updated definition needs to be used in the context of metagenomics. Finally, I give some insights into a more adequate conceptual background for the utilisation of the niche concept in metagenomic studies. In particular, I raise the necessity of including the microbial genetic background as another variable in the niche space.

  9. Metazen - metadata capture for metagenomes.

    PubMed

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker

    2014-01-01

    As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  10. IDENTIFICATION OF CHICKEN-SPECIFIC FECAL MICROBIAL SEQUENCES USING A METAGENOMIC APPROACH

    EPA Science Inventory

    In this study, we applied a genome fragment enrichment (GFE) method to select for genomic regions that differ between different fecal metagenomes. Competitive DNA hybridizations were performed between chicken fecal DNA and pig fecal DNA (C-P) and between chicken fecal DNA and an ...

  12. Real time metagenomics: using k-mers to annotate metagenomes.

    PubMed

    Edwards, Robert A; Olson, Robert; Disz, Terry; Pusch, Gordon D; Vonstein, Veronika; Stevens, Rick; Overbeek, Ross

    2012-12-15

    Annotation of metagenomes involves comparing the individual sequence reads with a database of known sequences and assigning a unique function to each read. This is a time-consuming task that is computationally intensive (though not computationally complex). Here we present a novel approach to annotate metagenomes using unique k-mer oligopeptide sequences from 7 to 12 amino acids long. We demonstrate that k-mer-based annotations are faster and approach the sensitivity and precision of blastx-based annotations without losing accuracy. A last-common ancestor approach was also developed to describe the members of the community.
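
    The idea of annotating with unique k-mer oligopeptides can be illustrated with a minimal sketch. It is not the authors' implementation: the toy reference proteins, the function labels, the `min_hits` threshold, and the function names `build_unique_kmer_index` and `annotate` are all assumptions made for illustration, and the translation of DNA reads into peptides is left out.

```python
from collections import Counter, defaultdict


def build_unique_kmer_index(reference, k):
    """Map every k-mer seen under exactly one function label to that label.

    `reference` is a dict: function label -> list of protein sequences.
    k-mers shared between labels are ambiguous and are dropped.
    """
    seen = defaultdict(set)
    for function, proteins in reference.items():
        for protein in proteins:
            for i in range(len(protein) - k + 1):
                seen[protein[i:i + k]].add(function)
    return {kmer: next(iter(funcs)) for kmer, funcs in seen.items() if len(funcs) == 1}


def annotate(peptide, index, k, min_hits=2):
    """Return the label with the most unique k-mer hits, or None.

    `min_hits` is a hypothetical threshold guarding against chance matches.
    """
    hits = Counter()
    for i in range(len(peptide) - k + 1):
        function = index.get(peptide[i:i + k])
        if function is not None:
            hits[function] += 1
    if not hits:
        return None
    function, count = hits.most_common(1)[0]
    return function if count >= min_hits else None


# Toy, made-up reference; the two proteins share the stretch "QRQISFVK".
REFERENCE = {
    "kinase": ["MKTAYIAKQRQISFVK"],
    "transporter": ["MLLAVGGSHHWPQRQISFVKAA"],
}
INDEX = build_unique_kmer_index(REFERENCE, k=7)

print(annotate("MKTAYIAKQ", INDEX, k=7))  # peptide matching only the first protein
print(annotate("QRQISFVK", INDEX, k=7))   # peptide from the shared stretch
```

    Because lookup is a dictionary access per k-mer rather than an alignment, the cost grows linearly with read length, which is in the spirit of the speed advantage the abstract reports; the sketch makes no claim about sensitivity or precision.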

  13. MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.

    PubMed

    Reddy, Rachamalla Maheedhar; Mohammed, Monzoorul Haque; Mande, Sharmila S

    2014-01-01

    A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA. Copyright © 2014 Elsevier Inc. All rights reserved.

  14. Metagenomics and novel gene discovery

    PubMed Central

    Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin

    2014-01-01

    Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics. PMID:24317337

  15. Novel Strategies for Applied Metagenomics.

    PubMed

    Moore-Connors, Jessica M; Dunn, Katherine A; Bielawski, Joseph P; Van Limbergen, Johan

    2016-03-01

    Detailed analyses of the gut microbiome and its effect on human physiology and disease are emerging, thanks to advances in high-throughput DNA-sequencing technology and the burgeoning field of metagenomics. Metagenomics examines the structure and functional potential of microbial communities in their native habitats through the direct isolation and analysis of community DNA. In inflammatory bowel disease, gut microbiome studies have shown an association with perturbations in community composition and, especially, function. In this review, we discuss the application of next-generation sequencing to microbiome research and highlight the importance of modeling microbiome structure and function to the future of inflammatory bowel disease research and treatment.

  16. Metagenomic biomarker discovery and explanation

    PubMed Central

    2011-01-01

    This study describes and validates a new method for metagenomic biomarker discovery by way of class comparison, tests of biological consistency and effect size estimation. This addresses the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities, which is a central problem to the study of metagenomics. We extensively validate our method on several microbiomes and a convenient online interface for the method is provided at http://huttenhower.sph.harvard.edu/lefse/. PMID:21702898

  17. Precision Metagenomics: Rapid Metagenomic Analyses for Infectious Disease Diagnostics and Public Health Surveillance

    PubMed Central

    Afshinnekoo, Ebrahim; Chou, Chou; Alexander, Noah; Ahsanuddin, Sofia; Schuetz, Audrey N.; Mason, Christopher E.

    2017-01-01

    Next-generation sequencing (NGS) technologies have ushered in the era of precision medicine, transforming the way we treat cancer patients and diagnose disease. Concomitantly, the advent of these technologies has created a surge of microbiome and metagenomic studies over the last decade, many of which are focused on investigating the host-gene-microbial interactions responsible for the development and spread of infectious diseases, as well as delineating their key role in maintaining health. As we continue to discover more information about the etiology of infectious diseases, the translational potential of metagenomic NGS methods for treatment and rapid diagnosis is becoming abundantly clear. Here, we present a robust protocol for the implementation and application of “precision metagenomics” across various sequencing platforms for clinical samples. Such a pipeline integrates DNA/RNA extraction, library preparation, sequencing, and bioinformatics analyses for taxonomic classification, antimicrobial resistance (AMR) marker screening, and functional analysis (biochemical and metabolic pathway abundance). Moreover, the pipeline has 3 tracks: STAT for results within 24 h; Comprehensive that affords a more in-depth analysis and takes between 5 and 7 d, but offers antimicrobial resistance information; and Targeted, which also requires 5–7 d, but with more sensitive analysis for specific pathogens. Finally, we discuss the challenges that need to be addressed before full integration in the clinical setting. PMID:28337072

  18. riboFrame: An Improved Method for Microbial Taxonomy Profiling from Non-Targeted Metagenomics

    PubMed Central

    Ramazzotti, Matteo; Donati, Claudio; Cavalieri, Duccio

    2015-01-01

    Non-targeted metagenomics offers the unprecedented possibility of simultaneously investigating the microbial profile and the genetic capabilities of a sample by a direct analysis of its entire DNA content. The assessment of the microbial taxonomic composition is frequently obtained by mapping reads to genomic databases that, although growing, are still limited and biased. Here we present riboFrame, a novel procedure for microbial profiling based on the identification and classification of 16S rDNA sequences in non-targeted metagenomics datasets. Reads overlapping the 16S rDNA genes are identified using Hidden Markov Models and a taxonomic assignment is obtained by naïve Bayesian classification. All reads identified as ribosomal are coherently positioned in the 16S rDNA gene, allowing the use of the topology of the gene (i.e., the secondary structure and the location of variable regions) to guide the abundance analysis. We tested and verified the effectiveness of our method on simulated ribosomal data, on simulated metagenomes and on a real dataset. riboFrame exploits the taxonomic potentialities of the 16S rDNA gene in the context of non-targeted metagenomics, giving an accurate perspective on the microbial profile in metagenomic samples. PMID:26635865

  19. riboFrame: An Improved Method for Microbial Taxonomy Profiling from Non-Targeted Metagenomics.

    PubMed

    Ramazzotti, Matteo; Berná, Luisa; Donati, Claudio; Cavalieri, Duccio

    2015-01-01

    Non-targeted metagenomics offers the unprecedented possibility of simultaneously investigating the microbial profile and the genetic capabilities of a sample by a direct analysis of its entire DNA content. The assessment of the microbial taxonomic composition is frequently obtained by mapping reads to genomic databases that, although growing, are still limited and biased. Here we present riboFrame, a novel procedure for microbial profiling based on the identification and classification of 16S rDNA sequences in non-targeted metagenomics datasets. Reads overlapping the 16S rDNA genes are identified using Hidden Markov Models and a taxonomic assignment is obtained by naïve Bayesian classification. All reads identified as ribosomal are coherently positioned in the 16S rDNA gene, allowing the use of the topology of the gene (i.e., the secondary structure and the location of variable regions) to guide the abundance analysis. We tested and verified the effectiveness of our method on simulated ribosomal data, on simulated metagenomes and on a real dataset. riboFrame exploits the taxonomic potentialities of the 16S rDNA gene in the context of non-targeted metagenomics, giving an accurate perspective on the microbial profile in metagenomic samples.
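
    The naïve Bayesian classification step mentioned in the abstract can be sketched generically. The code below is a plain multinomial naive Bayes over nucleotide k-mers with add-one smoothing and a uniform prior; it is not riboFrame's classifier, and the training sequences, taxon names, choice of k = 3, and the function names `train` and `classify` are assumptions made for illustration only.

```python
import math
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(labelled, k):
    """Count k-mers per taxon. `labelled` is a dict: taxon -> list of sequences."""
    return {
        taxon: Counter(km for seq in seqs for km in kmers(seq, k))
        for taxon, seqs in labelled.items()
    }


def classify(read, model, k):
    """Return the taxon maximising the smoothed log-likelihood of the read's k-mers.

    A uniform prior over taxa is assumed, so only the likelihood is compared.
    """
    vocab_size = 4 ** k  # number of possible DNA k-mers, used for add-one smoothing
    best_taxon, best_score = None, -math.inf
    for taxon, counts in sorted(model.items()):
        total = sum(counts.values())
        score = sum(
            math.log((counts[km] + 1) / (total + vocab_size))
            for km in kmers(read, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy, made-up training sequences (not real 16S rDNA).
TRAINING = {
    "taxon_A": ["ACGTACGTACGTACGT"],
    "taxon_B": ["GGGCCCGGGCCCGGGCCC"],
}
MODEL = train(TRAINING, k=3)

print(classify("ACGTACG", MODEL, k=3))
print(classify("GGCCCGG", MODEL, k=3))
```

    The "naive" part is the assumption that k-mers in a read are independent given the taxon, which lets the read's log-likelihood be a simple sum over its k-mers.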

  20. Characterization of the Viral Microbiome in Patients with Severe Lower Respiratory Tract Infections, Using Metagenomic Sequencing

    PubMed Central

    Lysholm, Fredrik; Wetterbom, Anna; Lindau, Cecilia; Darban, Hamid; Bjerkner, Annelie; Fahlander, Kristina; Lindberg, A. Michael; Persson, Bengt; Allander, Tobias; Andersson, Björn

    2012-01-01

    The human respiratory tract is heavily exposed to microorganisms. Viral respiratory tract pathogens, such as RSV, influenza and rhinoviruses, cause major morbidity and mortality from respiratory tract disease. Furthermore, as viruses have limited means of transmission, viruses that cause pathogenicity in other tissues may be transmitted through the respiratory tract. It is therefore important to chart the human virome in this compartment. We have studied nasopharyngeal aspirate samples submitted to the Karolinska University Laboratory, Stockholm, Sweden from March 2004 to May 2005 for diagnosis of respiratory tract infections. We have used a metagenomic sequencing strategy to characterize viruses, as this provides the most unbiased view of the samples. Virus enrichment followed by 454 sequencing yielded a total of 703,790 reads, of which 110,931 were found to be of viral origin using an automated classification pipeline. The snapshot of the respiratory tract virome of these 210 patients revealed 39 species and many more strains of viruses. Most of the viral sequences were classified into one of three major families: Paramyxoviridae, Picornaviridae or Orthomyxoviridae. The study also identified one novel type of Rhinovirus C, and identified a number of previously undescribed viral genetic fragments of unknown origin. PMID:22355331

  1. Multivariate Analysis of Functional Metagenomes

    PubMed Central

    Dinsdale, Elizabeth A.; Edwards, Robert A.; Bailey, Barbara A.; Tuba, Imre; Akhter, Sajia; McNair, Katelyn; Schmieder, Robert; Apkarian, Naneh; Creek, Michelle; Guan, Eric; Hernandez, Mayra; Isaacs, Katherine; Peterson, Chris; Regh, Todd; Ponomarenko, Vadim

    2013-01-01

    Metagenomics is a primary tool for the description of microbial and viral communities. The sheer magnitude of the data generated in each metagenome makes identifying key differences in the function and taxonomy between communities difficult to elucidate. Here we discuss the application of seven different data mining and statistical analyses by comparing and contrasting the metabolic functions of 212 microbial metagenomes within and between 10 environments. Not all approaches are appropriate for all questions, and researchers should decide which approach addresses their questions. This work demonstrated the use of each approach: for example, random forests provided a robust and enlightening description of both the clustering of metagenomes and the metabolic processes that were important in separating microbial communities from different environments. All analyses identified that the presence of phage genes within the microbial community was a predictor of whether the microbial community was host-associated or free-living. Several analyses identified the subtle differences that occur with environments, such as those seen in different regions of the marine environment. PMID:23579547

  2. Estimating richness from phage metagenomes

    USDA-ARS's Scientific Manuscript database

    Bacteriophages are important drivers of ecosystem functions, yet little is known about the vast majority of phages. Phage metagenomics, or the study of the collective genome of an assemblage of phages, enables the investigation of broad ecological questions in phage communities. One ecological cha...

  3. Metagenomic analysis of the pygmy loris fecal microbiome reveals unique functional capacity related to metabolism of aromatic compounds.

    PubMed

    Xu, Bo; Xu, Weijiang; Yang, Fuya; Li, Junjun; Yang, Yunjuan; Tang, Xianghua; Mu, Yuelin; Zhou, Junpei; Huang, Zunxi

    2013-01-01

    The animal gastrointestinal tract contains a complex community of microbes, whose composition ultimately reflects the co-evolution of microorganisms with their animal host. An analysis of 78,619 pyrosequencing reads generated from pygmy loris fecal DNA extracts was performed to help better understand the microbial diversity and functional capacity of the pygmy loris gut microbiome. The taxonomic analysis of the metagenomic reads indicated that pygmy loris fecal microbiomes were dominated by Bacteroidetes and Proteobacteria phyla. The hierarchical clustering of several gastrointestinal metagenomes demonstrated the similarities of the microbial community structures of pygmy loris and mouse gut systems despite their differences in functional capacity. The comparative analysis of function classification revealed that the metagenome of the pygmy loris was characterized by an overrepresentation of those sequences involved in aromatic compound metabolism compared with humans and other animals. The key enzymes related to the benzoate degradation pathway were identified based on the Kyoto Encyclopedia of Genes and Genomes pathway assignment. These results would contribute to the limited body of primate metagenome studies and provide a framework for comparative metagenomic analysis between human and non-human primates, as well as a comparative understanding of the evolution of humans and their microbiome. However, future studies on the metagenome sequencing of pygmy loris and other prosimians regarding the effects of age, genetics, and environment on the composition and activity of the metagenomes are required.

  4. Metagenomic Analysis of the Pygmy Loris Fecal Microbiome Reveals Unique Functional Capacity Related to Metabolism of Aromatic Compounds

    PubMed Central

    Xu, Bo; Xu, Weijiang; Yang, Fuya; Li, Junjun; Yang, Yunjuan; Tang, Xianghua; Mu, Yuelin; Zhou, Junpei; Huang, Zunxi

    2013-01-01

    The animal gastrointestinal tract contains a complex community of microbes, whose composition ultimately reflects the co-evolution of microorganisms with their animal host. An analysis of 78,619 pyrosequencing reads generated from pygmy loris fecal DNA extracts was performed to help better understand the microbial diversity and functional capacity of the pygmy loris gut microbiome. The taxonomic analysis of the metagenomic reads indicated that pygmy loris fecal microbiomes were dominated by Bacteroidetes and Proteobacteria phyla. The hierarchical clustering of several gastrointestinal metagenomes demonstrated the similarities of the microbial community structures of pygmy loris and mouse gut systems despite their differences in functional capacity. The comparative analysis of function classification revealed that the metagenome of the pygmy loris was characterized by an overrepresentation of those sequences involved in aromatic compound metabolism compared with humans and other animals. The key enzymes related to the benzoate degradation pathway were identified based on the Kyoto Encyclopedia of Genes and Genomes pathway assignment. These results would contribute to the limited body of primate metagenome studies and provide a framework for comparative metagenomic analysis between human and non-human primates, as well as a comparative understanding of the evolution of humans and their microbiome. However, future studies on the metagenome sequencing of pygmy loris and other prosimians regarding the effects of age, genetics, and environment on the composition and activity of the metagenomes are required. PMID:23457582

  5. Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics

    PubMed Central

    Weber, Marc; Teeling, Hanno; Huang, Sixing; Waldmann, Jost; Kassabgy, Mariette; Fuchs, Bernhard M; Klindworth, Anna; Klockow, Christine; Wichels, Antje; Gerdts, Gunnar; Amann, Rudolf; Glöckner, Frank Oliver

    2011-01-01

    Next-generation sequencing (NGS) technologies have enabled the application of broad-scale sequencing in microbial biodiversity and metagenome studies. Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions, but have not been applied to broad-scale NGS-based metagenomics yet. Here, we provide a novel implementation, demonstrate its potential and practicability, and provide a web-based service for public usage. Evaluation with published data sets mimicking varyingly complex habitats resulted in classification specificities and sensitivities of close to 100% to above 90% from phylum to genus level for assemblies exceeding 8 kb for low and medium complexity data. When applied to five real-world metagenomes of medium complexity from direct pyrosequencing of marine subsurface waters, classifications of assemblies above 2.5 kb were in good agreement with fluorescence in situ hybridizations, indicating that biodiversity was mostly retained within the metagenomes, and confirming high classification specificities. This was validated by two protein-based classification (PBC) methods. SOMs were able to retrieve the relevant taxa down to the genus level, while surpassing PBCs in resolution. In order to make the approach accessible to a broad audience, we implemented a feature-rich web-based SOM application named TaxSOM, which is freely available at http://www.megx.net/toolbox/taxsom. TaxSOM can classify reads or assemblies exceeding 2.5 kb with high accuracy and thus assists in linking biodiversity and functions in metagenome studies, which is a precondition to study microbial ecology in a holistic fashion. PMID:21160538

  6. Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics.

    PubMed

    Weber, Marc; Teeling, Hanno; Huang, Sixing; Waldmann, Jost; Kassabgy, Mariette; Fuchs, Bernhard M; Klindworth, Anna; Klockow, Christine; Wichels, Antje; Gerdts, Gunnar; Amann, Rudolf; Glöckner, Frank Oliver

    2011-05-01

    Next-generation sequencing (NGS) technologies have enabled the application of broad-scale sequencing in microbial biodiversity and metagenome studies. Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions, but have not been applied to broad-scale NGS-based metagenomics yet. Here, we provide a novel implementation, demonstrate its potential and practicability, and provide a web-based service for public usage. Evaluation with published data sets mimicking varyingly complex habitats resulted in classification specificities and sensitivities of close to 100% to above 90% from phylum to genus level for assemblies exceeding 8 kb for low and medium complexity data. When applied to five real-world metagenomes of medium complexity from direct pyrosequencing of marine subsurface waters, classifications of assemblies above 2.5 kb were in good agreement with fluorescence in situ hybridizations, indicating that biodiversity was mostly retained within the metagenomes, and confirming high classification specificities. This was validated by two protein-based classification (PBC) methods. SOMs were able to retrieve the relevant taxa down to the genus level, while surpassing PBCs in resolution. In order to make the approach accessible to a broad audience, we implemented a feature-rich web-based SOM application named TaxSOM, which is freely available at http://www.megx.net/toolbox/taxsom. TaxSOM can classify reads or assemblies exceeding 2.5 kb with high accuracy and thus assists in linking biodiversity and functions in metagenome studies, which is a precondition to study microbial ecology in a holistic fashion.

  7. Use of object-oriented classification and fragmentation analysis (1985-2008) to identify important areas for conservation in Cockpit Country, Jamaica.

    PubMed

    Newman, Minke E; McLaren, Kurt P; Wilson, Byron S

    2011-01-01

    Forest fragmentation is one of the most important threats to global biodiversity, particularly in tropical developing countries. Identifying priority areas for conservation within these forests is essential to their effective management. However, this requires current, accurate environmental information that is often lacking in developing countries. The Cockpit Country, Jamaica, contains forests of international importance in terms of levels of endemism and overall diversity. These forests are under severe threat from the prospect of bauxite mining and other anthropogenic disturbances. In the absence of adequate, up-to-date ecological information, we used satellite remote sensing data and fragmentation analysis to identify interior forested areas that have experienced little or no change as priority conservation sites. We classified Landsat images from 1985, 1989, 1995, 2002, and 2008, using an object-oriented method, which allowed for the inclusion of roads. We conducted our fragmentation analysis using metrics to quantify changes in forest patch number, area, shape, and aggregation. Deforestation and fragmentation fluctuated within the 23-year period but were mostly confined to the periphery of the forest, close to roads and access trails. An area of core forest that remained intact over the period of study was identified within the largest forest patch, most of which was located within the boundaries of a forest reserve and included the last remaining patches of closed-broadleaf forest. These areas should be given highest priority for conservation, as they constitute important refuges for endemic or threatened biodiversity. Minimizing and controlling access will be important in maintaining this core.

  8. HPMCD: the database of human microbial communities from metagenomic datasets and microbial reference genomes

    PubMed Central

    Forster, Samuel C.; Browne, Hilary P.; Kumar, Nitin; Hunt, Martin; Denise, Hubert; Mitchell, Alex; Finn, Robert D.; Lawley, Trevor D.

    2016-01-01

    The Human Pan-Microbe Communities (HPMC) database (http://www.hpmcd.org/) provides a manually curated, searchable, metagenomic resource to facilitate investigation of human gastrointestinal microbiota. Over the past decade, the application of metagenome sequencing to elucidate the microbial composition and functional capacity present in the human microbiome has revolutionized many concepts in our basic biology. When sufficient high quality reference genomes are available, whole genome metagenomic sequencing can provide direct biological insights and high-resolution classification. The HPMC database provides species level, standardized phylogenetic classification of over 1800 human gastrointestinal metagenomic samples. This is achieved by combining a manually curated list of bacterial genomes from human faecal samples with over 21000 additional reference genomes representing bacteria, viruses, archaea and fungi with manually curated species classification and enhanced sample metadata annotation. A user-friendly, web-based interface provides the ability to search for (i) microbial groups associated with health or disease state, (ii) health or disease states and community structure associated with a microbial group, (iii) the enrichment of a microbial gene or sequence and (iv) enrichment of a functional annotation. The HPMC database enables detailed analysis of human microbial communities and supports research from basic microbiology and immunology to therapeutic development in human health and disease. PMID:26578596

  9. Metagenomes from Argonne's MG-RAST Metagenomics Analysis Server

    DOE Data Explorer

    MG-RAST has a large number of datasets that researchers have deposited for public use. As of July 2014, MG-RAST held more than 18,500 metagenomes, and the number of available sequences was more than 75 million. The public can browse the collection in several different ways, and researchers can log in to deposit new data. Researchers have the choice of keeping a dataset private so that it is viewable only by them when logged in, or they can choose to make a dataset public at any time with a simple click of a link. MG-RAST was launched in 2007 by the Mathematics and Computer Science Division at Argonne National Laboratory (ANL). It is part of the toolkit available to the Terragenomics project, which seeks to do a comprehensive metagenomics study of U.S. soil. The Terragenomics project page is located at http://www.mcs.anl.gov/research/projects/terragenomics/.

  10. C16S - a Hidden Markov Model based algorithm for taxonomic classification of 16S rRNA gene sequences.

    PubMed

    Ghosh, Tarini Shankar; Gajjalla, Purnachander; Mohammed, Monzoorul Haque; Mande, Sharmila S

    2012-04-01

    Recent advances in high throughput sequencing technologies and concurrent refinements in 16S rDNA isolation techniques have facilitated the rapid extraction and sequencing of 16S rDNA content of microbial communities. The taxonomic affiliation of these 16S rDNA fragments is subsequently obtained using either BLAST-based or word frequency based approaches. However, the classification accuracy of such methods is observed to be limited in typical metagenomic scenarios, wherein a majority of organisms are hitherto unknown. In this study, we present a 16S rDNA classification algorithm, called C16S, that uses genus-specific Hidden Markov Models for taxonomic classification of 16S rDNA sequences. Results obtained using C16S have been compared with the widely used RDP classifier. The performance of C16S algorithm was observed to be consistently higher than the RDP classifier. In some scenarios, this increase in accuracy is as high as 34%. A web-server for the C16S algorithm is available at http://metagenomics.atc.tcs.com/C16S/.

  11. An application of statistics to comparative metagenomics

    PubMed Central

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-01-01

    Background: Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Results: Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. Conclusion: The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems. PMID:16549025

  12. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes.

    SciTech Connect

    Meyer, F.; Paarmann, D.; D'Souza, M.; Olson, R.; Glass, E. M.; Kubal, M.; Paczian, T.; Stevens, R.; Wilke, A.; Wilkening, J.; Edwards, R. A.; Rodriguez, A.; Mathematics and Computer Science; Univ. of Chicago; San Diego State Univ.

    2008-09-19

    Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers. A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis: the availability of high-performance computing for annotating the data.

  13. Metagenome Assembly at the DOE JGI (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Chain, Patrick

    2011-10-13

    Patrick Chain of DOE JGI at LANL, Co-Chair of the Metagenome-specific Assembly session, on "Metagenome Assembly at the DOE JGI" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  14. Metagenome Assembly at the DOE JGI (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Chain, Patrick [DOE JGI at LANL]

    2016-07-12

    Patrick Chain of DOE JGI at LANL, Co-Chair of the Metagenome-specific Assembly session, on "Metagenome Assembly at the DOE JGI" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  15. Suppressive subtractive hybridization as a tool for identifying genetic diversity in an environmental metagenome: the rumen as a model.

    PubMed

    Galbraith, Elizabeth A; Antonopoulos, Dionysios A; White, Bryan A

    2004-09-01

    Molecular techniques previously used for genome comparisons of closely related bacterial species could prove extremely valuable for comparisons of complex microbial communities, or metagenomes. Our study aimed to determine the breadth and value of suppressive subtractive hybridization (SSH) in a pilot-scale analysis of metagenomic DNA from communities of microorganisms in the rumen. Suppressive subtractive hybridization was performed using total genomic DNA isolated from rumen fluid samples of two hay-fed steers, arbitrarily designated as tester or driver. Ninety-six subtraction DNA fragments from the tester metagenome were amplified, cloned and the DNA sequences were determined. Verification of the isolation of DNA fragments unique to the tester metagenome was accomplished through dot blot and Southern blot hybridizations. Tester-specific SSH fragments were found in 95 of 96 randomly selected clones. DNA sequences of subtraction fragments were analysed by computer assisted DNA and amino acid comparisons. Putative translations of 26 (32.1%) subtractive hybridization fragments exhibited significant similarity to Bacterial proteins, whereas 15 (18.5%) distinctive subtracted fragments had significant similarity to proteins from Archaea. The remainder of the subtractive hybridization fragments displayed no similarity to GenBank sequences. This metagenomic approach has exposed an unexpectedly large difference in Archaeal community structure between the rumen microbial populations of two steers fed identical diets and housed together. 16S rRNA dot blot hybridizations revealed similar proportions of Bacteria and Archaea in both rumen samples and suggest that the differences uncovered by SSH are the result of varying community structural composition. Our study demonstrates a novel approach to comparative analyses of environmental microbial communities through the use of SSH.

  16. The Largest Fragment of a Homogeneous Fragmentation Process

    NASA Astrophysics Data System (ADS)

    Kyprianou, Andreas; Lane, Francis; Mörters, Peter

    2017-03-01

    We show that in homogeneous fragmentation processes the largest fragment at time $t$ has size $e^{-t\Phi'(\overline{p})}\,t^{-\frac{3}{2}(\log\Phi)'(\overline{p})+o(1)}$, where $\Phi$ is the Lévy exponent of the fragmentation process, and $\overline{p}$ is the unique solution of the equation $(\log\Phi)'(\overline{p})=\frac{1}{1+\overline{p}}$. We argue that this result is in line with predictions arising from the classification of homogeneous fragmentation processes as logarithmically correlated random fields.

  17. Marine metagenomics as a source for bioprospecting.

    PubMed

    Kodzius, Rimantas; Gojobori, Takashi

    2015-12-01

    This review summarizes usage of genome-editing technologies for metagenomic studies; these studies are used to retrieve and modify valuable microorganisms for production, particularly in marine metagenomics. Organisms may be cultivable or uncultivable. Metagenomics is providing especially valuable information for uncultivable samples. Novel genes, pathways and genomes can be deduced. Therefore, metagenomics, particularly genome engineering and systems biology, allows for the enhancement of biological and chemical producers and the creation of novel bioresources. With natural resources rapidly depleting, genomics may be an effective way to efficiently produce quantities of known and novel foods, livestock feed, fuels, pharmaceuticals and fine or bulk chemicals.

  18. Metagenomic Assembly: Overview, Challenges and Applications

    PubMed Central

    Ghurye, Jay S.; Cepeda-Espinoza, Victoria; Pop, Mihai

    2016-01-01

    Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems. PMID:27698619

  19. Binning sequences using very sparse labels within a metagenome

    PubMed Central

    Chan, Chon-Kit Kenneth; Hsu, Arthur L; Halgamuge, Saman K; Tang, Sen-Lin

    2008-01-01

    Background In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. Results The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the ≥ 10 reads datasets and comparable in the ≥ 8 kb benchmark tests. Conclusion In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of the methods tested. Most

  20. Functional metagenomics of extreme environments.

    PubMed

    Mirete, Salvador; Morgante, Verónica; González-Pastor, José Eduardo

    2016-04-01

    The bioprospecting of enzymes that operate under extreme conditions is of particular interest for many biotechnological and industrial processes. Nevertheless, there is a considerable limitation to retrieve novel enzymes as only a small fraction of microorganisms derived from extreme environments can be cultured under standard laboratory conditions. Functional metagenomics has the advantage of not requiring the cultivation of microorganisms or previous sequence information to known genes, thus representing a valuable approach for mining enzymes with new features. In this review, we summarize studies showing how functional metagenomics was employed to retrieve genes encoding for proteins involved not only in molecular adaptation and resistance to extreme environmental conditions but also in other enzymatic activities of biotechnological interest. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Metagenomics and the protein universe

    PubMed Central

    Godzik, Adam

    2011-01-01

    Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones. PMID:21497084

  2. Metagenomics of the Svalbard reindeer rumen microbiome reveals abundance of polysaccharide utilization loci.

    PubMed

    Pope, Phillip B; Mackenzie, Alasdair K; Gregor, Ivan; Smith, Wendy; Sundset, Monica A; McHardy, Alice C; Morrison, Mark; Eijsink, Vincent G H

    2012-01-01

    Lignocellulosic biomass remains a largely untapped source of renewable energy predominantly due to its recalcitrance and an incomplete understanding of how this is overcome in nature. We present here a compositional and comparative analysis of metagenomic data pertaining to a natural biomass-converting ecosystem adapted to austere arctic nutritional conditions, namely the rumen microbiome of Svalbard reindeer (Rangifer tarandus platyrhynchus). Community analysis showed that deeply-branched cellulolytic lineages affiliated to the Bacteroidetes and Firmicutes are dominant, whilst sequence binning methods facilitated the assemblage of metagenomic sequence for a dominant and novel Bacteroidales clade (SRM-1). Analysis of unassembled metagenomic sequence as well as metabolic reconstruction of SRM-1 revealed the presence of multiple polysaccharide utilization loci-like systems (PULs) as well as members of more than 20 glycoside hydrolase and other carbohydrate-active enzyme families targeting various polysaccharides including cellulose, xylan and pectin. Functional screening of cloned metagenome fragments revealed high cellulolytic activity and an abundance of PULs that are rich in endoglucanases (GH5) but devoid of other common enzymes thought to be involved in cellulose degradation. Combining these results with known and partly re-evaluated metagenomic data strongly indicates that much like the human distal gut, the digestive system of herbivores harbours high numbers of deeply branched and as-yet uncultured members of the Bacteroidetes that depend on PUL-like systems for plant biomass degradation.

  3. MIPE: A metagenome-based community structure explorer and SSU primer evaluation tool.

    PubMed

    Zou, Bin; Li, JieFu; Zhou, Quan; Quan, Zhe-Xue

    2017-01-01

    An understanding of microbial community structure is an important issue in the field of molecular ecology. The traditional molecular method involves amplification of small subunit ribosomal RNA (SSU rRNA) genes by polymerase chain reaction (PCR). However, PCR-based amplicon approaches are affected by primer bias and chimeras. With the development of high-throughput sequencing technology, unbiased SSU rRNA gene sequences can be mined from shotgun sequencing-based metagenomic or metatranscriptomic datasets to obtain a reflection of the microbial community structure in specific types of environment and to evaluate SSU primers. However, the use of short reads obtained through next-generation sequencing for primer evaluation has not been well resolved. The software MIPE (MIcrobiota metagenome Primer Explorer) was developed to adapt numerous short reads from metagenomes and metatranscriptomes. Using metagenomic or metatranscriptomic datasets as input, MIPE extracts and aligns rRNA to reveal detailed information on microbial composition and evaluate SSU rRNA primers. A mock dataset, a real Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) test dataset, two PrimerProspector test datasets and a real metatranscriptomic dataset were used to validate MIPE. The software calls Mothur (v1.33.3) and the SILVA database (v119) for the alignment and classification of rRNA genes from a metagenome or metatranscriptome. MIPE can effectively extract shotgun rRNA reads from a metagenome or metatranscriptome and is capable of classifying these sequences and exhibiting sensitivity to different SSU rRNA PCR primers. Therefore, MIPE can be used to guide primer design for specific environmental samples.

  4. MIPE: A metagenome-based community structure explorer and SSU primer evaluation tool

    PubMed Central

    Zhou, Quan

    2017-01-01

    An understanding of microbial community structure is an important issue in the field of molecular ecology. The traditional molecular method involves amplification of small subunit ribosomal RNA (SSU rRNA) genes by polymerase chain reaction (PCR). However, PCR-based amplicon approaches are affected by primer bias and chimeras. With the development of high-throughput sequencing technology, unbiased SSU rRNA gene sequences can be mined from shotgun sequencing-based metagenomic or metatranscriptomic datasets to obtain a reflection of the microbial community structure in specific types of environment and to evaluate SSU primers. However, the use of short reads obtained through next-generation sequencing for primer evaluation has not been well resolved. The software MIPE (MIcrobiota metagenome Primer Explorer) was developed to adapt numerous short reads from metagenomes and metatranscriptomes. Using metagenomic or metatranscriptomic datasets as input, MIPE extracts and aligns rRNA to reveal detailed information on microbial composition and evaluate SSU rRNA primers. A mock dataset, a real Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) test dataset, two PrimerProspector test datasets and a real metatranscriptomic dataset were used to validate MIPE. The software calls Mothur (v1.33.3) and the SILVA database (v119) for the alignment and classification of rRNA genes from a metagenome or metatranscriptome. MIPE can effectively extract shotgun rRNA reads from a metagenome or metatranscriptome and is capable of classifying these sequences and exhibiting sensitivity to different SSU rRNA PCR primers. Therefore, MIPE can be used to guide primer design for specific environmental samples. PMID:28350876

  5. The human gut mobile metagenome

    PubMed Central

    2010-01-01

    Using the culture independent TRACA system in conjunction with a comparative metagenomic approach, we have recently explored the pool of plasmids associated with the human gut mobile metagenome. This revealed that some plasmids or plasmid families are present in the gut microbiomes of geographically isolated human hosts with a broad global distribution (America, Japan and Europe), and are potentially unique to the human gut microbiome. Functions encoded by the most widely distributed plasmid (pTRACA22) were found to be enriched in the human gut microbiome when compared to microbial communities from other environments, and of particular interest was the increased prevalence of a putative RelBE toxin-antitoxin (TA) addiction module. Subsequent analysis revealed that this was most closely related to putative TA modules from gut associated bacteria belonging to the Firmicutes, but homologues of the RelE toxin were associated with all major bacterial divisions comprising the human gut microbiota. In this addendum, functions of the gut mobile metagenome are considered from the perspective of the human host, and within the context of the hologenome theory of human evolution. In doing so, our original analysis is also extended to include the gut metagenomes of a further 124 individuals comprising the METAHIT dataset. Differences in the incidence and relative abundance of pTRACA22 and associated TA modules between healthy individuals and those with inflammatory bowel diseases are explored, and potential functions of pTRACA22 type RelBE modules in the human gut microbiome are discussed. PMID:21468227

  6. Integrative workflows for metagenomic analysis

    PubMed Central

    Ladoukakis, Efthymios; Kolisis, Fragiskos N.; Chatziioannou, Aristotelis A.

    2014-01-01

    The rapid evolution of all sequencing technologies, described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques in order to discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this boils down to many GB of data being generated from each single sequencing experiment, rendering data management, or even storage, a critical bottleneck for the overall analytical endeavor. The enormous complexity is further aggravated by the versatility of the processing steps available, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, nonetheless non-trivial, quality control of raw data to exceptionally complex protein annotation procedures, requiring a high level of expertise for their proper application or the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of such scale requires large computational resources, making cloud computing infrastructures the only realistic solution. In this review article we discuss the different integrative bioinformatic solutions available that address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control, and annotation of metagenomic data, embracing various major sequencing technologies and applications. PMID:25478562

  7. Subtractive assembly for comparative metagenomics, and its application to type 2 diabetes metagenomes.

    PubMed

    Wang, Mingjie; Doak, Thomas G; Ye, Yuzhen

    2015-11-02

    Comparative metagenomics remains challenging due to the size and complexity of metagenomic datasets. Here we introduce subtractive assembly, a de novo assembly approach for comparative metagenomics that directly assembles only the differential reads that distinguish between two groups of metagenomes. Using simulated datasets, we show it improves both the efficiency of the assembly and the assembly quality of the differential genomes and genes. Further, its application to type 2 diabetes (T2D) metagenomic datasets reveals clear signatures of the T2D gut microbiome, revealing new phylogenetic and functional features of the gut microbial communities associated with T2D.

  8. Exploring Neighborhoods in the Metagenome Universe

    PubMed Central

    Aßhauer, Kathrin P.; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-01-01

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well suited for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis. PMID:25026170
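
    The profile-based k-nearest-neighbor search and the "neighborhood accuracy" mentioned in this abstract can be illustrated with a minimal sketch. It is not the CoMet-Universe implementation: the Euclidean distance, the three-feature toy profiles, the habitat labels, and the function names nearest_neighbors and neighborhood_accuracy are all assumptions made for illustration only.

```python
import math


def nearest_neighbors(profiles, query_id, k):
    """Return the ids of the k profiles closest to the query profile.

    Distance is plain Euclidean distance (an assumed choice; the abstract
    only says that vector-based distance measures were compared).
    """
    query = profiles[query_id]
    distances = []
    for other_id, vector in profiles.items():
        if other_id == query_id:
            continue  # a sample is not its own neighbor
        distances.append((math.dist(query, vector), other_id))
    distances.sort()
    return [other_id for _, other_id in distances[:k]]


def neighborhood_accuracy(profiles, labels, k):
    """Mean fraction of each sample's k nearest neighbors sharing its label."""
    total = 0.0
    for sample_id in profiles:
        neighbors = nearest_neighbors(profiles, sample_id, k)
        same = sum(labels[n] == labels[sample_id] for n in neighbors)
        total += same / len(neighbors)
    return total / len(profiles)


# Hypothetical toy profiles: relative frequencies of three features.
profiles = {
    "gut_a": [0.70, 0.20, 0.10],
    "gut_b": [0.65, 0.25, 0.10],
    "sea_a": [0.10, 0.30, 0.60],
    "sea_b": [0.15, 0.25, 0.60],
}
labels = {"gut_a": "gut", "gut_b": "gut", "sea_a": "marine", "sea_b": "marine"}

print(nearest_neighbors(profiles, "gut_a", 1))
print(neighborhood_accuracy(profiles, labels, 1))
```

    In this toy setting each sample's single nearest neighbor comes from the same habitat, so the accuracy is 1.0; with overlapping habitat categories, as the abstract notes, the value would drop.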

  9. Exploring neighborhoods in the metagenome universe.

    PubMed

    Aßhauer, Kathrin P; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-07-14

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well suited for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis.

  10. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    PubMed Central

    Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas

    2016-01-01

    Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018

  11. Current and future resources for functional metagenomics.

    PubMed

    Lam, Kathy N; Cheng, Jiujun; Engel, Katja; Neufeld, Josh D; Charles, Trevor C

    2015-01-01

    Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries: physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research.

  12. Visual and statistical comparison of metagenomes.

    PubMed

    Mitra, Suparna; Klar, Bernhard; Huson, Daniel H

    2009-08-01

    Metagenomics is the study of the genomic content of an environmental sample of microbes. Advances in the throughput and cost-efficiency of sequencing technology are fueling a rapid increase in the number and size of metagenomic datasets being generated. Bioinformatics is faced with the problem of how to handle and analyze these datasets in an efficient and useful way. One goal of these metagenomic studies is to get a basic understanding of the microbial world both surrounding us and within us. One major challenge is how to compare multiple datasets. Furthermore, there is a need for bioinformatics tools that can process many large datasets and are easy to use. This article describes two new and helpful techniques for comparing multiple metagenomic datasets. The first is a visualization technique for multiple datasets and the second is a new statistical method for highlighting the differences in a pairwise comparison. We have developed implementations of both methods that are suitable for very large datasets and provide these in Version 3 of our standalone metagenome analysis tool MEGAN. These new methods are suitable for the visual comparison of many large metagenomes and the statistical comparison of two metagenomes at a time. Nevertheless, more work needs to be done to support the comparative analysis of multiple metagenome datasets. Version 3 of MEGAN, which implements all ideas presented in this article, can be obtained from our web site at: www-ab.informatik.uni-tuebingen.de/software/megan. Supplementary data are available at Bioinformatics online.

  13. Metagenomes from the Saline Desert of Kutch

    PubMed Central

    Pandit, A. S.; Joshi, M. N.; Bhargava, P.; Ayachit, G. N.; Shaikh, I. M.; Saiyed, Z. M.; Saxena, A. K.

    2014-01-01

    We provide the first report on the metagenomic approach for unveiling the microbial diversity in the saline desert of Kutch. High-throughput metagenomic sequencing of environmental DNA isolated from soil collected from seven locations in Kutch was performed on an Ion Torrent platform. PMID:24831151

  14. Explaining Diversity in Metagenomic Datasets by Phylogenetic-Based Feature Weighting

    PubMed Central

    Albanese, Davide; De Filippo, Carlotta; Cavalieri, Duccio; Donati, Claudio

    2015-01-01

    Metagenomics is revolutionizing our understanding of microbial communities, showing that their structure and composition have profound effects on the ecosystem and in a variety of health and disease conditions. Despite the flourishing of new analysis methods, current approaches based on statistical comparisons between high-level taxonomic classes often fail to identify the microbial taxa that are differentially distributed between sets of samples, since in many cases the taxonomic schema do not allow an adequate description of the structure of the microbiota. This constitutes a severe limitation to the use of metagenomic data in therapeutic and diagnostic applications. To provide a more robust statistical framework, we introduce a class of feature-weighting algorithms that discriminate the taxa responsible for the classification of metagenomic samples. The method unambiguously groups the relevant taxa into clades without relying on pre-defined taxonomic categories, thus including in the analysis also those sequences for which a taxonomic classification is difficult. The phylogenetic clades are weighted and ranked according to their abundance measuring their contribution to the differentiation of the classes of samples, and a criterion is provided to define a reduced set of most relevant clades. Applying the method to public datasets, we show that the data-driven definition of relevant phylogenetic clades accomplished by our ranking strategy identifies features in the samples that are lost if phylogenetic relationships are not considered, improving our ability to mine metagenomic datasets. Comparison with supervised classification methods currently used in metagenomic data analysis highlights the advantages of using phylogenetic information. PMID:25815895

  15. Metagenomics using next-generation sequencing.

    PubMed

    Bragg, Lauren; Tyson, Gene W

    2014-01-01

    Traditionally, microbial genome sequencing has been restricted to the small number of species that can be grown in pure culture. The progressive development of culture-independent methods over the last 15 years now allows researchers to sequence microbial communities directly from environmental samples. This approach is commonly referred to as "metagenomics" or "community genomics". However, the term metagenomics is applied liberally in the literature to describe any culture-independent analysis of microbial communities. Here, we define metagenomics as shotgun ("random") sequencing of the genomic DNA of a sample taken directly from the environment. The metagenome can be thought of as a sampling of the collective genome of the microbial community. We outline the considerations and analyses that should be undertaken to ensure the success of a metagenomic sequencing project, including the choice of sequencing platform and methods for assembly, binning, annotation, and comparative analysis.

  16. Metagenomics: Facts and Artifacts, and Computational Challenges

    PubMed

    Wooley, John C; Ye, Yuzhen

    2009-01-01

    Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. By enabling an analysis of populations including many (so-far) unculturable and often unknown microbes, metagenomics is revolutionizing the field of microbiology, and has excited researchers in many disciplines that could benefit from the study of environmental microbes, including those in ecology, environmental sciences, and biomedicine. Specific computational and statistical tools have been developed for metagenomic data analysis and comparison. New studies, however, have revealed various kinds of artifacts present in metagenomics data caused by limitations in the experimental protocols and/or inadequate data analysis procedures, which often lead to incorrect conclusions about a microbial community. Here, we review some of the artifacts, such as overestimation of species diversity and incorrect estimation of gene family frequencies, and discuss emerging computational approaches to address them. We also review potential challenges that metagenomics may encounter with the extensive application of next-generation sequencing (NGS) techniques.

  17. Metagenomic applications in environmental monitoring and bioremediation

    SciTech Connect

    Techtmann, Stephen M.; Hazen, Terry C.

    2016-01-01

    With the rapid advances in sequencing technology, the cost of sequencing has dramatically dropped and the scale of sequencing projects has increased accordingly. This has provided the opportunity for the routine use of sequencing techniques in the monitoring of environmental microbes. While metagenomic applications have been routinely applied to better understand the ecology and diversity of microbes, their use in environmental monitoring and bioremediation is increasingly common. In this review we seek to provide an overview of some of the metagenomic techniques used in environmental systems biology, addressing their application and limitation. We will also provide several recent examples of the application of metagenomics to bioremediation. We discuss examples where microbial communities have been used to predict the presence and extent of contamination, examples of how metagenomics can be used to characterize the process of natural attenuation by unculturable microbes, as well as examples detailing the use of metagenomics to understand the impact of biostimulation on microbial communities.

  18. Metagenomics: Future of microbial gene mining.

    PubMed

    Vakhlu, J; Sudan, Avneet Kour; Johri, B N

    2008-06-01

    Modern biotechnology has a steadily increasing demand for novel genes for application in various industrial processes and development of genetically modified organisms. Identification, isolation and cloning of novel genes at a reasonable pace is the main driving force behind the development of unprecedented experimental approaches. Metagenomics is one such novel approach for engendering novel genes. Metagenomics of complex microbial communities (both cultivable and uncultivable) is a rich source of novel genes for biotechnological purposes. The contributions made by metagenomics to the already existing repository of prokaryotic genes are quite impressive; nevertheless, this technique is still in its infancy. In the present review we have drawn a comparison between routine cloning techniques and the metagenomic approach for harvesting novel microbial genes and described various methods to reach down to the specific genes in the metagenome. Accomplishments made thus far, limitations and future prospects of this resourceful technique are discussed.

  19. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes

    PubMed Central

    Lin, Hsin-Hung; Liao, Yu-Chieh

    2016-01-01

    Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or ‘bin’ sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new microbial organisms and aids in the microbial genome reconstruction process. Here we present MyCC, an automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools including CONCOCT, GroopM, MaxBin and MetaBAT on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, we demonstrate the visualization of metagenomes in MyCC to aid in the reconstruction of genomes from distinct bins. MyCC is freely available at http://sourceforge.net/projects/sb2nhri/files/MyCC/. PMID:27067514

  20. Multisubstrate isotope labeling and metagenomic analysis of active soil bacterial communities.

    PubMed

    Verastegui, Y; Cheng, J; Engel, K; Kolczynski, D; Mortimer, S; Lavigne, J; Montalibet, J; Romantsov, T; Hall, M; McConkey, B J; Rose, D R; Tomashek, J J; Scott, B R; Charles, T C; Neufeld, J D

    2014-07-15

    Soil microbial diversity represents the largest global reservoir of novel microorganisms and enzymes. In this study, we coupled functional metagenomics and DNA stable-isotope probing (DNA-SIP) using multiple plant-derived carbon substrates and diverse soils to characterize active soil bacterial communities and their glycoside hydrolase genes, which have value for industrial applications. We incubated samples from three disparate Canadian soils (tundra, temperate rainforest, and agricultural) with five native carbon (¹²C) or stable-isotope-labeled (¹³C) carbohydrates (glucose, cellobiose, xylose, arabinose, and cellulose). Indicator species analysis revealed high specificity and fidelity for many uncultured and unclassified bacterial taxa in the heavy DNA for all soils and substrates. Among characterized taxa, Actinomycetales (Salinibacterium), Rhizobiales (Devosia), Rhodospirillales (Telmatospirillum), and Caulobacterales (Phenylobacterium and Asticcacaulis) were bacterial indicator species for the heavy substrates and soils tested. Both Actinomycetales and Caulobacterales (Phenylobacterium) were associated with metabolism of cellulose, and Alphaproteobacteria were associated with the metabolism of arabinose; members of the order Rhizobiales were strongly associated with the metabolism of xylose. Annotated metagenomic data suggested diverse glycoside hydrolase gene representation within the pooled heavy DNA. By screening 2,876 cloned fragments derived from the ¹³C-labeled DNA isolated from soils incubated with cellulose, we demonstrate the power of combining DNA-SIP, multiple-displacement amplification (MDA), and functional metagenomics by efficiently isolating multiple clones with activity on carboxymethyl cellulose and fluorogenic proxy substrates for carbohydrate-active enzymes. Importance: The ability to identify genes based on function, instead of sequence homology, allows the discovery of genes that would not be identified through sequence alone. 
This

  1. GB Virus C/Hepatitis G Virus Groups and Subgroups: Classification by a Restriction Fragment Length Polymorphism Method Based on Phylogenetic Analysis of the 5′ Untranslated Region

    PubMed Central

    Quarleri, J. F.; Mathet, V. L.; Feld, M.; Ferrario, D.; della Latta, M. P.; Verdun, R.; Sánchez, D. O.; Oubiña, J. R.

    1999-01-01

    A phylogenetic tree based on 150 5′ untranslated region sequences deposited in GenBank database allowed segregation of the sequences into three major groups, including two subgroups, i.e., 1, 2a, 2b, and 3, supported by bootstrap analysis. Restriction site analysis of these sequences predicted that HinfI and either AatII or AciI could be used for genomic typing with 99.4% accuracy. cDNA sequencing and subsequent alignment of 21 Argentine GB virus C/hepatitis G virus strains confirmed restriction fragment length polymorphism patterns theoretically predicted. This method may be useful for a rapid screening of samples when either epidemiological or transmission studies of this agent are carried out. PMID:10203483

  2. Real Time Metagenomics: Using k-mers to annotate metagenomes

    PubMed Central

    Edwards, Robert A.; Olson, Robert; Disz, Terry; Pusch, Gordon D.; Vonstein, Veronika; Stevens, Rick; Overbeek, Ross

    2012-01-01

    Summary: Annotation of metagenomes involves comparing the individual sequence reads with a database of known sequences and assigning a unique function to each read. This is a time-consuming task that is computationally intensive (though not computationally complex). Here we present a novel approach to annotate metagenomes using unique k-mer oligopeptide sequences from 7 to 12 amino acids long. We demonstrate that k-mer-based annotations are faster and approach the sensitivity and precision of blastx-based annotations without losing accuracy. A last-common ancestor approach was also developed to describe the members of the community. Availability and implementation: This open-source application was implemented in Perl and can be accessed via a user-friendly website at http://edwards.sdsu.edu/rtmg. In addition, code to access the annotation servers is available for download from http://www.theseed.org/. FIGfams and k-mers are available for download from ftp://ftp.theseed.org/FIGfams/. Contact: redwards@mail.sdsu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23047562
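
    The idea of annotating a read by looking up signature oligopeptide k-mers can be sketched as follows. This is a simplified illustration, not the Perl application described in the abstract: the signature table, the function names, the majority-vote rule, and the assumption that the read has already been translated to protein are all hypothetical.

```python
from collections import Counter

def annotate(protein, signatures, k):
    """Assign a function to a translated read by voting over signature k-mers.

    `signatures` maps an amino-acid k-mer to the single function it is
    assumed to be unique to; k-mers absent from the table are ignored.
    """
    votes = Counter()
    for i in range(len(protein) - k + 1):
        function = signatures.get(protein[i:i + k])
        if function is not None:
            votes[function] += 1
    if not votes:
        return None  # no signature k-mer found: read stays unannotated
    return votes.most_common(1)[0][0]

# Hypothetical 7-mer signature table (real tables would be far larger).
signatures = {
    "MKTAYIA": "hypothetical kinase",
    "KQRQISF": "hypothetical kinase",
    "GSHMLED": "hypothetical transporter",
}

print(annotate("MKTAYIAKQRQISFVK", signatures, k=7))
print(annotate("AAAAAAAAAA", signatures, k=7))
```

    Because each lookup is a constant-time dictionary access, the cost per read grows only with read length, which is consistent with the speed advantage over alignment that the abstract reports.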

  3. MTR: taxonomic annotation of short metagenomic reads using clustering at multiple taxonomic ranks

    PubMed Central

    Gori, Fabio; Folino, Gianluigi; Jetten, Mike S. M.; Marchiori, Elena

    2011-01-01

    Motivation: Metagenomics is a recent field of biology that studies microbial communities by analyzing their genomic content directly sequenced from the environment. A metagenomic dataset consists of many short DNA or RNA fragments called reads. One interesting problem in metagenomic data analysis is the discovery of the taxonomic composition of a given dataset. A simple method for this task, called the Lowest Common Ancestor (LCA), is employed in state-of-the-art computational tools for metagenomic data analysis of very short reads (about 100 bp). However, LCA has two main drawbacks: it possibly assigns many reads to high taxonomic ranks and it discards a high number of reads. Results: We present MTR, a new method for tackling these drawbacks using clustering at Multiple Taxonomic Ranks. Unlike LCA, which processes the reads one-by-one, MTR exploits information shared by reads. Specifically, MTR consists of two main phases. First, for each taxonomic rank, a collection of potential clusters of reads is generated, and each potential cluster is associated to a taxon at that rank. Next, a small number of clusters is selected at each rank using a combinatorial optimization algorithm. The effectiveness of the resulting method is tested on a large number of simulated and real-life metagenomes. Results of experiments show that MTR improves on LCA by discarding a significantly smaller number of reads and by assigning many more reads at lower taxonomic ranks. Moreover, MTR provides a more faithful taxonomic characterization of the metagenome population distribution. Availability: Matlab and C++ source codes of the method available at http://cs.ru.nl/~gori/software/MTR.tar.gz. Contact: gori@cs.ru.nl; elenam@cs.ru.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21127032
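
    The per-read Lowest Common Ancestor assignment that MTR is compared against can be sketched in a few lines. This shows only the baseline LCA step, not MTR's clustering or optimization; the toy taxonomy, the child-to-parent dictionary representation, and the function names are assumptions made for illustration.

```python
def lineage(taxon, parent):
    """Path from a taxon up to the root, following child -> parent links."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def lowest_common_ancestor(taxa, parent):
    """Deepest taxon shared by the lineages of all taxa a read matched."""
    taxa = list(taxa)
    if not taxa:
        return None  # read had no hits and would be discarded
    common = lineage(taxa[0], parent)
    for taxon in taxa[1:]:
        ancestors = set(lineage(taxon, parent))
        # Keep bottom-up order so the first surviving node is the lowest one.
        common = [node for node in common if node in ancestors]
    return common[0] if common else None

# Hypothetical, heavily abbreviated taxonomy (child -> parent).
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella enterica": "Salmonella",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
    "Bacillus subtilis": "Bacillus",
    "Bacillus": "Bacteria",
}

print(lowest_common_ancestor(["Escherichia coli", "Salmonella enterica"], parent))
print(lowest_common_ancestor(["Escherichia coli", "Bacillus subtilis"], parent))
```

    The second call shows the first drawback named in the abstract: a read with hits in distant taxa is pushed all the way up to a high rank, where the assignment carries little information.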

  4. Metagenomic reconstructions of bacterial CRISPR loci constrain population histories.

    PubMed

    Sun, Christine L; Thomas, Brian C; Barrangou, Rodolphe; Banfield, Jillian F

    2016-04-01

    Bacterial CRISPR-Cas systems provide insight into recent population history because they rapidly incorporate, in a unidirectional manner, short fragments (spacers) from coexisting infective virus populations into host chromosomes. Immunity is achieved by sequence identity between transcripts of spacers and their targets. Here, we used metagenomics to study the stability and dynamics of the type I-E CRISPR-Cas locus of Leptospirillum group II bacteria in biofilms sampled over 5 years from an acid mine drainage (AMD) system. Despite recovery of 452,686 spacers from CRISPR amplicons and metagenomic data, rarefaction curves of spacers show no saturation. The vast repertoire of spacers is attributed to phage/plasmid population diversity and retention of old spacers, despite rapid evolution of the targeted phage/plasmid genome regions (proto-spacers). The oldest spacers (spacers found at the trailer end) are conserved for at least 5 years, and 12% of these retain perfect or near-perfect matches to proto-spacer targets. The majority of proto-spacer regions contain an AAG proto-spacer adjacent motif (PAM). Spacers throughout the locus target the same phage population (AMDV1), but there are blocks of consecutive spacers without AMDV1 target sequences. Results suggest long-term coexistence of Leptospirillum with AMDV1 and periods when AMDV1 was less dominant. Metagenomics can be applied to millions of cells in a single sample to provide an extremely large spacer inventory, allow identification of phage/plasmids and enable analysis of previous phage/plasmid exposure. Thus, this approach can provide insights into prior bacterial environment and genetic interplay between hosts and their viruses.

  5. Assembling the Marine Metagenome, One Cell at a Time

    SciTech Connect

    Woyke, Tanja; Xie, Gary; Copeland, Alex; Gonzalez, Jose M.; Han, Cliff; Kiss, Hajnalka; Saw, Jimmy H.; Senin, Pavel; Yang, Chi; Chatterji, Sourav; Cheng, Jan-Fang; Eisen, Jonathan A.; Sieracki, Michael E.; Stepanauskas, Ramunas

    2010-06-24

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured

  6. Assembling The Marine Metagenome, One Cell At A Time

    SciTech Connect

    Xie, Gang; Han, Shunsheng; Kiss, Hajnalka; Saw, Jimmy; Senin, Pavel; Woyke, Tanja; Copeland, Alex; Gonzalez, Jose; Chatterji, Sourav; Cheng, Jan-Fang; Eisen, Jonathan A; Sieracki, Michael E; Stepanauskas, Ramunas

    2008-01-01

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. 
We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured taxa from a complex

  7. Assembling the Marine Metagenome, One Cell at a Time

    PubMed Central

    Woyke, Tanja; Xie, Gary; Copeland, Alex; González, José M.; Han, Cliff; Kiss, Hajnalka; Saw, Jimmy H.; Senin, Pavel; Yang, Chi; Chatterji, Sourav; Cheng, Jan-Fang; Eisen, Jonathan A.; Sieracki, Michael E.; Stepanauskas, Ramunas

    2009-01-01

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. 
We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured taxa from a complex

  8. The future of skin metagenomics.

    PubMed

    Mathieu, Alban; Vogel, Timothy M; Simonet, Pascal

    2014-01-01

    Metagenomics, the direct exploitation of environmental microbial DNA, is complementary to traditional culture-based approaches for deciphering taxonomic and functional microbial diversity in a plethora of ecosystems, including those related to the human body such as the mouth, saliva, teeth, gut or skin. DNA extracted from human skin analyzed by sequencing the PCR-amplified rrs gene has already revealed the taxonomic diversity of microbial communities colonizing the human skin ("skin microbiome"). Each individual possesses his/her own skin microbial community structure, with marked taxonomic differences between different parts of the body and temporal evolution depending on physical and chemical conditions (sweat, washing etc.). However, technical limitations due to the low bacterial density at the surface of the human skin or contamination by human DNA have still inhibited extended use of the metagenomic approach for investigating the skin microbiome at a functional level. These difficulties have been overcome in part by the new generation of sequencing platforms that now provide sequences describing the genes and functions carried out by skin bacteria. These methodological advances should help us understand the mechanisms by which these microorganisms adapt to the specific chemical composition of each skin and thereby lead to a better understanding of bacteria/human host interdependence. This knowledge will pave the way for more systemic and individualized pharmaceutical and cosmetic applications. Copyright © 2013 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  9. Human milk metagenome: a functional capacity analysis

    PubMed Central

    2013-01-01

    Background: Human milk contains a diverse population of bacteria that likely influences colonization of the infant gastrointestinal tract. Recent studies, however, have been limited to characterization of this microbial community by 16S rRNA analysis. In the present study, a metagenomic approach using Illumina sequencing of a pooled milk sample (ten donors) was employed to determine the genera of bacteria and the types of bacterial open reading frames in human milk that may influence bacterial establishment and stability in this primal food matrix. The human milk metagenome was also compared to that of breast-fed and formula-fed infants’ feces (n = 5, each) and mothers’ feces (n = 3) at the phylum level and at a functional level using open reading frame abundance. Additionally, immune-modulatory bacterial-DNA motifs were also searched for within human milk. Results: The bacterial community in human milk contained over 360 prokaryotic genera, with sequences aligning predominantly to the phyla of Proteobacteria (65%) and Firmicutes (34%), and the genera of Pseudomonas (61.1%), Staphylococcus (33.4%) and Streptococcus (0.5%). From assembled human milk-derived contigs, 30,128 open reading frames were annotated and assigned to functional categories. When compared to the metagenome of infants’ and mothers’ feces, the human milk metagenome was less diverse at the phylum level, and contained more open reading frames associated with nitrogen metabolism, membrane transport and stress response (P < 0.05). The human milk metagenome also contained a similar occurrence of immune-modulatory DNA motifs to that of infants’ and mothers’ fecal metagenomes. Conclusions: Our results further expand the complexity of the human milk metagenome and enforce the benefits of human milk ingestion on the microbial colonization of the infant gut and immunity. Discovery of immune-modulatory motifs in the metagenome of human milk indicates more exhaustive analyses of the

  10. Human milk metagenome: a functional capacity analysis.

    PubMed

    Ward, Tonya L; Hosid, Sergey; Ioshikhes, Ilya; Altosaar, Illimar

    2013-05-25

    Human milk contains a diverse population of bacteria that likely influences colonization of the infant gastrointestinal tract. Recent studies, however, have been limited to characterization of this microbial community by 16S rRNA analysis. In the present study, a metagenomic approach using Illumina sequencing of a pooled milk sample (ten donors) was employed to determine the genera of bacteria and the types of bacterial open reading frames in human milk that may influence bacterial establishment and stability in this primal food matrix. The human milk metagenome was also compared to that of breast-fed and formula-fed infants' feces (n = 5, each) and mothers' feces (n = 3) at the phylum level and at a functional level using open reading frame abundance. Additionally, immune-modulatory bacterial-DNA motifs were also searched for within human milk. The bacterial community in human milk contained over 360 prokaryotic genera, with sequences aligning predominantly to the phyla of Proteobacteria (65%) and Firmicutes (34%), and the genera of Pseudomonas (61.1%), Staphylococcus (33.4%) and Streptococcus (0.5%). From assembled human milk-derived contigs, 30,128 open reading frames were annotated and assigned to functional categories. When compared to the metagenome of infants' and mothers' feces, the human milk metagenome was less diverse at the phylum level, and contained more open reading frames associated with nitrogen metabolism, membrane transport and stress response (P < 0.05). The human milk metagenome also contained a similar occurrence of immune-modulatory DNA motifs to that of infants' and mothers' fecal metagenomes. Our results further expand the complexity of the human milk metagenome and enforce the benefits of human milk ingestion on the microbial colonization of the infant gut and immunity. Discovery of immune-modulatory motifs in the metagenome of human milk indicates more exhaustive analyses of the functionality of the human milk metagenome are

  11. EBI metagenomics--a new resource for the analysis and archiving of metagenomic data.

    PubMed

    Hunter, Sarah; Corbett, Matthew; Denise, Hubert; Fraser, Matthew; Gonzalez-Beltran, Alejandra; Hunter, Christopher; Jones, Philip; Leinonen, Rasko; McAnulla, Craig; Maguire, Eamonn; Maslen, John; Mitchell, Alex; Nuka, Gift; Oisel, Arnaud; Pesseat, Sebastien; Radhakrishnan, Rajesh; Rocca-Serra, Philippe; Scheremetjew, Maxim; Sterk, Peter; Vaughan, Daniel; Cochrane, Guy; Field, Dawn; Sansone, Susanna-Assunta

    2014-01-01

    Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.

  12. Chitinase genes revealed and compared in bacterial isolates, DNA extracts and a metagenomic library from a phytopathogen suppressive soil

    SciTech Connect

    Hjort, K.; Bergstrom, M.; Adesina, M.F.; Jansson, J.K.; Smalla, K.; Sjoling, S.

    2009-09-01

    Soil that is suppressive to disease caused by fungal pathogens is an interesting source to target for novel chitinases that might be contributing towards disease suppression. In this study we screened for chitinase genes in a phytopathogen-suppressive soil in three ways: (1) from a metagenomic library constructed from microbial cells extracted from soil, (2) from directly extracted DNA and (3) from bacterial isolates with antifungal and chitinase activities. Terminal-restriction fragment length polymorphism (T-RFLP) of chitinase genes revealed differences in amplified chitinase genes from the metagenomic library and the directly extracted DNA, but approximately 40% of the identified chitinase terminal-restriction fragments (TRFs) were found in both sources. All of the chitinase TRFs from the isolates were matched to TRFs in the directly extracted DNA and the metagenomic library. The most abundant chitinase TRF in the soil DNA and the metagenomic library corresponded to the TRF^103 of the isolate, Streptomyces mutomycini and/or Streptomyces clavifer. There were good matches between T-RFLP profiles of chitinase gene fragments obtained from different sources of DNA. However, there were also differences in both the chitinase and the 16S rRNA gene T-RFLP patterns depending on the source of DNA, emphasizing the lack of complete coverage of the gene diversity by any of the approaches used.

  13. The Sorcerer II Global Ocean Sampling Expedition: Metagenomic Characterization of Viruses within Aquatic Microbial Samples

    PubMed Central

    Williamson, Shannon J.; Rusch, Douglas B.; Yooseph, Shibu; Halpern, Aaron L.; Heidelberg, Karla B.; Glass, John I.; Andrews-Pfannkoch, Cynthia; Fadrosh, Douglas; Miller, Christopher S.; Sutton, Granger; Frazier, Marvin; Venter, J. Craig

    2008-01-01

    Viruses are the most abundant biological entities on our planet. Interactions between viruses and their hosts impact several important biological processes in the world's oceans such as horizontal gene transfer, microbial diversity and biogeochemical cycling. Interrogation of microbial metagenomic sequence data collected as part of the Sorcerer II Global Ocean Expedition (GOS) revealed a high abundance of viral sequences, representing approximately 3% of the total predicted proteins. Cluster analyses of the viral sequences revealed hundreds to thousands of viral genes encoding various metabolic and cellular functions. Quantitative analyses of viral genes of host origin performed on the viral fraction of aquatic samples confirmed the viral nature of these sequences and suggested that significant portions of aquatic viral communities behave as reservoirs of such genetic material. Distributional and phylogenetic analyses of these host-derived viral sequences also suggested that viral acquisition of environmentally relevant genes of host origin is a more abundant and widespread phenomenon than previously appreciated. The predominant viral sequences identified within microbial fractions originated from tailed bacteriophages and exhibited varying global distributions according to viral family. Recruitment of GOS viral sequence fragments against 27 complete aquatic viral genomes revealed that only one reference bacteriophage genome was highly abundant and was closely related, but not identical, to the cyanomyovirus P-SSM4. The co-distribution across all sampling sites of P-SSM4-like sequences with the dominant ecotype of its host, Prochlorococcus supports the classification of the viral sequences as P-SSM4-like and suggests that this virus may influence the abundance, distribution and diversity of one of the most dominant components of picophytoplankton in oligotrophic oceans. In summary, the abundance and broad geographical distribution of viral sequences within

  14. The Sorcerer II Global Ocean Sampling Expedition: metagenomic characterization of viruses within aquatic microbial samples.

    PubMed

    Williamson, Shannon J; Rusch, Douglas B; Yooseph, Shibu; Halpern, Aaron L; Heidelberg, Karla B; Glass, John I; Andrews-Pfannkoch, Cynthia; Fadrosh, Douglas; Miller, Christopher S; Sutton, Granger; Frazier, Marvin; Venter, J Craig

    2008-01-23

    Viruses are the most abundant biological entities on our planet. Interactions between viruses and their hosts impact several important biological processes in the world's oceans such as horizontal gene transfer, microbial diversity and biogeochemical cycling. Interrogation of microbial metagenomic sequence data collected as part of the Sorcerer II Global Ocean Expedition (GOS) revealed a high abundance of viral sequences, representing approximately 3% of the total predicted proteins. Cluster analyses of the viral sequences revealed hundreds to thousands of viral genes encoding various metabolic and cellular functions. Quantitative analyses of viral genes of host origin performed on the viral fraction of aquatic samples confirmed the viral nature of these sequences and suggested that significant portions of aquatic viral communities behave as reservoirs of such genetic material. Distributional and phylogenetic analyses of these host-derived viral sequences also suggested that viral acquisition of environmentally relevant genes of host origin is a more abundant and widespread phenomenon than previously appreciated. The predominant viral sequences identified within microbial fractions originated from tailed bacteriophages and exhibited varying global distributions according to viral family. Recruitment of GOS viral sequence fragments against 27 complete aquatic viral genomes revealed that only one reference bacteriophage genome was highly abundant and was closely related, but not identical, to the cyanomyovirus P-SSM4. The co-distribution across all sampling sites of P-SSM4-like sequences with the dominant ecotype of its host, Prochlorococcus supports the classification of the viral sequences as P-SSM4-like and suggests that this virus may influence the abundance, distribution and diversity of one of the most dominant components of picophytoplankton in oligotrophic oceans. In summary, the abundance and broad geographical distribution of viral sequences within

  15. Estimating DNA coverage and abundance in metagenomes using a gamma approximation

    SciTech Connect

    Hooper, Sean D; Dalevi, Daniel; Pati, Amrita; Mavromatis, Konstantinos; Ivanova, Natalia N; Kyrpides, Nikos C

    2010-01-01

    Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets.
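    The idea of modelling bin coverage with a gamma distribution can be illustrated with a small sketch. This is not the authors' procedure: it assumes that reads land on each bin as a Poisson process whose rate is gamma-distributed (so read counts per bin follow a negative binomial), and it fits the two parameters by a coarse grid search over the zero-truncated likelihood. The function names, the grid search, and the Poisson assumption are all illustrative.

```python
import math

def nb_log_pmf(x, shape, scale):
    """Log-probability of x reads on a bin when the Poisson rate is
    Gamma(shape, scale)-distributed (a negative binomial; assumed model)."""
    return (math.lgamma(x + shape) - math.lgamma(shape) - math.lgamma(x + 1)
            + x * math.log(scale / (1.0 + scale))
            - shape * math.log(1.0 + scale))

def p_zero(shape, scale):
    """Probability that a bin receives no reads at all."""
    return (1.0 + scale) ** (-shape)

def fit_truncated(counts, shapes, scales):
    """Grid-search the (shape, scale) pair maximizing the zero-truncated
    log-likelihood; only bins with at least one read are observed."""
    best = None
    for k in shapes:
        for s in scales:
            log_norm = math.log(1.0 - p_zero(k, s))
            ll = sum(nb_log_pmf(x, k, s) - log_norm for x in counts)
            if best is None or ll > best[0]:
                best = (ll, k, s)
    return best[1], best[2]

def estimate_unseen(n_observed, shape, scale):
    """Expected number of bins with zero reads, given n_observed covered bins."""
    p0 = p_zero(shape, scale)
    return n_observed * p0 / (1.0 - p0)
```

    With a fitted pair (shape, scale), `estimate_unseen(len(counts), shape, scale)` gives a rough count of bins that additional sequencing might still reveal; a finer grid or a numerical optimizer would replace the grid search in practice.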

  16. IMG/M 4 version of the integrated metagenome comparative analysis system

    PubMed Central

    Markowitz, Victor M.; Chen, I-Min A.; Chu, Ken; Szeto, Ernest; Palaniappan, Krishna; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Pagani, Ioanna; Tringe, Susannah; Huntemann, Marcel; Billis, Konstantinos; Varghese, Neha; Tennessen, Kristin; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2014-01-01

    IMG/M (http://img.jgi.doe.gov/m) provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in the context of a comprehensive set of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG/M’s data content and analytical tools have expanded continuously since its first version was released in 2007. Since the last report published in the 2012 NAR Database Issue, IMG/M’s database architecture, annotation and data integration pipelines and analysis tools have been extended to cope with the rapid growth in the number and size of metagenome data sets handled by the system. IMG/M data marts provide support for the analysis of publicly available genomes, expert review of metagenome annotations (IMG/M ER: http://img.jgi.doe.gov/mer) and Human Microbiome Project (HMP)-specific metagenome samples (IMG/M HMP: http://img.jgi.doe.gov/imgm_hmp). PMID:24136997

  17. Toward Accurate and Quantitative Comparative Metagenomics.

    PubMed

    Nayfach, Stephen; Pollard, Katherine S

    2016-08-25

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. Copyright © 2016 Elsevier Inc. All rights reserved.

  18. [Pathology and viral metagenomics, a recent history].

    PubMed

    Bernardo, Pauline; Albina, Emmanuel; Eloit, Marc; Roumagnac, Philippe

    2013-05-01

    Human, animal and plant viral diseases have greatly benefited from recent metagenomics developments. Viral metagenomics is a culture-independent approach used to investigate the complete viral genetic populations of a sample. During the last decade, metagenomics concepts and techniques that were first used by ecologists progressively spread into the scientific field of viral pathology. The sample, which for ecologists was first a fraction of an ecosystem, became for pathologists an organism that hosts millions of microbes and viruses. This new approach, which provides high-resolution qualitative and quantitative data on viral diversity without a priori assumptions, is now revolutionizing the way pathologists decipher viral diseases. This review describes the latest improvements in high-throughput next-generation sequencing methods and discusses the applications of viral metagenomics in viral pathology, including discovery of novel viruses, viral surveillance and diagnostics, large-scale molecular epidemiology, and viral evolution. © 2013 médecine/sciences – Inserm.

  19. Toward Accurate and Quantitative Comparative Metagenomics

    PubMed Central

    Nayfach, Stephen; Pollard, Katherine S.

    2016-01-01

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. PMID:27565341

  20. Chameleon fragmentation

    SciTech Connect

    Brax, Philippe

    2014-02-01

    A scalar field dark energy candidate could couple to ordinary matter and photons, enabling its detection in laboratory experiments. Here we study the quantum properties of the chameleon field, one such dark energy candidate, in an "afterglow" experiment designed to produce, trap, and detect chameleon particles. In particular, we investigate the possible fragmentation of a beam of chameleon particles into multiple particle states due to the highly non-linear interaction terms in the chameleon Lagrangian. Fragmentation could weaken the constraints of an afterglow experiment by reducing the energy of the regenerated photons, but this energy reduction also provides a unique signature which could be detected by a properly-designed experiment. We show that constraints from the CHASE experiment are essentially unaffected by fragmentation for φ^4 and 1/φ potentials, but are weakened for steeper potentials, and we discuss possible future afterglow experiments.

  1. Finding the needles in the metagenome haystack.

    PubMed

    Kowalchuk, George A; Speksnijder, Arjen G C L; Zhang, Kun; Goodman, Robert M; van Veen, Johannes A

    2007-04-01

    In the collective genomes (the metagenome) of the microorganisms inhabiting the Earth's diverse environments is written the history of life on this planet. New molecular tools developed and used for the past 15 years by microbial ecologists are facilitating the extraction, cloning, screening, and sequencing of these genomes. This approach allows microbial ecologists to access and study the full range of microbial diversity, regardless of our ability to culture organisms, and provides an unprecedented access to the breadth of natural products that these genomes encode. However, there is no way that the mere collection of sequences, no matter how expansive, can provide full coverage of the complex world of microbial metagenomes within the foreseeable future. Furthermore, although it is possible to fish out highly informative and useful genes from the sea of gene diversity in the environment, this can be a highly tedious and inefficient procedure. Microbial ecologists must be clever in their pursuit of ecologically relevant, valuable, and niche-defining genomic information within the vast haystack of microbial diversity. In this report, we seek to describe advances and prospects that will help microbial ecologists glean more knowledge from investigations into metagenomes. These include technological advances in sequencing and cloning methodologies, as well as improvements in annotation and comparative sequence analysis. More significant, however, will be ways to focus in on various subsets of the metagenome that may be of particular relevance, either by limiting the target community under study or improving the focus or speed of screening procedures. Lastly, given the cost and infrastructure necessary for large metagenome projects, and the almost inexhaustible amount of data they can produce, trends toward broader use of metagenome data across the research community coupled with the needed investment in bioinformatics infrastructure devoted to metagenomics will no

  2. Metagenomics insights into food fermentations.

    PubMed

    De Filippis, Francesca; Parente, Eugenio; Ercolini, Danilo

    2017-01-01

    This review describes the recent advances in the study of food microbial ecology, with a focus on food fermentations. High-throughput sequencing (HTS) technologies have been widely applied to the study of food microbial consortia and the different applications of HTS technologies were exploited in order to monitor microbial dynamics in food fermentative processes. Phylobiomics was the most explored application in the past decade. Metagenomics and metatranscriptomics, although still underexploited, promise to uncover the functionality of complex microbial consortia. The new knowledge acquired will help to understand how to make a profitable use of microbial genetic resources and modulate key activities of beneficial microbes in order to ensure process efficiency, product quality and safety. © 2016 The Authors. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  3. Challenges and Opportunities of Airborne Metagenomics

    PubMed Central

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-01-01

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles. PMID:25953766

  4. ARGs-OAP: online analysis pipeline for antibiotic resistance genes detection from metagenomic data using an integrated structured ARG-database.

    PubMed

    Yang, Ying; Jiang, Xiaotao; Chai, Benli; Ma, Liping; Li, Bing; Zhang, Anni; Cole, James R; Tiedje, James M; Zhang, Tong

    2016-08-01

    Environmental dissemination of antibiotic resistance genes (ARGs) has become an increasing concern for public health. Metagenomics approaches can effectively detect broad profiles of ARGs in environmental samples; however, the detection and subsequent classification of ARG-like sequences are time consuming and have been severe obstacles in employing metagenomic methods. We sought to accelerate quantification of ARGs in metagenomic data from environmental samples. A Structured ARG reference database (SARG) was constructed by integrating ARDB and CARD, the two most commonly used databases. SARG was curated to remove redundant sequences and optimized to facilitate query sequence identification by similarity. A database with a hierarchical structure (type-subtype-reference sequence) was then constructed to facilitate classification (assigning ARG-like sequence to type, subtype and reference sequence) of sequences identified through similarity search. Utilizing SARG and a previously proposed hybrid functional gene annotation pipeline, we developed an online pipeline called ARGs-OAP for fast annotation and classification of ARG-like sequences from metagenomic data. We also evaluated and proposed a set of criteria important for efficiently conducting metagenomic analysis of ARGs using ARGs-OAP. Perl script for ARGs-OAP can be downloaded from https://github.com/biofuture/Ublastx_stageone. ARGs-OAP can be accessed through http://smile.hku.hk/SARGs. Contact: zhangt@hku.hk or tiedjej@msu.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
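    The type-subtype-reference hierarchy described above lends itself to a simple lookup once a similarity search has assigned each read a best-matching reference sequence. The sketch below is only an illustration of that idea and does not reproduce ARGs-OAP or the SARG file format: the reference identifiers, the example hierarchy entries, and the `classify_hits` function are all hypothetical.

```python
from collections import Counter

# Hypothetical hierarchy: reference sequence ID -> (ARG type, ARG subtype).
# The IDs and the pairing of entries are illustrative, not taken from SARG.
HIERARCHY = {
    "ref_001": ("tetracycline", "tetA"),
    "ref_002": ("tetracycline", "tetM"),
    "ref_003": ("beta-lactam", "TEM-1"),
}

def classify_hits(best_hits, hierarchy):
    """Roll per-read best hits up to type and subtype counts.

    best_hits maps a read ID to the reference ID of its best match;
    reads whose reference is missing from the hierarchy are skipped.
    """
    type_counts = Counter()
    subtype_counts = Counter()
    for _read_id, ref_id in best_hits.items():
        if ref_id not in hierarchy:
            continue
        arg_type, arg_subtype = hierarchy[ref_id]
        type_counts[arg_type] += 1
        subtype_counts[(arg_type, arg_subtype)] += 1
    return type_counts, subtype_counts

hits = {"read1": "ref_001", "read2": "ref_002",
        "read3": "ref_003", "read4": "ref_999"}
types, subtypes = classify_hits(hits, HIERARCHY)
```

    Keeping the subtype key as a (type, subtype) pair avoids collisions if two types ever share a subtype label.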

  5. Metagenomics of Glassy-Winged Sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae)

    USDA-ARS's Scientific Manuscript database

    A metagenomics approach was used to identify unknown organisms that live in association with the glassy-winged sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae). Metagenomics combines molecular biology and genetics to identify and characterize genetic material from unique biological ...

  6. Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values.

    PubMed

    Comin, Matteo; Schimd, Michele

    2016-08-12

    Sequencing technologies are generating enormous amounts of read data; however, assembly of genomes and metagenomes remains among the most challenging tasks. In this paper we study the comparison of genomes and metagenomes based only on read data, using word-count statistics, called alignment-free, that do not require reference genomes or assemblies. Quality scores produced by sequencing platforms are fundamental for various analyses; moreover, future-generation sequencing platforms will produce longer reads but with an error rate around 15%. In this context it will be fundamental to exploit quality-value information within the framework of alignment-free measures. In this paper we present a family of alignment-free measures, called d^q-type, that are based on k-mer counts and quality values. These statistics can be used to compare genomes and metagenomes based on their read sets. Results show that the evolutionary relationship of genomes can be reconstructed based on the direct comparison of their read sets. The use of quality values on average improves the classification accuracy, and its contribution increases when the reads are noisier. The comparison of metagenomic microbial communities can also be performed efficiently. Similar metagenomes are quickly detected just by processing their read data, without the need for costly alignments.
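    To make the combination of k-mer counts and quality values concrete, the following sketch weights each k-mer occurrence by the probability that all of its bases were called correctly, then compares two read sets with a plain inner product of the weighted counts. This is a simplified stand-in, not the measures defined in the paper: the weighting scheme, the unnormalized D2-style score, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

def phred_to_prob_correct(q):
    """Probability that a base call with Phred score q is correct."""
    return 1.0 - 10.0 ** (-q / 10.0)

def weighted_kmer_counts(reads, k):
    """Count k-mers over (sequence, quality list) pairs, weighting each
    occurrence by the product of its per-base correctness probabilities."""
    counts = defaultdict(float)
    for seq, quals in reads:
        probs = [phred_to_prob_correct(q) for q in quals]
        for i in range(len(seq) - k + 1):
            weight = 1.0
            for p in probs[i:i + k]:
                weight *= p
            counts[seq[i:i + k]] += weight
    return dict(counts)

def d2_similarity(counts_a, counts_b):
    """Inner product of two weighted k-mer count vectors (unnormalized)."""
    return sum(v * counts_b[kmer] for kmer, v in counts_a.items()
               if kmer in counts_b)

sample_a = weighted_kmer_counts([("ACGT", [10, 10, 10, 10])], 2)
score = d2_similarity(sample_a, sample_a)
```

    Low-quality k-mers contribute less to the score, which is the intuition behind using quality values; a practical measure would also center and normalize the counts.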

  7. Metagenome resource for D-serine utilization in a DsdA-disrupted Escherichia coli.

    PubMed

    Lim, Mi Young; Lee, Hyo Jeong; Kim, Pil

    2011-04-01

    To find alternative genetic resources for D-serine dehydratase (E.C. 4.3.1.18, dsdA) mediating the deamination of D-serine into pyruvate, metagenomic libraries were screened. The chromosomal dsdA gene of a wild-type Escherichia coli W3110 strain was disrupted by inserting the tetracycline resistance gene (tet), using double-crossover, for use as a screening host. The W3110 dsdA::tet strain was not able to grow in a medium containing D-serine as a sole carbon source, whereas wild-type W3110 and the complement W3110 dsdA::tet strain containing a dsdA-expression plasmid were able to grow. After introducing metagenome libraries into the screening host, a strain containing a 40-kb DNA fragment obtained from the metagenomic source derived from a compost was selected based on its capability to grow on the agar plate containing D-serine as a sole carbon source. For identification of the genetic resource responsible for the D-serine degrading capability, transposon- micron was randomly inserted into the 40-kb metagenome. Two strains that had lost their D-serine degrading ability were negatively selected, and the two 6-kb contigs responsible for the D-serine degrading capability were sequenced and deposited (GenBank code: HQ829474.1 and HQ829475.1). Therefore, new alternative genetic resources for D-serine dehydratase were found from the metagenomic resource, and the corresponding ORFs are discussed.

  8. The Source and Evolutionary History of a Microbial Contaminant Identified Through Soil Metagenomic Analysis

    PubMed Central

    Olm, Matthew R.; Butterfield, Cristina N.; Copeland, Alex; Boles, T. Christian; Thomas, Brian C.

    2017-01-01

    In this study, strain-resolved metagenomics was used to solve a mystery. A 6.4-Mbp complete closed genome was recovered from a soil metagenome and found to be astonishingly similar to that of Delftia acidovorans SPH-1, which was isolated in Germany a decade ago. It was suspected that this organism was not native to the soil sample because it lacked the diversity that is characteristic of other soil organisms; this suspicion was confirmed when PCR testing failed to detect the bacterium in the original soil samples. D. acidovorans was also identified in 16 previously published metagenomes from multiple environments, but detailed-scale single nucleotide polymorphism analysis grouped these into five distinct clades. All of the strains indicated as contaminants fell into one clade. Fragment length anomalies were identified in paired reads mapping to the contaminant clade genotypes only. This finding was used to establish that the DNA was present in specific size selection reagents used during sequencing. Ultimately, the source of the contaminant was identified as bacterial biofilms growing in tubing. On the basis of direct measurement of the rate of fixation of mutations across the period of time in which contamination was occurring, we estimated the time of separation of the contaminant strain from the genomically sequenced ancestral population within a factor of 2. This research serves as a case study of high-resolution microbial forensics and strain tracking accomplished through metagenomics-based comparative genomics. The specific case reported here is unusual in that the study was conducted in the background of a soil metagenome and the conclusions were confirmed by independent methods. PMID:28223457

  9. Fragmentation Processes

    NASA Astrophysics Data System (ADS)

    Whelan, Colm T.

    2012-12-01

    Preface; 1. Direct and resonant double-photoionization: from atoms to solids L. Avaldi and G. Stefani; 2. The application of propagation exterior complex scaling to atomic collisions P. L. Bartlett and A. T. Stelbovics; 3. Fragmentation of molecular-ion beams in intense ultra-short laser pulses I. Ben-Itzhak; 4. Atoms with one and two active electrons in strong laser fields I. A. Ivanov and A. S. Kheifets; 5. Experimental aspects of ionization studies by positron and positronium impact G. Laricchia, D. A. Cooke, Á. Kövér and S. J. Brawley; 6. (e,2e) spectroscopy using fragmentation processes J. Lower, M. Yamazaki and M. Takahashi; 7. A coupled pseudostate approach to the calculation of ion-atom fragmentation processes M. McGovern, H. R. J. Walters and C. T. Whelan; 8. Electron Impact Ionization using (e,2e) coincidence techniques from threshold to intermediate energies A. J. Murray; 9. (e,2e) processes on atomic inner shells C. T. Whelan; 10. Spin resolved atomic (e,2e) processes J. Lower and C. T. Whelan; Index.

  10. Soil Metagenomes from Different Pristine Environments of Northwest Argentina

    PubMed Central

    Colman, Déborah I.

    2015-01-01

    This is the first study to use a high-throughput metagenomic shotgun approach to explore the biosynthetic potential of soil metagenomes from different pristine environments of northwest Argentina. Our data sets characterize these metagenomes and provide information on the possible effect these ecosystems have on their diversity and biosynthetic potential. PMID:26272581

  11. Soil Metagenomes from Different Pristine Environments of Northwest Argentina.

    PubMed

    McCarthy, Christina B; Colman, Déborah I

    2015-08-13

    This is the first study to use a high-throughput metagenomic shotgun approach to explore the biosynthetic potential of soil metagenomes from different pristine environments of northwest Argentina. Our data sets characterize these metagenomes and provide information on the possible effect these ecosystems have on their diversity and biosynthetic potential.

  12. Under-detection of endospore-forming Firmicutes in metagenomic data.

    PubMed

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S; Junier, Pilar

    2015-01-01

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases that can contribute to improve the universality of metagenomic approaches.

  13. Under-detection of endospore-forming Firmicutes in metagenomic data

    SciTech Connect

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; Lo, Chien -Chi; Li, Po -E; Chain, Patrick S.; Junier, Pilar

    2015-04-25

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases that can contribute to improve the universality of metagenomic approaches.

  14. Under-detection of endospore-forming Firmicutes in metagenomic data

    DOE PAGES

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; ...

    2015-04-25

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases that can contribute to improve the universality of metagenomic approaches.

  15. Under-detection of endospore-forming Firmicutes in metagenomic data

    PubMed Central

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S.; Junier, Pilar

    2015-01-01

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases that can contribute to improve the universality of metagenomic approaches. PMID:25973144

  16. Metagenomics for pathogen detection in public health

    PubMed Central

    2013-01-01

    Traditional pathogen detection methods in public health infectious disease surveillance rely upon the identification of agents that are already known to be associated with a particular clinical syndrome. The emerging field of metagenomics has the potential to revolutionize pathogen detection in public health laboratories by allowing the simultaneous detection of all microorganisms in a clinical sample, without a priori knowledge of their identities, through the use of next-generation DNA sequencing. A single metagenomics analysis has the potential to detect rare and novel pathogens, and to uncover the role of dysbiotic microbiomes in infectious and chronic human disease. Making use of advances in sequencing platforms and bioinformatics tools, recent studies have shown that metagenomics can even determine the whole-genome sequences of pathogens, allowing inferences about antibiotic resistance, virulence, evolution and transmission to be made. We are entering an era in which more novel infectious diseases will be identified through metagenomics-based methods than through traditional laboratory methods. The impetus is now on public health laboratories to integrate metagenomics techniques into their diagnostic arsenals. PMID:24050114

  17. Metagenomic applications in environmental monitoring and bioremediation

    DOE PAGES

    Techtmann, Stephen M.; Hazen, Terry C.

    2016-01-01

    With the rapid advances in sequencing technology, the cost of sequencing has dramatically dropped and the scale of sequencing projects has increased accordingly. This has provided the opportunity for the routine use of sequencing techniques in the monitoring of environmental microbes. While metagenomic applications have been routinely applied to better understand the ecology and diversity of microbes, their use in environmental monitoring and bioremediation is increasingly common. In this review we seek to provide an overview of some of the metagenomic techniques used in environmental systems biology, addressing their application and limitation. We will also provide several recent examples of the application of metagenomics to bioremediation. We discuss examples where microbial communities have been used to predict the presence and extent of contamination, examples of how metagenomics can be used to characterize the process of natural attenuation by unculturable microbes, as well as examples detailing the use of metagenomics to understand the impact of biostimulation on microbial communities.

  18. Preliminary High-Throughput Metagenome Assembly

    SciTech Connect

    Dusheyko, Serge; Furman, Craig; Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank

    2007-03-26

    Metagenome data sets present a qualitatively different assembly problem than traditional single-organism whole-genome shotgun (WGS) assembly. The unique aspects of such projects include the presence of a potentially large number of distinct organisms and their representation in the data set at widely different fractions. In addition, multiple closely related strains could be present, which would be difficult to assemble separately. Failure to take these issues into account can result in poor assemblies that either jumble together different strains or which fail to yield useful results. The DOE Joint Genome Institute has sequenced a number of metagenomic projects and plans to considerably increase this number in the coming year. As a result, the JGI has a need for high-throughput tools and techniques for handling metagenome projects. We present the techniques developed to handle metagenome assemblies in a high-throughput environment. This includes a streamlined assembly wrapper, based on the JGI's in-house WGS assembler, Jazz. It also includes the selection of sensible defaults targeted for metagenome data sets, as well as quality control automation for cleaning up the raw results. While analysis is ongoing, we will discuss preliminary assessments of the quality of the assembly results (http://fames.jgi-psf.org).

  19. Viral Metagenomics: MetaView Software

    SciTech Connect

    Zhou, C; Smith, J

    2007-10-22

    The purpose of this report is to design and develop a tool for analysis of raw sequence read data from viral metagenomics experiments. The tool should compare read sequences of known viral nucleic acid sequence data and enable a user to attempt to determine, with some degree of confidence, what virus groups may be present in the sample. This project was conducted in two phases. In phase 1 we surveyed the literature and examined existing metagenomics tools to educate ourselves and to more precisely define the problem of analyzing raw read data from viral metagenomic experiments. In phase 2 we devised an approach and built a prototype code and database. This code takes viral metagenomic read data in fasta format as input and accesses all complete viral genomes from Kpath for sequence comparison. The system executes at the UNIX command line, producing output that is stored in an Oracle relational database. We provide here a description of the approach we came up with for handling un-assembled, short read data sets from viral metagenomics experiments. We include a discussion of the current MetaView code capabilities and additional functionality that we believe should be added, should additional funding be acquired to continue the work.

  20. Shotgun metagenomic data streams: surfing without fear

    SciTech Connect

    Berendzen, Joel R

    2010-12-06

    Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to sequences that are truly novel, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast, and favorable scaling properties ensure they will be sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.

  1. Metazen – metadata capture for metagenomes

    DOE PAGES

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; ...

    2014-12-08

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  2. Metazen – metadata capture for metagenomes

    SciTech Connect

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker

    2014-12-08

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  3. Metagenomics: Application of Genomics to Uncultured Microorganisms

    PubMed Central

    Handelsman, Jo

    2004-01-01

    Metagenomics (also referred to as environmental and community genomics) is the genomic analysis of microorganisms by direct extraction and cloning of DNA from an assemblage of microorganisms. The development of metagenomics stemmed from the ineluctable evidence that as-yet-uncultured microorganisms represent the vast majority of organisms in most environments on earth. This evidence was derived from analyses of 16S rRNA gene sequences amplified directly from the environment, an approach that avoided the bias imposed by culturing and led to the discovery of vast new lineages of microbial life. Although the portrait of the microbial world was revolutionized by analysis of 16S rRNA genes, such studies yielded only a phylogenetic description of community membership, providing little insight into the genetics, physiology, and biochemistry of the members. Metagenomics provides a second tier of technical innovation that facilitates study of the physiology and ecology of environmental microorganisms. Novel genes and gene products discovered through metagenomics include the first bacteriorhodopsin of bacterial origin; novel small molecules with antimicrobial activity; and new members of families of known proteins, such as an Na+(Li+)/H+ antiporter, RecA, DNA polymerase, and antibiotic resistance determinants. Reassembly of multiple genomes has provided insight into energy and nutrient cycling within the community, genome structure, gene function, population genetics and microheterogeneity, and lateral gene transfer among members of an uncultured community. The application of metagenomic sequence information will facilitate the design of better culturing strategies to link genomic analysis with pure culture studies. PMID:15590779

  4. Metazen – metadata capture for metagenomes

    PubMed Central

    2014-01-01

    Background As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusions Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility. PMID:25780508

  5. Viral metagenomics and blood safety.

    PubMed

    Sauvage, V; Eloit, M

    2016-02-01

    The characterization of the human blood-associated viral community (also called blood virome) is essential for epidemiological surveillance and to anticipate new potential threats for blood transfusion safety. Currently, the risk of blood-borne agent transmission of well-known viruses (HBV, HCV, HIV and HTLV) can be considered as under control in high-resource countries. However, other viruses unknown or unsuspected may be transmitted to recipients by blood-derived products. This is particularly relevant considering that a significant proportion of transfused patients are immunocompromised and more frequently subjected to fatal outcomes. Several measures to prevent transfusion transmission of unknown viruses have been implemented including the exclusion of at-risk donors, leukocyte reduction of donor blood, and physicochemical treatment of the different blood components. However, up to now there is no universal method for pathogen inactivation, which would be applicable for all types of blood components and, equally effective for all viral families. In addition, among available inactivation procedures of viral genomes, some of them are recognized to be less effective on non-enveloped viruses, and inadequate to inactivate higher viral titers in plasma pools or derivatives. Given this, there is the need to implement new methodologies for the discovery of unknown viruses that may affect blood transfusion. Viral metagenomics combined with High Throughput Sequencing appears as a promising approach for the identification and global surveillance of new and/or unexpected viruses that could impair blood transfusion safety. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  6. RiboFR-Seq: a novel approach to linking 16S rRNA amplicon profiles to metagenomes

    PubMed Central

    Zhang, Yanming; Ji, Peifeng; Wang, Jinfeng; Zhao, Fangqing

    2016-01-01

    16S rRNA amplicon analysis and shotgun metagenome sequencing are two main culture-independent strategies to explore the genetic landscape of various microbial communities. Recently, numerous studies have employed these two approaches together, but downstream data analyses were performed separately, which always generated incongruent or conflicting signals on both taxonomic and functional classifications. Here we propose a novel approach, RiboFR-Seq (Ribosomal RNA gene flanking region sequencing), for capturing both ribosomal RNA variable regions and their flanking protein-coding genes simultaneously. Through extensive testing on a clonal bacterial strain, salivary microbiome and bacterial epibionts of marine kelp, we demonstrated that RiboFR-Seq could detect the vast majority of bacteria not only in well-studied microbiomes but also in novel communities with limited reference genomes. Combined with classical amplicon sequencing and shotgun metagenome sequencing, RiboFR-Seq can link the annotations of 16S rRNA and metagenomic contigs to make a consensus classification. By recognizing almost all 16S rRNA copies, the RiboFR-Seq approach can effectively reduce the taxonomic abundance bias resulting from 16S rRNA copy number variation. We believe that RiboFR-Seq, which provides an integrated view of 16S rRNA profiles and metagenomes, will help us better understand diverse microbial communities. PMID:26984526

  7. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes

    PubMed Central

    Meyer, F; Paarmann, D; D'Souza, M; Olson, R; Glass, EM; Kubal, M; Paczian, T; Rodriguez, A; Stevens, R; Wilke, A; Wilkening, J; Edwards, RA

    2008-01-01

    Background Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers. Results A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. Conclusion The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis – the availability of high-performance computing for annotating the data. PMID:18803844

  8. FANTOM: Functional and taxonomic analysis of metagenomes

    PubMed Central

    2013-01-01

    Background Interpretation of quantitative metagenomics data is important for our understanding of ecosystem functioning and assessing differences between various environmental samples. There is a need for an easy to use tool to explore the often complex metagenomics data in taxonomic and functional context. Results Here we introduce FANTOM, a tool that allows for exploratory and comparative analysis of metagenomics abundance data integrated with metadata information and biological databases. Importantly, FANTOM can make use of any hierarchical database and it comes supplied with NCBI taxonomic hierarchies as well as KEGG Orthology, COG, PFAM and TIGRFAM databases. Conclusions The software is implemented in Python, is platform independent, and is available at http://www.sysbio.se/Fantom. PMID:23375020

  9. A catalog of the mouse gut metagenome.

    PubMed

    Xiao, Liang; Feng, Qiang; Liang, Suisha; Sonne, Si Brask; Xia, Zhongkui; Qiu, Xinmin; Li, Xiaoping; Long, Hua; Zhang, Jianfeng; Zhang, Dongya; Liu, Chuan; Fang, Zhiwei; Chou, Joyce; Glanville, Jacob; Hao, Qin; Kotowska, Dorota; Colding, Camilla; Licht, Tine Rask; Wu, Donghai; Yu, Jun; Sung, Joseph Jao Yiu; Liang, Qiaoyi; Li, Junhua; Jia, Huijue; Lan, Zhou; Tremaroli, Valentina; Dworzynski, Piotr; Nielsen, H Bjørn; Bäckhed, Fredrik; Doré, Joël; Le Chatelier, Emmanuelle; Ehrlich, S Dusko; Lin, John C; Arumugam, Manimozhiyan; Wang, Jun; Madsen, Lise; Kristiansen, Karsten

    2015-10-01

    We established a catalog of the mouse gut metagenome comprising ∼2.6 million nonredundant genes by sequencing DNA from fecal samples of 184 mice. To secure high microbiome diversity, we used mouse strains of diverse genetic backgrounds, from different providers, kept in different housing laboratories and fed either a low-fat or high-fat diet. Similar to the human gut microbiome, >99% of the cataloged genes are bacterial. We identified 541 metagenomic species and defined a core set of 26 metagenomic species found in 95% of the mice. The mouse gut microbiome is functionally similar to its human counterpart, with 95.2% of its Kyoto Encyclopedia of Genes and Genomes (KEGG) orthologous groups in common. However, only 4.0% of the mouse gut microbial genes were shared (95% identity, 90% coverage) with those of the human gut microbiome. This catalog provides a useful reference for future studies.

  10. Ultrafast clustering algorithms for metagenomic sequence analysis

    PubMed Central

    Fu, Limin; Niu, Beifang; Wu, Sitao; Wooley, John

    2012-01-01

    The rapid advances of high-throughput sequencing technologies have dramatically spurred metagenomic studies of microbial communities that exist in various environments. Fundamental questions in metagenomics include the identities, composition and dynamics of microbial populations and their functions and interactions. However, the massive quantity and the comprehensive complexity of these sequence data pose tremendous challenges in data analysis. These challenges include but are not limited to ever-increasing computational demand, biased sequence sampling, sequence errors, sequence artifacts and novel sequences. Sequence clustering methods can directly answer many of the fundamental questions by grouping similar sequences into families. In addition, clustering analysis also addresses the challenges in metagenomics. Thus, a large redundant data set can be represented with a small non-redundant set, where each cluster can be represented by a single entry or a consensus. Artifacts can be rapidly detected through clustering. Errors can be identified, filtered or corrected by using consensus from sequences within clusters. PMID:22772836
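
    The idea of representing a large redundant set by a small non-redundant one can be illustrated with a minimal sketch. It assumes a simple greedy scheme (longest sequence first, ungapped position-by-position identity) and is not the algorithm of the paper above, which the abstract does not specify. The names identity and greedy_cluster, the 0.9 threshold, and the toy reads are all hypothetical.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions over the shorter sequence.

    Assumption: sequences are compared left-aligned without gaps,
    which is far cruder than a real alignment.
    """
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_cluster(seqs: List[str], threshold: float = 0.9) -> List[Tuple[str, List[str]]]:
    """Greedy incremental clustering (illustrative only).

    Sequences are visited longest first. Each one joins the first
    cluster whose representative it matches at >= threshold identity;
    otherwise it founds a new cluster and becomes its representative.
    """
    clusters: List[Tuple[str, List[str]]] = []
    for s in sorted(seqs, key=len, reverse=True):
        for rep, members in clusters:
            if identity(rep, s) >= threshold:
                members.append(s)
                break
        else:
            # No existing representative was similar enough.
            clusters.append((s, [s]))
    return clusters


# Hypothetical toy reads: two families plus one shorter fragment.
reads = ["ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC", "ACGTACGT", "TTTTGGGGCA"]
clusters = greedy_cluster(reads, threshold=0.9)
representatives = [rep for rep, _ in clusters]
```

    Here the five reads collapse to two representatives, one per family. A production tool would replace the pairwise scan with a fast filter such as shared short-word counting, since comparing every read against every representative does not scale to metagenomic data volumes.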

  11. Pathway-Based Functional Analysis of Metagenomes

    NASA Astrophysics Data System (ADS)

    Bercovici, Sivan; Sharon, Itai; Pinter, Ron Y.; Shlomi, Tomer

    Metagenomic data enables the study of microbes and viruses through their DNA as retrieved directly from the environment in which they live. Functional analysis of metagenomes explores the abundance of gene families, pathways, and systems, rather than their taxonomy. Through such analysis researchers are able to identify those functional capabilities most important to organisms in the examined environment. Recently, a statistical framework for the functional analysis of metagenomes was described that focuses on gene families. Here we describe two pathway-level computational models for functional analysis that take into account important, yet unaddressed issues such as pathway size, gene length and overlap in gene content among pathways. We test our models over carefully designed simulated data and propose novel approaches for performance evaluation. Our models significantly improve over the current approach with respect to pathway ranking and the computation of relative abundance of pathways in environments.

  12. Viral metagenomics: are we missing the giants?

    PubMed

    Halary, S; Temmam, S; Raoult, D; Desnues, C

    2016-06-01

    Amoeba-infecting giant viruses are recently discovered viruses that have been isolated from diverse environments all around the world. In parallel to isolation efforts, metagenomics confirmed their worldwide distribution from a broad range of environmental and host-associated samples, including humans, depicting them as a major component of eukaryotic viruses in nature and a possible resident of the human/animal virome whose role is still unclear. Nevertheless, metagenomics data about amoeba-infecting giant viruses still remain scarce, mainly because of methodological limitations. Efforts should be pursued both at the metagenomic sample preparation level and on in silico analyses to better understand their roles in the environment and in human/animal health and disease.

  13. Challenges and opportunities of airborne metagenomics.

    PubMed

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-05-06

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  14. Metagenomics as a Tool for Enzyme Discovery: Hydrolytic Enzymes from Marine-Related Metagenomes.

    PubMed

    Popovic, Ana; Tchigvintsev, Anatoly; Tran, Hai; Chernikova, Tatyana N; Golyshina, Olga V; Yakimov, Michail M; Golyshin, Peter N; Yakunin, Alexander F

    2015-01-01

    This chapter discusses metagenomics and its application for enzyme discovery, with a focus on hydrolytic enzymes from marine metagenomic libraries. With less than one percent of culturable microorganisms in the environment, metagenomics, or the collective study of community genetics, has opened up a rich pool of uncharacterized metabolic pathways, enzymes, and adaptations. This great untapped pool of genes provides the particularly exciting potential to mine for new biochemical activities or novel enzymes with activities tailored to peculiar sets of environmental conditions. Metagenomes also represent a huge reservoir of novel enzymes for applications in biocatalysis, biofuels, and bioremediation. Here we present the results of enzyme discovery for four enzyme activities, of particular industrial or environmental interest, including esterase/lipase, glycosyl hydrolase, protease and dehalogenase.

  15. Metagenomic exploration of viruses throughout the Indian Ocean.

    PubMed

    Williamson, Shannon J; Allen, Lisa Zeigler; Lorenzi, Hernan A; Fadrosh, Douglas W; Brami, Daniel; Thiagarajan, Mathangi; McCrow, John P; Tovchigrechko, Andrey; Yooseph, Shibu; Venter, J Craig

    2012-01-01

    The characterization of global marine microbial taxonomic and functional diversity is a primary goal of the Global Ocean Sampling Expedition. As part of this study, 19 water samples were collected aboard the Sorcerer II sailing vessel from the southern Indian Ocean in an effort to more thoroughly understand the lifestyle strategies of the microbial inhabitants of this ultra-oligotrophic region. No investigations of whole virioplankton assemblages have been conducted on waters collected from the Indian Ocean or across multiple size fractions thus far. Therefore, the goals of this study were to examine the effect of size fractionation on viral consortia structure and function and understand the diversity and functional potential of the Indian Ocean virome. Five samples were selected for comprehensive metagenomic exploration; and sequencing was performed on the microbes captured on 3.0-, 0.8- and 0.1 µm membrane filters as well as the viral fraction (<0.1 µm). Phylogenetic approaches were also used to identify predicted proteins of viral origin in the larger fractions of data from all Indian Ocean samples, which were included in subsequent metagenomic analyses. Taxonomic profiling of viral sequences suggested that size fractionation of marine microbial communities enriches for specific groups of viruses within the different size classes and functional characterization further substantiated this observation. Functional analyses also revealed a relative enrichment for metabolic proteins of viral origin that potentially reflect the physiological condition of host cells in the Indian Ocean including those involved in nitrogen metabolism and oxidative phosphorylation. A novel classification method, MGTAXA, was used to assess virus-host relationships in the Indian Ocean by predicting the taxonomy of putative host genera, with Prochlorococcus, Acanthochlois and members of the SAR86 cluster comprising the most abundant predictions. This is the first study to holistically

  16. A metagenomic study of primate insect diet diversity.

    PubMed

    Pickett, Sarah B; Bergey, Christina M; Di Fiore, Anthony

    2012-07-01

    Descriptions of primate diets are generally based on either direct observation of foraging behavior, morphological classification of food remains from feces, or analysis of the stomach contents of deceased individuals. Some diet items (e.g. insect prey), however, are difficult to identify visually, and observation conditions often do not permit adequate quantitative sampling of feeding behavior. Moreover, the taxonomically informative morphology of some food species (e.g. swallowed seeds, insect exoskeletons) may be destroyed by the digestive process. Because of these limitations, we used a metagenomic approach to conduct a preliminary, "proof of concept" study of interspecific variation in the insect component of the diets of six sympatric New World monkeys known, based on observational field studies, to differ markedly in their feeding ecology. We used generalized arthropod polymerase chain reaction (PCR) primers and cloning to sequence mitochondrial DNA (mtDNA) sequences of the arthropod cytochrome b (CYT B) gene from fecal samples of wild woolly, titi, saki, capuchin, squirrel, and spider monkeys collected from a single sampling site in western Amazonia where these genera occur sympatrically. We then assigned preliminary taxonomic identifications to the sequences by basic local alignment search tool (BLAST) comparison to arthropod CYT B sequences present in GenBank. This study is the first to use molecular techniques to identify insect prey in primate diets. The results suggest that a metagenomic approach may prove valuable in augmenting and corroborating observational data and increasing the resolution of primate diet studies, although the lack of comparative reference sequences for many South American insects limits the approach at present. As such reference data become available for more animal and plant taxa, this approach also holds promise for studying additional components of primate diets. © 2012 Wiley Periodicals, Inc.

  17. Metagenomic Exploration of Viruses throughout the Indian Ocean

    PubMed Central

    Lorenzi, Hernan A.; Fadrosh, Douglas W.; Brami, Daniel; Thiagarajan, Mathangi; McCrow, John P.; Tovchigrechko, Andrey; Yooseph, Shibu; Venter, J. Craig

    2012-01-01

    The characterization of global marine microbial taxonomic and functional diversity is a primary goal of the Global Ocean Sampling Expedition. As part of this study, 19 water samples were collected aboard the Sorcerer II sailing vessel from the southern Indian Ocean in an effort to more thoroughly understand the lifestyle strategies of the microbial inhabitants of this ultra-oligotrophic region. No investigations of whole virioplankton assemblages have been conducted on waters collected from the Indian Ocean or across multiple size fractions thus far. Therefore, the goals of this study were to examine the effect of size fractionation on viral consortia structure and function and understand the diversity and functional potential of the Indian Ocean virome. Five samples were selected for comprehensive metagenomic exploration; and sequencing was performed on the microbes captured on 3.0-, 0.8- and 0.1 µm membrane filters as well as the viral fraction (<0.1 µm). Phylogenetic approaches were also used to identify predicted proteins of viral origin in the larger fractions of data from all Indian Ocean samples, which were included in subsequent metagenomic analyses. Taxonomic profiling of viral sequences suggested that size fractionation of marine microbial communities enriches for specific groups of viruses within the different size classes and functional characterization further substantiated this observation. Functional analyses also revealed a relative enrichment for metabolic proteins of viral origin that potentially reflect the physiological condition of host cells in the Indian Ocean including those involved in nitrogen metabolism and oxidative phosphorylation. A novel classification method, MGTAXA, was used to assess virus-host relationships in the Indian Ocean by predicting the taxonomy of putative host genera, with Prochlorococcus, Acanthochlois and members of the SAR86 cluster comprising the most abundant predictions. This is the first study to holistically

  18. Applications of metagenomics for industrial bioproducts

    USDA-ARS's Scientific Manuscript database

    Recent progress in mining the rich genetic resource of non-culturable microbes has led to the discovery of new genes, enzymes, and natural products. The impact of metagenomics is witnessed in the development of commodity and fine chemicals, agrochemicals and pharmaceuticals where the benefit of enz...

  19. Biomolecular and metagenomic analyses of biofouling communities

    USDA-ARS's Scientific Manuscript database

    Despite the decades of research that have focused on understanding the formation of biofouling communities, relatively little is known about the soft fouling consortia that are responsible for their formation and function. In this study, we used PhyloChip microbial profiling, metagenomic DNA sequenc...

  20. Building on basic metagenomics with complementary technologies

    PubMed Central

    Warnecke, Falk; Hugenholtz, Philip

    2007-01-01

    Metagenomics, the application of random shotgun sequencing to environmental samples, is a powerful approach for characterizing microbial communities. However, this method only represents the cornerstone of what can be achieved using a range of complementary technologies such as transcriptomics, proteomics, cell sorting and microfluidics. Together, these approaches hold great promise for the study of microbial ecology and evolution. PMID:18177506

  1. Towards a more complete metagenomics toolkit

    USDA-ARS's Scientific Manuscript database

    The emerging scientific discipline of metagenomics has not only created a myriad of opportunities for biologists to reveal new insights into the microbial underpinnings of our environment, but has also presented a number of interesting challenges for bioinformatics algorithms and software developers...

  2. The metagenomic approach and causality in virology

    PubMed Central

    Castrignano, Silvana Beres; Nagasse-Sugahara, Teresa Keico

    2015-01-01

    The metagenomic approach has become a very important tool for discovering new viruses in environmental and biological samples. Here we discuss how these discoveries may help to elucidate the etiology of diseases and the criteria necessary to establish a causal association between a virus and a disease. PMID:25902566

  3. Clustering metagenomic sequences with interpolated Markov models

    PubMed Central

    2010-01-01

    Background: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. Results: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHYSCIMM, that performs better when evolutionarily close training genomes are available. Conclusions: SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm. PMID:21044341
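
    The general idea of clustering reads with Markov models can be sketched as follows. This is a simplified illustration, not the SCIMM implementation: it assumes a fixed-order Markov chain with add-one smoothing in place of interpolated Markov models, a random initial partition, and a k-means-style loop that alternates between training one model per cluster and reassigning each read to its best-scoring model. None of these details come from the abstract, and all function names and parameters are hypothetical.

```python
import math
import random
from collections import defaultdict

ALPHABET = "ACGT"

def train_markov(seqs, order=2):
    """Count (context -> next base) transitions over a set of sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1
    return counts

def log_likelihood(seq, counts, order=2):
    """Log-probability of seq under the model, with add-one smoothing."""
    total = 0.0
    for i in range(order, len(seq)):
        ctx = counts.get(seq[i - order:i], {})
        num = ctx.get(seq[i], 0) + 1
        den = sum(ctx.values()) + len(ALPHABET)
        total += math.log(num / den)
    return total

def cluster_reads(reads, k=2, order=2, max_iter=20, seed=0):
    """Alternate between model training and read reassignment (assumed scheme)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in reads]  # random initial partition
    for _ in range(max_iter):
        # Train one Markov model per cluster on its current members.
        models = [
            train_markov([r for r, lab in zip(reads, labels) if lab == c], order)
            for c in range(k)
        ]
        # Reassign every read to the model that scores it highest.
        new_labels = [
            max(range(k), key=lambda c: log_likelihood(r, models[c], order))
            for r in reads
        ]
        if new_labels == labels:  # converged: no read changed cluster
            break
        labels = new_labels
    return labels
```

    Calling cluster_reads(reads, k=2) returns one integer label per read. The result depends on the random initial partition, so a practical tool would need a better initialization and a way to choose k.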

  4. The Metagenome of Utricularia gibba's Traps: Into the Microbial Input to a Carnivorous Plant

    PubMed Central

    Alcaraz, Luis David; Martínez-Sánchez, Shamayim; Torres, Ignacio; Ibarra-Laclette, Enrique; Herrera-Estrella, Luis

    2016-01-01

    The genome and transcriptome sequences of the aquatic, rootless, and carnivorous plant Utricularia gibba L. (Lentibulariaceae) were recently determined. Traps are necessary for U. gibba because they help the plant to survive in nutrient-deprived environments. The U. gibba traps (Ugt) are specialized structures that have been proposed to selectively filter microbial inhabitants. To determine whether the traps indeed have a microbiome that differs, in composition or abundance, from the microbiome in the surrounding environment, we used whole-genome shotgun (WGS) metagenomics to describe both the taxonomic and functional diversity of the Ugt microbiome. We collected U. gibba plants from their natural habitat and directly sequenced the metagenome of the Ugt microbiome and its surrounding water. The total predicted number of species in the Ugt was more than 1,100. Using pan-genome fragment recruitment analysis, we were able to identify some key Ugt players, such as Pseudomonas monteilii, to the species level. Functional analysis of the Ugt metagenome suggests that the trap microbiome plays an important role in nutrient scavenging and assimilation while complementing the hydrolytic functions of the plant. PMID:26859489

  5. Construction and validation of metagenomic DNA libraries from landfarm soil microorganisms.

    PubMed

    Pessoa, T B A; de Souza, S S; Cerqueira, A F; Rezende, R P; Pirovani, C P; Dias, J C T

    2013-06-28

    Landfarming biodegradation is a strategy used by the petrochemical industry to reduce pollutants in petroleum-contaminated soil. We constructed 2 metagenomic libraries from landfarming soil in order to determine the pathway used for mineralization of benzene and to examine protein expression of the bacteria in these soils. The DNA of landfarm soil, collected from Ilhéus, BA, Brazil, was extracted and a metagenomic library was constructed with the Copy Control(TM) Fosmid Library Production Kit, which clones 25-45-kb DNA fragments. The clones were selected for their ability to express enzymes capable of cleaving aromatic compounds. These clones were grown in Luria-Bertani broth plus L-arabinose, benzene, and chloramphenicol as induction substances; they were tested for activity in the catechol cleavage pathway, an intermediate step in benzene degradation. Nine clones were positive for ortho-cleavage and one was positive for meta-cleavage. Protein band patterns determined by SDS-polyacrylamide gel electrophoresis differed in bacteria grown on induced versus non-induced media (Luria-Bertani broth). We concluded that the DNA of landfarm soil is an important source of genes involved in mineralization of xenobiotic compounds, which are common in gasoline and oil spills. Metagenomic libraries allow identification of non-culturable microorganisms that have potential in the bioremediation of contaminated sites.

  6. Novel resistance functions uncovered using functional metagenomic investigations of resistance reservoirs

    PubMed Central

    Pehrsson, Erica C.; Forsberg, Kevin J.; Gibson, Molly K.; Ahmadi, Sara; Dantas, Gautam

    2013-01-01

    Rates of infection with antibiotic-resistant bacteria have increased precipitously over the past several decades, with far-reaching healthcare and societal costs. Recent evidence has established a link between antibiotic resistance genes in human pathogens and those found in non-pathogenic, commensal, and environmental organisms, prompting deeper investigation of natural and human-associated reservoirs of antibiotic resistance. Functional metagenomic selections, in which shotgun-cloned DNA fragments are selected for their ability to confer survival to an indicator host, have been increasingly applied to the characterization of many antibiotic resistance reservoirs. These experiments have demonstrated that antibiotic resistance genes are highly diverse and widely distributed, many times bearing little to no similarity to known sequences. Through unbiased selections for survival to antibiotic exposure, functional metagenomics can improve annotations by reducing the discovery of false-positive resistance and by allowing for the identification of previously unrecognizable resistance genes. In this review, we summarize the novel resistance functions uncovered using functional metagenomic investigations of natural and human-impacted resistance reservoirs. Examples of novel antibiotic resistance genes include those highly divergent from known sequences, those for which sequence is entirely unable to predict resistance function, bifunctional resistance genes, and those with unconventional, atypical resistance mechanisms. Overcoming antibiotic resistance in the clinic will require a better understanding of existing resistance reservoirs and the dissemination networks that govern horizontal gene exchange, informing best practices to limit the spread of resistance-conferring genes to human pathogens. PMID:23760651

  7. Multisubstrate Isotope Labeling and Metagenomic Analysis of Active Soil Bacterial Communities

    PubMed Central

    Verastegui, Y.; Cheng, J.; Engel, K.; Kolczynski, D.; Mortimer, S.; Lavigne, J.; Montalibet, J.; Romantsov, T.; Hall, M.; McConkey, B. J.; Rose, D. R.; Tomashek, J. J.; Scott, B. R.

    2014-01-01

    ABSTRACT Soil microbial diversity represents the largest global reservoir of novel microorganisms and enzymes. In this study, we coupled functional metagenomics and DNA stable-isotope probing (DNA-SIP) using multiple plant-derived carbon substrates and diverse soils to characterize active soil bacterial communities and their glycoside hydrolase genes, which have value for industrial applications. We incubated samples from three disparate Canadian soils (tundra, temperate rainforest, and agricultural) with five native carbon (12C) or stable-isotope-labeled (13C) carbohydrates (glucose, cellobiose, xylose, arabinose, and cellulose). Indicator species analysis revealed high specificity and fidelity for many uncultured and unclassified bacterial taxa in the heavy DNA for all soils and substrates. Among characterized taxa, Actinomycetales (Salinibacterium), Rhizobiales (Devosia), Rhodospirillales (Telmatospirillum), and Caulobacterales (Phenylobacterium and Asticcacaulis) were bacterial indicator species for the heavy substrates and soils tested. Both Actinomycetales and Caulobacterales (Phenylobacterium) were associated with metabolism of cellulose, and Alphaproteobacteria were associated with the metabolism of arabinose; members of the order Rhizobiales were strongly associated with the metabolism of xylose. Annotated metagenomic data suggested diverse glycoside hydrolase gene representation within the pooled heavy DNA. By screening 2,876 cloned fragments derived from the 13C-labeled DNA isolated from soils incubated with cellulose, we demonstrate the power of combining DNA-SIP, multiple-displacement amplification (MDA), and functional metagenomics by efficiently isolating multiple clones with activity on carboxymethyl cellulose and fluorogenic proxy substrates for carbohydrate-active enzymes. PMID:25028422

  8. The Metagenome of Utricularia gibba's Traps: Into the Microbial Input to a Carnivorous Plant.

    PubMed

    Alcaraz, Luis David; Martínez-Sánchez, Shamayim; Torres, Ignacio; Ibarra-Laclette, Enrique; Herrera-Estrella, Luis

    2016-01-01

    The genome and transcriptome sequences of the aquatic, rootless, and carnivorous plant Utricularia gibba L. (Lentibulariaceae) were recently determined. Traps are necessary for U. gibba because they help the plant to survive in nutrient-deprived environments. The U. gibba traps (Ugt) are specialized structures that have been proposed to selectively filter microbial inhabitants. To determine whether the traps indeed have a microbiome that differs, in composition or abundance, from the microbiome in the surrounding environment, we used whole-genome shotgun (WGS) metagenomics to describe both the taxonomic and functional diversity of the Ugt microbiome. We collected U. gibba plants from their natural habitat and directly sequenced the metagenome of the Ugt microbiome and its surrounding water. The total predicted number of species in the Ugt was more than 1,100. Using pan-genome fragment recruitment analysis, we were able to identify some key Ugt players, such as Pseudomonas monteilii, to the species level. Functional analysis of the Ugt metagenome suggests that the trap microbiome plays an important role in nutrient scavenging and assimilation while complementing the hydrolytic functions of the plant.

  9. Physiological and evolutionary potential of microorganisms from the Canterbury Basin subseafloor, a metagenomic approach.

    PubMed

    Gaboyer, Frédéric; Burgaud, Gaëtan; Alain, Karine

    2015-05-01

    Subseafloor sediments represent a large reservoir of organic matter and are inhabited by microbial groups of the three domains of life. Although it affects planetary geochemical cycles, the subsurface biosphere remains poorly understood, notably with regard to possible metabolic pathways and selective advantages that may be deployed by buried microorganisms (sporulation, response to stress, dormancy). In order to better understand physiological potentials and possible lifestyles of subseafloor microbial communities, we analyzed two metagenomes from subseafloor sediments collected at 31 mbsf (meters below the sea floor) and 136 mbsf in the Canterbury Basin. Metagenomic phylogenetic and functional diversities were very similar. Phylogenetic diversity was mostly represented by Chloroflexi, Firmicutes and Proteobacteria for Bacteria and by Thaumarchaeota and Euryarchaeota for Archaea. Predicted anaerobic metabolisms encompassed fermentation, methanogenesis and utilization of fatty acids, aromatic and halogenated substrates. Potential processes that may confer selective advantages for subsurface microorganisms included sporulation, detoxication equipment or osmolyte accumulation. Annotation of genomic fragments described the metabolic versatility of Chloroflexi, Miscellaneous Crenarchaeotic Group and Euryarchaeota and showed frequent recombination events within subsurface taxa. This study confirmed that the subseafloor habitat is unique compared to other habitats at the (meta)-genomic level and described physiological potential of still uncultured groups.

  10. Unravelling core microbial metabolisms in the hypersaline microbial mats of Shark Bay using high-throughput metagenomics

    PubMed Central

    Ruvindy, Rendy; White III, Richard Allen; Neilan, Brett Anthony; Burns, Brendan Paul

    2016-01-01

    Modern microbial mats are potential analogues of some of Earth's earliest ecosystems. Excellent examples can be found in Shark Bay, Australia, with mats of various morphologies. To further our understanding of the functional genetic potential of these complex microbial ecosystems, we conducted for the first time shotgun metagenomic analyses. We assembled metagenomic next-generation sequencing data to classify the taxonomic and metabolic potential across diverse morphologies of marine mats in Shark Bay. The microbial community across taxonomic classifications using protein-coding and small subunit rRNA genes directly extracted from the metagenomes suggests that three phyla, Proteobacteria, Cyanobacteria and Bacteroidetes, dominate all marine mats. However, the microbial community structures of the Shark Bay and Highbourne Cay (Bahamas) marine systems appear to be distinct from each other. The metabolic potential (based on SEED subsystem classifications) of the Shark Bay and Highbourne Cay microbial communities was also distinct. Shark Bay metagenomes have a metabolic pathway profile consisting of both heterotrophic and photosynthetic pathways, whereas Highbourne Cay appears to be dominated almost exclusively by photosynthetic pathways. Alternative non-rubisco-based carbon metabolism including reductive TCA cycle and 3-hydroxypropionate/4-hydroxybutyrate pathways is highly represented in Shark Bay metagenomes while not represented in Highbourne Cay microbial mats or any other mat forming ecosystems investigated to date. Potentially novel aspects of nitrogen cycling were also observed, as well as putative heavy metal cycling (arsenic, mercury, copper and cadmium). Finally, archaea are highly represented in Shark Bay and may have critical roles in overall ecosystem function in these modern microbial mats. PMID:26023869

  11. Unravelling core microbial metabolisms in the hypersaline microbial mats of Shark Bay using high-throughput metagenomics

    SciTech Connect

    Ruvindy, Rendy; White III, Richard Allen; Neilan, Brett Anthony; Burns, Brendan Paul

    2015-05-29

    Modern microbial mats are potential analogues of some of Earth’s earliest ecosystems. Excellent examples can be found in Shark Bay, Australia, with mats of various morphologies. To further our understanding of the functional genetic potential of these complex microbial ecosystems, we conducted for the first time shotgun metagenomic analyses. We assembled metagenomic next-generation sequencing data to classify the taxonomic and metabolic potential across diverse morphologies of marine mats in Shark Bay. The microbial community across taxonomic classifications using protein-coding and small subunit rRNA genes directly extracted from the metagenomes suggests that three phyla, Proteobacteria, Cyanobacteria and Bacteroidetes, dominate all marine mats. However, the microbial community structures of the Shark Bay and Highbourne Cay (Bahamas) marine systems appear to be distinct from each other. The metabolic potential (based on SEED subsystem classifications) of the Shark Bay and Highbourne Cay microbial communities was also distinct. Shark Bay metagenomes have a metabolic pathway profile consisting of both heterotrophic and photosynthetic pathways, whereas Highbourne Cay appears to be dominated almost exclusively by photosynthetic pathways. Alternative non-rubisco-based carbon metabolism including reductive TCA cycle and 3-hydroxypropionate/4-hydroxybutyrate pathways is highly represented in Shark Bay metagenomes while not represented in Highbourne Cay microbial mats or any other mat forming ecosystems investigated to date. Potentially novel aspects of nitrogen cycling were also observed, as well as putative heavy metal cycling (arsenic, mercury, copper and cadmium). Finally, archaea are highly represented in Shark Bay and may have critical roles in overall ecosystem function in these modern microbial mats.

  12. Thousands of Viral Populations Recovered from Peatland Soil Metagenomes Reveal Viral Impacts on Carbon Cycling in Thawing Permafrost

    NASA Astrophysics Data System (ADS)

    Emerson, J. B.; Brum, J. R.; Roux, S.; Bolduc, B.; Woodcroft, B. J.; Singleton, C. M.; Boyd, J. A.; Hodgkins, S. B.; Wilson, R.; Trubl, G. G.; Jang, H. B.; Crill, P. M.; Chanton, J.; Saleska, S. R.; Rich, V. I.; Tyson, G. W.; Sullivan, M. B.

    2016-12-01

    Methane and carbon dioxide emissions, which are under significant microbial control, provide positive feedbacks to climate change in thawing permafrost peatlands. Although viruses in marine systems have been shown to impact microbial ecology and biogeochemical cycling through host cell lysis, horizontal gene transfer, and auxiliary metabolic gene expression, viral ecology in permafrost and other soils remains virtually unstudied due to methodological challenges. Here, we identified viral sequences in 208 assembled bulk soil metagenomes derived from a permafrost thaw gradient in Stordalen Mire, northern Sweden, from 2010-2012. 2,048 viral populations were recovered, which genome- and network-based classification revealed to be largely novel, increasing known viral genera globally by 40%. Ecologically, viral communities differed significantly across the thaw gradient and by soil depth. Co-occurring microbial community composition, soil moisture, and pH were predictors of viral community composition, indicative of biological and biogeochemical feedbacks as permafrost thaws. Host prediction—achieved through clustered regularly interspaced short palindromic repeats (CRISPRs), tetranucleotide frequency patterns, and other sequence similarities to binned microbial population genomes—was able to link 38% of the viral populations to a microbial host. 5% of the implicated hosts were archaea, predominantly methanogens and ammonia-oxidizing Nitrososphaera, 45% were Acidobacteria or Verrucomicrobia (mostly predicted heterotrophic complex carbon degraders), and 21% were Proteobacteria, including methane oxidizers. Recovered viral genome fragments also contained auxiliary metabolic genes involved in carbon and nitrogen cycling. Together, these data reveal multiple levels of previously unknown viral contributions to biogeochemical cycling, including to carbon gas emissions, in peatland soils undergoing and contributing to climate change. This work represents a significant step

  13. Accurate read-based metagenome characterization using a hierarchical suite of unique signatures

    PubMed Central

    Freitas, Tracey Allen K.; Li, Po-E; Scholz, Matthew B.; Chain, Patrick S. G.

    2015-01-01

    A major challenge in the field of shotgun metagenomics is the accurate identification of organisms present within a microbial community, based on classification of short sequence reads. Though existing microbial community profiling methods have attempted to rapidly classify the millions of reads output from modern sequencers, the combination of incomplete databases, similarity among otherwise divergent genomes, errors and biases in sequencing technologies, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR). Here, we present the application of a novel, gene-independent and signature-based metagenomic taxonomic profiling method with significantly and consistently smaller FDR than any other available method. Our algorithm circumvents false positives using a series of non-redundant signature databases and examines Genomic Origins Through Taxonomic CHAllenge (GOTTCHA). GOTTCHA was tested and validated on 20 synthetic and mock datasets ranging in community composition and complexity, was applied successfully to data generated from spiked environmental and clinical samples, and robustly demonstrates superior performance compared with other available tools. PMID:25765641
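
    For readers unfamiliar with the metric, the false discovery rate of a taxonomic profile can be illustrated with a short sketch. The abstract does not state how FDR was computed; the function below assumes the usual definition, false positives divided by all reported taxa, evaluated against a known community such as a mock dataset. The function name and the example taxa are hypothetical.

```python
def false_discovery_rate(predicted, truth):
    """Fraction of reported taxa that are absent from the known community."""
    predicted, truth = set(predicted), set(truth)
    if not predicted:
        return 0.0  # nothing reported, so nothing falsely discovered
    false_positives = len(predicted - truth)
    return false_positives / len(predicted)

# Hypothetical mock community with three known members; the profiler
# reports four taxa, one of which is not actually present.
truth = {"Escherichia coli", "Bacillus subtilis", "Pseudomonas putida"}
predicted = truth | {"Salmonella enterica"}
fdr = false_discovery_rate(predicted, truth)
```

    Here one of the four reported taxa is a false positive, so fdr is 0.25.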

  14. Microbial Diversity and Biochemical Potential Encoded by Thermal Spring Metagenomes Derived from the Kamchatka Peninsula

    PubMed Central

    Wemheuer, Bernd; Taube, Robert; Akyol, Pinar; Wemheuer, Franziska; Daniel, Rolf

    2013-01-01

    Volcanic regions contain a variety of environments suitable for extremophiles. This study was focused on assessing and exploiting the prokaryotic diversity of two microbial communities derived from different Kamchatkian thermal springs by metagenomic approaches. Samples were taken from a thermoacidophilic spring near the Mutnovsky Volcano and from a thermophilic spring in the Uzon Caldera. Environmental DNA for metagenomic analysis was isolated from collected sediment samples by direct cell lysis. The prokaryotic community composition was examined by analysis of archaeal and bacterial 16S rRNA genes. A total of 1235 16S rRNA gene sequences were obtained and used for taxonomic classification. Most abundant in the samples were members of Thaumarchaeota, Thermotogae, and Proteobacteria. The Mutnovsky hot spring was dominated by the Terrestrial Hot Spring Group, Kosmotoga, and Acidithiobacillus. The Uzon Caldera was dominated by uncultured members of the Miscellaneous Crenarchaeotic Group and Enterobacteriaceae. The remaining 16S rRNA gene sequences belonged to the Aquificae, Dictyoglomi, Euryarchaeota, Korarchaeota, Thermodesulfobacteria, Firmicutes, and some potential new phyla. In addition, the recovered DNA was used for generation of metagenomic libraries, which were subsequently mined for genes encoding lipolytic and proteolytic enzymes. Three novel genes conferring lipolytic and one gene conferring proteolytic activity were identified. PMID:23533327

  15. Arsenic metabolism in high altitude modern stromatolites revealed by metagenomic analysis.

    PubMed

    Kurth, Daniel; Amadio, Ariel; Ordoñez, Omar F; Albarracín, Virginia H; Gärtner, Wolfgang; Farías, María E

    2017-04-21

    Modern stromatolites thrive only in selected locations in the world. Socompa Lake, located in the Andean plateau at 3570 masl, is one of the numerous extreme Andean microbial ecosystems described over recent years. Extreme environmental conditions include hypersalinity, high UV incidence, and high arsenic content, among others. After Socompa's stromatolite microbial communities were analysed by metagenomic DNA sequencing, taxonomic classification showed dominance of Proteobacteria, Bacteroidetes and Firmicutes, and a remarkably high number of unclassified sequences. A functional analysis indicated that carbon fixation might occur not only by the Calvin-Benson cycle, but also through alternative pathways such as the reverse TCA cycle, and the reductive acetyl-CoA pathway. Deltaproteobacteria were involved both in sulfate reduction and nitrogen fixation. Significant differences were found when comparing the Socompa stromatolite metagenome to the Shark Bay (Australia) smooth mat metagenome: namely, those involving stress related processes, particularly, arsenic resistance. An in-depth analysis revealed a surprisingly diverse metabolism comprising all known types of As resistance and energy generating pathways. While the ars operon was the main mechanism, an important abundance of arsM genes was observed in selected phyla. The data resulting from this work will prove a cornerstone for further studies on this rare microbial community.

  16. Comparative metagenomics of microbial communities inhabiting deep-sea hydrothermal vent chimneys with contrasting chemistries

    PubMed Central

    Xie, Wei; Wang, Fengping; Guo, Lei; Chen, Zeling; Sievert, Stefan M; Meng, Jun; Huang, Guangrui; Li, Yuxin; Yan, Qingyu; Wu, Shan; Wang, Xin; Chen, Shangwu; He, Guangyuan; Xiao, Xiang; Xu, Anlong

    2011-01-01

    Deep-sea hydrothermal vent chimneys harbor a high diversity of largely unknown microorganisms. Although the phylogenetic diversity of these microorganisms has been described previously, the adaptation and metabolic potential of the microbial communities is only beginning to be revealed. A pyrosequencing approach was used to directly obtain sequences from a fosmid library constructed from a black smoker chimney 4143-1 in the Mothra hydrothermal vent field at the Juan de Fuca Ridge. A total of 308 034 reads with an average sequence length of 227 bp were generated. Comparative genomic analyses of metagenomes from a variety of environments by two-way clustering of samples and functional gene categories demonstrated that the 4143-1 metagenome clustered most closely with that from a carbonate chimney from Lost City. Both are highly enriched in genes for mismatch repair and homologous recombination, suggesting that the microbial communities have evolved extensive DNA repair systems to cope with the extreme conditions that have potential deleterious effects on the genomes. As previously reported for the Lost City microbiome, the metagenome of chimney 4143-1 exhibited a high proportion of transposases, implying that horizontal gene transfer may be a common occurrence in the deep-sea vent chimney biosphere. In addition, genes for chemotaxis and flagellar assembly were highly enriched in the chimney metagenomes, reflecting the adaptation of the organisms to the highly dynamic conditions present within the chimney walls. Reconstruction of the metabolic pathways revealed that the microbial community in the wall of chimney 4143-1 was mainly fueled by sulfur oxidation, putatively coupled to nitrate reduction to perform inorganic carbon fixation through the Calvin–Benson–Bassham cycle. On the basis of the genomic organization of the key genes of the carbon fixation and sulfur oxidation pathways contained in the large genomic fragments, both obligate and facultative

  17. Comparative metagenomics of microbial communities inhabiting deep-sea hydrothermal vent chimneys with contrasting chemistries.

    PubMed

    Xie, Wei; Wang, Fengping; Guo, Lei; Chen, Zeling; Sievert, Stefan M; Meng, Jun; Huang, Guangrui; Li, Yuxin; Yan, Qingyu; Wu, Shan; Wang, Xin; Chen, Shangwu; He, Guangyuan; Xiao, Xiang; Xu, Anlong

    2011-03-01

    Deep-sea hydrothermal vent chimneys harbor a high diversity of largely unknown microorganisms. Although the phylogenetic diversity of these microorganisms has been described previously, the adaptation and metabolic potential of the microbial communities is only beginning to be revealed. A pyrosequencing approach was used to directly obtain sequences from a fosmid library constructed from a black smoker chimney 4143-1 in the Mothra hydrothermal vent field at the Juan de Fuca Ridge. A total of 308,034 reads with an average sequence length of 227 bp were generated. Comparative genomic analyses of metagenomes from a variety of environments by two-way clustering of samples and functional gene categories demonstrated that the 4143-1 metagenome clustered most closely with that from a carbonate chimney from Lost City. Both are highly enriched in genes for mismatch repair and homologous recombination, suggesting that the microbial communities have evolved extensive DNA repair systems to cope with the extreme conditions that have potential deleterious effects on the genomes. As previously reported for the Lost City microbiome, the metagenome of chimney 4143-1 exhibited a high proportion of transposases, implying that horizontal gene transfer may be a common occurrence in the deep-sea vent chimney biosphere. In addition, genes for chemotaxis and flagellar assembly were highly enriched in the chimney metagenomes, reflecting the adaptation of the organisms to the highly dynamic conditions present within the chimney walls. Reconstruction of the metabolic pathways revealed that the microbial community in the wall of chimney 4143-1 was mainly fueled by sulfur oxidation, putatively coupled to nitrate reduction to perform inorganic carbon fixation through the Calvin-Benson-Bassham cycle. On the basis of the genomic organization of the key genes of the carbon fixation and sulfur oxidation pathways contained in the large genomic fragments, both obligate and facultative autotrophs

  18. Metagenomic studies of the Red Sea.

    PubMed

    Behzad, Hayedeh; Ibarra, Martin Augusto; Mineta, Katsuhiko; Gojobori, Takashi

    2016-02-01

    Metagenomics has significantly advanced the field of marine microbial ecology, revealing the vast diversity of previously unknown microbial life forms in different marine niches. The tremendous amount of data generated has enabled identification of a large number of microbial genes (metagenomes), their community interactions, adaptation mechanisms, and their potential applications in pharmaceutical and biotechnology-based industries. Comparative metagenomics reveals that microbial diversity is a function of the local environment, meaning that unique or unusual environments typically harbor novel microbial species with unique genes and metabolic pathways. The Red Sea has an abundance of unique characteristics; however, its microbiota is one of the least studied among marine environments. The Red Sea harbors approximately 25 hot anoxic brine pools, plus a vibrant coral reef ecosystem. Physicochemical studies describe the Red Sea as an oligotrophic environment that contains some of the warmest and saltiest waters in the world, with year-round high UV radiation. These characteristics are believed to have shaped the evolution of microbial communities in the Red Sea. Over-representation of genes involved in DNA repair, high-intensity light responses, and osmoregulation was found in the Red Sea metagenomic databases, suggesting acquisition of specific environmental adaptations by the Red Sea microbiota. The Red Sea brine pools harbor a diverse range of halophilic and thermophilic bacterial and archaeal communities, which are potential sources of enzymes for pharmaceutical and biotechnology-based applications. Understanding the mechanisms of these adaptations and their function within the larger ecosystem could also prove useful in light of predicted global warming scenarios where global ocean temperatures are expected to rise by 1-3°C in the next few decades. In this review, we provide an overview of the published metagenomic studies that were conducted in the Red Sea, and

  19. Metagenomic Analysis Suggests Modern Freshwater Microbialites Harbor a Distinct Core Microbial Community

    PubMed Central

    White, Richard Allen; Chan, Amy M.; Gavelis, Gregory S.; Leander, Brian S.; Brady, Allyson L.; Slater, Gregory F.; Lim, Darlene S. S.; Suttle, Curtis A.

    2016-01-01

    Modern microbialites are complex microbial communities that interface with abiotic factors to form carbonate-rich organosedimentary structures whose ancestors provide the earliest evidence of life. Past studies primarily on marine microbialites have inventoried diverse taxa and metabolic pathways, but it is unclear which of these are members of the microbialite community and which are introduced from adjacent environments. Here we control for these factors by sampling the surrounding water and nearby sediment, in addition to the microbialites, and use a metagenomics approach to interrogate the microbial community. Our findings suggest that the Pavilion Lake microbialite community profile, metabolic potential and pathway distributions are distinct from those in the neighboring sediments and water. Based on RefSeq classification, members of the Proteobacteria (e.g., alpha and delta classes) were the dominant taxa in the microbialites, and possessed novel functional guilds associated with the metabolism of heavy metals, antibiotic resistance, primary alcohol biosynthesis and urea metabolism; the latter may help drive biomineralization. Urea metabolism within Pavilion Lake microbialites is a feature not previously associated with other microbialites. The microbialite communities were also significantly enriched for cyanobacteria and acidobacteria, which likely play an important role in biomineralization. Additional findings suggest that Pavilion Lake microbialites are under viral selection, as genes associated with viral infection (e.g., CRISPR-Cas, phage shock and phage excision) are abundant within the microbialite metagenomes. The morphology of Pavilion Lake microbialites changes dramatically with depth; yet, metagenomic data did not vary significantly by morphology or depth, indicating that microbialite morphology is altered by other factors, perhaps transcriptional differences or abiotic conditions. This work provides a comprehensive metagenomic perspective of the

  20. Quality control of microbiota metagenomics by k-mer analysis.

    PubMed

    Plaza Onate, Florian; Batto, Jean-Michel; Juste, Catherine; Fadlallah, Jehane; Fougeroux, Cyrielle; Gouas, Doriane; Pons, Nicolas; Kennedy, Sean; Levenez, Florence; Dore, Joel; Ehrlich, S Dusko; Gorochov, Guy; Larsen, Martin

    2015-03-14

    The biological and clinical consequences of the tight interactions between host and microbiota are rapidly being unraveled by next generation sequencing technologies and sophisticated bioinformatics, also referred to as microbiota metagenomics. The recent success of metagenomics has created a demand to rapidly apply the technology to large case-control cohort studies and to studies of microbiota from various habitats, including habitats relatively poor in microbes. It is therefore of foremost importance to enable a robust and rapid quality assessment of metagenomic data from samples that challenge present technological limits (sample numbers and size). Here we demonstrate that the distribution of overlapping k-mers of metagenome sequence data predicts sequence quality as defined by gene distribution and efficiency of sequence mapping to a reference gene catalogue. We used serial dilutions of gut microbiota metagenomic datasets to generate well-defined high to low quality metagenomes. We also analyzed a collection of 52 microbiota-derived metagenomes. We demonstrate that k-mer distributions of metagenomic sequence data identify sequence contaminations, such as sequences derived from "empty" ligation products. Of note, k-mer distributions were also able to predict the frequency of sequences mapping to a reference gene catalogue not only for the well-defined serial dilution datasets, but also for 52 human gut microbiota derived metagenomic datasets. We propose that k-mer analysis of raw metagenome sequence reads should be implemented as a first quality assessment prior to more extensive bioinformatics analysis, such as sequence filtering and gene mapping. With the rising demand for metagenomic analysis of microbiota it is crucial to provide tools for rapid and efficient decision making. This will eventually lead to a faster turn-around time, improved analytical quality including sample quality metrics and a significant cost reduction. Finally, improved quality
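    As an illustration of the overlapping k-mer counting that underlies this kind of quality check, here is a minimal Python sketch. The function names, the choice of k = 4, and the example reads are assumptions made for illustration and are not taken from the paper; the authors' actual pipeline and quality metrics are not described in this abstract.

```python
from collections import Counter


def count_kmers(reads, k=4):
    """Count overlapping k-mers across reads, skipping k-mers with non-ACGT characters."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k one base at a time (overlapping k-mers).
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def abundance_spectrum(kmer_counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(kmer_counts.values())


if __name__ == "__main__":
    # Hypothetical toy reads; real input would be parsed from FASTQ/FASTA files.
    reads = ["ACGTACGT", "ACGTNACG"]
    counts = count_kmers(reads, k=4)
    print(counts.most_common(3))
    print(sorted(abundance_spectrum(counts).items()))
```

    A skewed abundance spectrum (for example, a few k-mers with very high multiplicity) is the sort of signal one might inspect for contamination, though any threshold would have to be chosen empirically.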

  1. Diagnostic metagenomics: potential applications to bacterial, viral and parasitic infections.

    PubMed

    Pallen, M J

    2014-12-01

    The term 'shotgun metagenomics' is applied to the direct sequencing of DNA extracted from a sample without culture or target-specific amplification or capture. In diagnostic metagenomics, this approach is applied to clinical samples in the hope of detecting and characterizing pathogens. Here, I provide a conceptual overview, before reviewing several recent promising proof-of-principle applications of metagenomics in virus discovery, analysis of outbreaks and detection of pathogens in contemporary and historical samples. I also evaluate future prospects for diagnostic metagenomics in the light of relentless improvements in sequencing technologies.

  2. An Experimental Metagenome Data Management and Analysis System

    SciTech Connect

    Markowitz, Victor M.; Korzeniewski, Frank; Palaniappan, Krishna; Szeto, Ernest; Ivanova, Natalia N.; Kyrpides, Nikos C.; Hugenholtz, Philip

    2006-03-01

    The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity of a microbial community, needs to be conducted in the context of a comprehensive data management and analysis system. We present in this paper IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.

  3. An ORFome assembly approach to metagenomics sequences analysis.

    PubMed

    Ye, Yuzhen; Tang, Haixu

    2009-06-01

    Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. The current analyses of metagenomics data largely rely on the computational tools originally designed for microbial genomics projects. The challenge of assembling metagenomic sequences arises mainly from the short reads and the high species complexity of the community. Alternatively, individual (short) reads will be searched directly against databases of known genes (or proteins) to identify homologous sequences. The latter approach may have low sensitivity and specificity in identifying homologous sequences, which may further bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The whole computational framework consists of three steps. Each read from a metagenomics project will first be annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e. ORFome) are used for database searching of homologs and subsequent diversity analysis. We applied MetaORFA approach to several metagenomics datasets with low coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, the ORFome assembly significantly increases the sensitivity of homology searching, and may potentially improve the diversity analysis of the metagenomic data. This improvement is especially useful for metagenomic projects when the genome assembly does not work because of the low sequence coverage.
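    The first MetaORFA step, annotating each read with putative ORFs, can be illustrated with a minimal Python sketch. It assumes the standard genetic code, treats every stop-free stretch in the six reading frames as a candidate (short reads rarely contain a complete ORF), and uses hypothetical names (`putative_orfs`, `min_len`). It is not the ORF predictor used by the authors, and the EULER-based peptide assembly step is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def putative_orfs(read, min_len=5):
    """Collect stop-free peptide stretches of at least min_len residues
    from all six reading frames of a read (hypothetical helper)."""
    read = read.upper()
    peptides = []
    for strand in (read, reverse_complement(read)):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Stop codons split the translation into candidate ORF fragments.
            for segment in protein.split("*"):
                if len(segment) >= min_len:
                    peptides.append(segment)
    return peptides


if __name__ == "__main__":
    # Hypothetical toy read, for illustration only.
    for peptide in putative_orfs("ATGGCCAAAGGGTTTTAA", min_len=5):
        print(peptide)
```

    In a real setting the `min_len` cutoff would be much larger, and the resulting peptide fragments would be the input to the assembly step described above.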

  4. Strong spurious transcription likely contributes to DNA insert bias in typical metagenomic clone libraries.

    PubMed

    Lam, Kathy N; Charles, Trevor C

    2015-01-01

    Clone libraries provide researchers with a powerful resource to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, and allowed the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, there have been reports of GC bias in fosmid metagenomic libraries, and it was speculated to be a result of fragmentation and loss of AT-rich sequences during cloning. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role. To explore possible mechanisms responsible for sequence bias in clone libraries, we constructed a cosmid library from a human microbiome sample and sequenced DNA from different steps during library construction: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final cosmid library, and we provide evidence that the bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after DNA is introduced into Escherichia coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for promoters and found that rpoD/σ70 promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found the bias to be more correlated with the number of rpoD/σ70 consensus sequences in the genome than with simple GC content. The GC bias of metagenomic libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/σ70 consensus-like in E. coli may lead to instability, causing loss of the plasmid or loss of the insert DNA that gives rise to the transcription. Despite

  5. Protein structure determination using metagenome sequence data.

    PubMed

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A; Kim, David E; Kamisetty, Hetunandan; Kyrpides, Nikos C; Baker, David

    2017-01-20

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost. Copyright © 2017, American Association for the Advancement of Science.

  6. Insights into antibiotic resistance through metagenomic approaches.

    PubMed

    Schmieder, Robert; Edwards, Robert

    2012-01-01

    The consequences of bacterial infections have been curtailed by the introduction of a wide range of antibiotics. However, infections continue to be a leading cause of mortality, in part due to the evolution and acquisition of antibiotic-resistance genes. Antibiotic misuse and overprescription have created a driving force influencing the selection of resistance. Despite the problem of antibiotic resistance in infectious bacteria, little is known about the diversity, distribution and origins of resistance genes, especially for the unculturable majority of environmental bacteria. Functional and sequence-based metagenomics have been used for the discovery of novel resistance determinants and the improved understanding of antibiotic-resistance mechanisms in clinical and natural environments. This review discusses recent findings and future challenges in the study of antibiotic resistance through metagenomic approaches.

  7. Metagenomic sequencing of expressed prostate secretions.

    PubMed

    Smelov, Vitaly; Arroyo Mühr, L Sara; Bzhalava, Davit; Brown, Lyndon J; Komyakov, Boris; Dillner, Joakim

    2014-12-01

    To investigate which microorganisms may be present in expressed prostate secretions (EPS), metagenomic sequencing (MGS) was applied to prostate secretion samples from five men with prostatitis and five matched control men as well as to combined expressed prostate secretion and urine from six patients with prostate cancer and six matched control men. The prostate secretion samples contained a variety of bacterial sequences, mostly belonging to the Proteobacteria phylum. The combined prostate secretion and urine samples were dominated by abundant presence of the JC polyomavirus, representing >20% of all detected metagenomic sequence reads. There were also other viruses detected, for example, human papillomavirus type 81. All combined prostate secretion and urine samples were also positive for Proteobacteria. In summary, MGS of expressed prostate secretion is informative for detecting a variety of bacteria and viruses, suggesting that larger-scale use of MGS of prostate secretions may be useful in medical and epidemiological studies of prostate infections.

  8. Protein Structure Determination using Metagenome sequence data

    PubMed Central

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David

    2017-01-01

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891

  9. Genomics and metagenomics in medical microbiology.

    PubMed

    Padmanabhan, Roshan; Mishra, Ajay Kumar; Raoult, Didier; Fournier, Pierre-Edouard

    2013-12-01

    Over the last two decades, sequencing tools have evolved from laborious time-consuming methodologies to real-time detection and deciphering of genomic DNA. Genome sequencing, especially using next generation sequencing (NGS), has revolutionized the landscape of microbiology and infectious disease. This deluge of sequencing data has not only enabled advances in fundamental biology but also helped improve diagnosis, pathogen typing, virulence and antibiotic resistance detection, and development of new vaccines and culture media. In addition, NGS also enabled efficient analysis of complex human microfloras, both commensal and pathological, through metagenomic methods, thus helping the comprehension and management of human diseases such as obesity. This review summarizes technological advances in genomics and metagenomics relevant to the field of medical microbiology.

  10. Plant virus metagenomics: biodiversity and ecology.

    PubMed

    Roossinck, Marilyn J

    2012-01-01

    Viral metagenomics is the study of viruses in environmental samples, using next generation sequencing that produces very large data sets. For plant viruses, these studies are still relatively new, but are already indicating that our current knowledge grossly underestimates the diversity of these viruses. Some plant virus studies are using thousands of individual plants so that each sequence can be traced back to its precise host. These studies should allow for deeper ecological and evolutionary analyses. The finding of so many new plant viruses that do not cause any obvious symptoms in wild plant hosts certainly changes our perception of viruses and how they interact with their hosts. The major difficulty in these (as in all) metagenomic studies continues to be the need for better bioinformatics tools to decipher the large data sets. The implications of this new information on plant viruses for international agriculture remain to be addressed.

  11. Metagenomic Characterization of Chesapeake Bay Virioplankton

    PubMed Central

    Bench, Shellie R.; Hanson, Thomas E.; Williamson, Kurt E.; Ghosh, Dhritiman; Radosovich, Mark; Wang, Kui; Wommack, K. Eric

    2007-01-01

    Viruses are ubiquitous and abundant throughout the biosphere. In marine systems, virus-mediated processes can have significant impacts on microbial diversity and on global biogeochemical cycling. However, viral genetic diversity remains poorly characterized. To address this shortcoming, a metagenomic library was constructed from Chesapeake Bay virioplankton. The resulting sequences constitute the largest collection of long-read double-stranded DNA (dsDNA) viral metagenome data reported to date. BLAST homology comparisons showed that Chesapeake Bay virioplankton contained a high proportion of unknown (homologous only to environmental sequences) and novel (no significant homolog) sequences. This analysis suggests that dsDNA viruses are likely one of the largest reservoirs of unknown genetic diversity in the biosphere. The taxonomic origin of BLAST homologs to viral library sequences agreed well with reported abundances of co-occurring bacterial subphyla within the estuary and indicated that cyanophages were abundant. However, the low proportion of Siphophage homologs contradicts a previous assertion that this family comprises most bacteriophage diversity. Identification and analyses of cyanobacterial homologs of the psbA gene illustrated the value of metagenomic studies of virioplankton. The phylogeny of inferred PsbA protein sequences suggested that Chesapeake Bay cyanophage strains are endemic in that environment. The ratio of psbA homologous sequences to total cyanophage sequences in the metagenome indicated that the psbA gene may be nearly universal in Chesapeake Bay cyanophage genomes. Furthermore, the low frequency of psbD homologs in the library supports the prediction that Chesapeake Bay cyanophage populations are dominated by Podoviridae. PMID:17921274

  12. Assemble CRISPRs from metagenomic sequencing data.

    PubMed

    Lei, Jikai; Sun, Yanni

    2016-09-01

    Clustered regularly interspaced short palindromic repeats and associated proteins (CRISPR-Cas) allow more specific and efficient gene editing than all previous genetic engineering systems. These exciting discoveries stem from the finding of the CRISPR system being an adaptive immune system that protects the prokaryotes against exogenous genetic elements such as phages. Despite the exciting discoveries, almost all knowledge about CRISPRs is based only on microorganisms that can be isolated, cultured and sequenced in labs. However, about 95% of bacterial species cannot be cultured in labs. The fast accumulation of metagenomic data, which contains DNA sequences of microbial species from natural samples, provides a unique opportunity for CRISPR annotation in uncultivable microbial species. However, the large amount of data, heterogeneous coverage and shared leader sequences of some CRISPRs pose challenges for identifying CRISPRs efficiently in metagenomic data. In this study, we developed a CRISPR finding tool for metagenomic data without relying on generic assembly, which is error-prone and computationally expensive for complex data. Our tool can run on commonly available machines in small labs. It employs properties of CRISPRs to decompose generic assembly into local assembly. We tested it on both mock and real metagenomic data and benchmarked the performance with state-of-the-art tools. The source code and the documentation of metaCRISPR are available at https://github.com/hangelwen/metaCRISPR. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press.

  13. Metagenomic characterization of Chesapeake Bay virioplankton.

    PubMed

    Bench, Shellie R; Hanson, Thomas E; Williamson, Kurt E; Ghosh, Dhritiman; Radosovich, Mark; Wang, Kui; Wommack, K Eric

    2007-12-01

    Viruses are ubiquitous and abundant throughout the biosphere. In marine systems, virus-mediated processes can have significant impacts on microbial diversity and on global biogeochemical cycling. However, viral genetic diversity remains poorly characterized. To address this shortcoming, a metagenomic library was constructed from Chesapeake Bay virioplankton. The resulting sequences constitute the largest collection of long-read double-stranded DNA (dsDNA) viral metagenome data reported to date. BLAST homology comparisons showed that Chesapeake Bay virioplankton contained a high proportion of unknown (homologous only to environmental sequences) and novel (no significant homolog) sequences. This analysis suggests that dsDNA viruses are likely one of the largest reservoirs of unknown genetic diversity in the biosphere. The taxonomic origin of BLAST homologs to viral library sequences agreed well with reported abundances of co-occurring bacterial subphyla within the estuary and indicated that cyanophages were abundant. However, the low proportion of Siphophage homologs contradicts a previous assertion that this family comprises most bacteriophage diversity. Identification and analyses of cyanobacterial homologs of the psbA gene illustrated the value of metagenomic studies of virioplankton. The phylogeny of inferred PsbA protein sequences suggested that Chesapeake Bay cyanophage strains are endemic in that environment. The ratio of psbA homologous sequences to total cyanophage sequences in the metagenome indicated that the psbA gene may be nearly universal in Chesapeake Bay cyanophage genomes. Furthermore, the low frequency of psbD homologs in the library supports the prediction that Chesapeake Bay cyanophage populations are dominated by Podoviridae.

  14. Metagenomic analysis of viruses in reclaimed water.

    PubMed

    Rosario, Karyna; Nilsson, Christina; Lim, Yan Wei; Ruan, Yijun; Breitbart, Mya

    2009-11-01

    Reclaimed water use is an important component of sustainable water resource management. However, there are concerns regarding pathogen transport through this alternative water supply. This study characterized the viral community found in reclaimed water and compared it with viruses in potable water. Reclaimed water contained 1000-fold more virus-like particles than potable water, having approximately 10^8 VLPs per millilitre. Metagenomic analyses revealed that most of the viruses in both reclaimed and potable water were novel. Bacteriophages dominated the DNA viral community in both reclaimed and potable water, but reclaimed water had a distinct phage community based on phage family distributions and host representation within each family. Eukaryotic viruses similar to plant pathogens and invertebrate picornaviruses dominated RNA metagenomic libraries. Established human pathogens were not detected in reclaimed water viral metagenomes, which contained a wealth of novel single-stranded DNA and RNA viruses related to plant, animal and insect viruses. Therefore, reclaimed water may play a role in the dissemination of highly stable viruses. Information regarding viruses present in reclaimed water but not in potable water can be used to identify new bioindicators of water quality. Future studies will need to investigate the infectivity and host range of these viruses to evaluate the impacts of reclaimed water use on human and ecosystem health.

  15. New extremophilic lipases and esterases from metagenomics.

    PubMed

    López-López, Olalla; Cerdán, Maria E; González Siso, Maria I

    2014-01-01

    Lipolytic enzymes catalyze the hydrolysis of ester bonds in the presence of water. In media with low water content or in organic solvents, they can catalyze synthetic reactions such as esterification and transesterification. Lipases and esterases, in particular those from extremophilic origin, are robust enzymes, functional under the harsh conditions of industrial processes owing to their inherent thermostability and resistance towards organic solvents, which combined with their high chemo-, regio- and enantioselectivity make them very attractive biocatalysts for a variety of industrial applications. Likewise, enzymes from extremophile sources can provide additional features such as activity at extreme temperatures, extreme pH values or high salinity levels, which could be interesting for certain purposes. New lipases and esterases have traditionally been discovered by the isolation of microbial strains producing lipolytic activity. The Genome Projects Era allowed genome mining, exploiting homology with known lipases and esterases, to be used in the search for new enzymes. The Metagenomic Era meant a step forward in this field with the study of the metagenome, the pool of genomes in an environmental microbial community. Current molecular biology techniques make it possible to construct total environmental DNA libraries, including the genomes of unculturable organisms, opening a new window to a vast field of unknown enzymes with new and unique properties. Here, we review the latest advances and findings from research into new extremophilic lipases and esterases, using metagenomic approaches, and their potential industrial and biotechnological applications.

  16. Gastrointestinal microbiology enters the metagenomics era.

    PubMed

    Frank, Daniel N; Pace, Norman R

    2008-01-01

    Advances in DNA sequence-based technologies now permit genetic analysis of complex microbial populations without the need for prior cultivation. This review summarizes the molecular methods of culture-independent microbiology ('metagenomics') and their recent application to studies of the human gastrointestinal tract in both health and disease. Culture-independent metagenomic surveys reveal unprecedented microbial biodiversity in the human intestine. Upwards of 40,000 bacterial species are estimated to comprise the collective gastrointestinal microbiome, most of which have not been characterized by culture. Diverse conditions such as antibiotic-associated diarrhea, Crohn's disease, ulcerative colitis, obesity, and pouchitis have been correlated with large-scale imbalances in gastrointestinal microbiota, or 'dysbiosis'. These findings demonstrate the importance of commensal microorganisms in maintaining gastrointestinal health. Through technological and conceptual innovations in metagenomics, the complex microbial habitat of the human gastrointestinal tract is now amenable to detailed ecological analysis. Large-scale shifts in gut commensal populations, rather than occurrence of particular microorganisms, are associated with several gastroenterological conditions; redress of these imbalances may ameliorate the conditions.

  17. Generating viral metagenomes from the coral holobiont

    PubMed Central

    Wood-Charlson, Elisha M.; Suttle, Curtis A.; van Oppen, Madeleine J. H.

    2014-01-01

    Reef-building corals comprise multipartite symbioses where the cnidarian animal is host to an array of eukaryotic and prokaryotic organisms, and the viruses that infect them. These viruses are critical elements of the coral holobiont, serving not only as agents of mortality, but also as potential vectors for lateral gene flow, and as elements encoding a variety of auxiliary metabolic functions. Consequently, understanding the functioning and health of the coral holobiont requires detailed knowledge of the associated viral assemblage and its function. Currently, the most tractable way of uncovering viral diversity and function is through metagenomic approaches, which is inherently difficult in corals because of the complex holobiont community, an extracellular mucus layer that all corals secrete, and the variety of sizes and structures of nucleic acids found in viruses. Here we present the first protocol for isolating, purifying and amplifying viral nucleic acids from corals based on mechanical disruption of cells. This method produces at least 50% higher yields of viral nucleic acids, has very low levels of cellular sequence contamination and captures wider viral diversity than previously used chemical-based extraction methods. We demonstrate that our mechanical-based method profiles a greater diversity of DNA and RNA genomes, including virus groups such as Retro-transcribing and ssRNA viruses, which are absent from metagenomes generated via chemical-based methods. In addition, we briefly present (and make publicly available) the first paired DNA and RNA viral metagenomes from the coral Acropora tenuis. PMID:24847321

  18. New Extremophilic Lipases and Esterases from Metagenomics

    PubMed Central

    López-López, Olalla; Cerdán, Maria E; González Siso, Maria I

    2014-01-01

    Lipolytic enzymes catalyze the hydrolysis of ester bonds in the presence of water. In media with low water content or in organic solvents, they can catalyze synthetic reactions such as esterification and transesterification. Lipases and esterases, in particular those from extremophilic origin, are robust enzymes, functional under the harsh conditions of industrial processes owing to their inherent thermostability and resistance towards organic solvents, which combined with their high chemo-, regio- and enantioselectivity make them very attractive biocatalysts for a variety of industrial applications. Likewise, enzymes from extremophile sources can provide additional features such as activity at extreme temperatures, extreme pH values or high salinity levels, which could be interesting for certain purposes. New lipases and esterases have traditionally been discovered by the isolation of microbial strains producing lipolytic activity. The Genome Projects Era allowed genome mining, exploiting homology with known lipases and esterases, to be used in the search for new enzymes. The Metagenomic Era meant a step forward in this field with the study of the metagenome, the pool of genomes in an environmental microbial community. Current molecular biology techniques make it possible to construct total environmental DNA libraries, including the genomes of unculturable organisms, opening a new window to a vast field of unknown enzymes with new and unique properties. Here, we review the latest advances and findings from research into new extremophilic lipases and esterases, using metagenomic approaches, and their potential industrial and biotechnological applications. PMID:24588890

  19. MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets.

    PubMed

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

    Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to out-perform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.

  20. Technical Report: Algorithm and Implementation for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    SciTech Connect

    McLoughlin, Kevin

    2016-01-11

    This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive feature of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.
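    The sparsity property attributed to the Chinese restaurant process above can be illustrated with a minimal sketch of the textbook CRP. This is not the report's implementation: the function name `crp_assignments`, the concentration parameter `alpha`, and the reading of "tables" as genomes and "customers" as reads are assumptions made for illustration only.

```python
import random

def crp_assignments(n_reads, alpha, seed=0):
    """Assign n_reads items to clusters under a standard Chinese restaurant process.

    Hypothetical illustration: clusters stand in for genomes, items for reads.
    alpha (> 0) controls how readily a new cluster is opened.
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items already in cluster k
    assignments = []  # assignments[i] = cluster index chosen for item i
    for _ in range(n_reads):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster ("rich get richer")
        assignments.append(k)
    return assignments, counts

if __name__ == "__main__":
    assignments, counts = crp_assignments(n_reads=1000, alpha=1.0)
    print("clusters used:", len(counts))
    print("largest cluster sizes:", sorted(counts, reverse=True)[:5])
```

    Because already-popular clusters attract further items, a small `alpha` typically leaves most items in a handful of clusters, which is the kind of sparse solution the abstract describes.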

  1. Technical Report on Modeling for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    SciTech Connect

    McLoughlin, K.

    2016-01-11

    The overall aim of this project is to develop a software package, called MetaQuant, that can determine the constituents of a complex microbial sample and estimate their relative abundances by analysis of metagenomic sequencing data. The goal for Task 1 is to create a generative model describing the stochastic process underlying the creation of sequence read pairs in the data set. The stages in this generative process include the selection of a source genome sequence for each read pair, with probability dependent on its abundance in the sample. The other stages describe the evolution of the source genome from its nearest common ancestor with a reference genome, breakage of the source DNA into short fragments, and the errors in sequencing the ends of the fragments to produce read pairs.
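    The staged generative process described above can be sketched roughly as follows. This is a simplified illustration, not the MetaQuant model: the function `simulate_read_pair` and all of its parameters are hypothetical, the evolutionary-divergence stage is omitted, fragment positions are drawn uniformly, errors are plain substitutions, and the second read is not reverse-complemented.

```python
import random

def simulate_read_pair(genomes, abundances, frag_len, read_len, error_rate, rng):
    """Draw one read pair from a toy generative model (hypothetical sketch).

    genomes:    dict mapping genome name -> DNA string
    abundances: dict mapping genome name -> relative abundance (any positive scale)
    """
    # Stage 1: select the source genome with probability proportional to abundance.
    names = list(genomes)
    name = rng.choices(names, weights=[abundances[n] for n in names])[0]
    seq = genomes[name]

    # Stage 2: break out a fragment at a uniformly chosen position.
    start = rng.randrange(len(seq) - frag_len + 1)
    fragment = seq[start:start + frag_len]

    # Stage 3: sequence both fragment ends, with independent substitution errors.
    def sequence(read):
        out = []
        for base in read:
            if rng.random() < error_rate:
                # Substitute a different base chosen uniformly at random.
                out.append(rng.choice("ACGT".replace(base, "")))
            else:
                out.append(base)
        return "".join(out)

    return name, sequence(fragment[:read_len]), sequence(fragment[-read_len:])

if __name__ == "__main__":
    rng = random.Random(42)
    genomes = {"genome_a": "ACGT" * 50, "genome_b": "TTGGCCAA" * 25}
    abundances = {"genome_a": 0.9, "genome_b": 0.1}
    print(simulate_read_pair(genomes, abundances, 60, 20, 0.01, rng))
```

    Inference in a model of this kind runs the process in reverse: given observed read pairs, it estimates the abundances that make them most plausible.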

  2. Metagenomic Assembly Reveals Hosts of Antibiotic Resistance Genes and the Shared Resistome in Pig, Chicken, and Human Feces.

    PubMed

    Ma, Liping; Xia, Yu; Li, Bing; Yang, Ying; Li, Li-Guan; Tiedje, James M; Zhang, Tong

    2016-01-05

    The risk associated with antibiotic resistance disseminating from animal and human feces is an urgent public issue. In the present study, we sought to establish a pipeline for annotating antibiotic resistance genes (ARGs) based on metagenomic assembly to investigate ARGs and their co-occurrence with associated genetic elements. Genetic elements found on the assembled genomic fragments include mobile genetic elements (MGEs) and metal resistance genes (MRGs). We then explored the hosts of these resistance genes and the shared resistome of pig, chicken and human fecal samples. High levels of tetracycline, multidrug, erythromycin, and aminoglycoside resistance genes were discovered in these fecal samples. In particular, a significantly high level of ARGs (7762 ×/Gb) was detected in adult chicken feces, indicating a higher ARG contamination level than in other fecal samples. Many ARG arrangements (e.g., macA-macB and tetA-tetR) were found to be shared by chicken, pig and human feces. In addition, MGEs such as the aadA5-dfrA17-carrying class 1 integron were identified on an assembled scaffold of chicken feces, and are carried by human pathogens. Differential coverage binning analysis revealed significant ARG enrichment in adult chicken feces. A draft genome, annotated as multidrug resistant Escherichia coli, was retrieved from chicken feces metagenomes and was determined to carry diverse ARGs (multidrug, acriflavine, and macrolide). The present study demonstrates the determination of ARG hosts and the shared resistome from metagenomic data sets and successfully establishes the relationship between ARGs, hosts, and environments. This ARG annotation pipeline based on metagenomic assembly will help to bridge the knowledge gaps regarding ARG-associated genes and ARG hosts with metagenomic data sets. Moreover, this pipeline will facilitate the evaluation of environmental risks in the genetic context of ARGs.

  3. The metagenomic data life-cycle: standards and best practices.

    PubMed

    Ten Hoopen, Petra; Finn, Robert D; Bongo, Lars Ailo; Corre, Erwan; Fosso, Bruno; Meyer, Folker; Mitchell, Alex; Pelletier, Eric; Pesole, Graziano; Santamaria, Monica; Willassen, Nils Peder; Cochrane, Guy

    2017-08-01

    Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonized way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (i) material sampling, (ii) material sequencing, (iii) data analysis, and (iv) data archiving and publishing. Taking examples from marine research, we summarize essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community, but greater awareness and adoption is still needed. We emphasize the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing. © The Author 2017. Published by Oxford University Press.

  4. Taxonomer: an interactive metagenomics analysis portal for universal pathogen detection and host mRNA expression profiling.

    PubMed

    Flygare, Steven; Simmon, Keith; Miller, Chase; Qiao, Yi; Kennedy, Brett; Di Sera, Tonya; Graf, Erin H; Tardif, Keith D; Kapusta, Aurélie; Rynearson, Shawn; Stockmann, Chris; Queen, Krista; Tong, Suxiang; Voelkerding, Karl V; Blaschke, Anne; Byington, Carrie L; Jain, Seema; Pavia, Andrew; Ampofo, Krow; Eilbeck, Karen; Marth, Gabor; Yandell, Mark; Schlaberg, Robert

    2016-05-26

    High-throughput sequencing enables unbiased profiling of microbial communities, universal pathogen detection, and host response to infectious diseases. However, computation times and algorithmic inaccuracies have hindered adoption. We present Taxonomer, an ultrafast, web-tool for comprehensive metagenomics data analysis and interactive results visualization. Taxonomer is unique in providing integrated nucleotide and protein-based classification and simultaneous host messenger RNA (mRNA) transcript profiling. Using real-world case-studies, we show that Taxonomer detects previously unrecognized infections and reveals antiviral host mRNA expression profiles. To facilitate data-sharing across geographic distances in outbreak settings, Taxonomer is publicly available through a web-based user interface. Taxonomer enables rapid, accurate, and interactive analyses of metagenomics data on personal computers and mobile devices.

  5. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.

    PubMed

    Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru

    2016-09-29

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.

  6. Comparative Viral Metagenomics of Environmental Samples from Korea

    PubMed Central

    Kim, Min-Soo; Whon, Tae Woong

    2013-01-01

    The introduction of metagenomics into the field of virology has facilitated the exploration of viral communities in various natural habitats. Understanding the viral ecology of a variety of sample types throughout the biosphere is important per se, but it also has potential applications in clinical and diagnostic virology. However, the procedures used by viral metagenomics may produce technical errors, such as amplification bias, while public viral databases are very limited, which may hamper the determination of the viral diversity in samples. This review considers the current state of viral metagenomics, based on examples from Korean viral metagenomic studies-i.e., rice paddy soil, fermented foods, human gut, seawater, and the near-surface atmosphere. Viral metagenomics has become widespread due to various methodological developments, and much attention has been focused on studies that consider the intrinsic role of viruses that interact with their hosts. PMID:24124407

  7. Metagenomics: Retrospect and Prospects in High Throughput Age

    PubMed Central

    Kumar, Satish; Krishnani, Kishore Kumar; Bhushan, Bharat; Brahmane, Manoj Pandit

    2015-01-01

    In recent years, metagenomics has emerged as a powerful tool for mining of hidden microbial treasure in a culture independent manner. In the last two decades, metagenomics has been applied extensively to exploit concealed potential of microbial communities from almost all sorts of habitats. A brief historic progress made over the period is discussed in terms of origin of metagenomics to its current state and also the discovery of novel biological functions of commercial importance from metagenomes of diverse habitats. The present review also highlights the paradigm shift of metagenomics from basic study of community composition to insight into the microbial community dynamics for harnessing the full potential of uncultured microbes with more emphasis on the implication of breakthrough developments, namely, Next Generation Sequencing, advanced bioinformatics tools, and systems biology. PMID:26664751

  8. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond

    PubMed Central

    Hiraoka, Satoshi; Yang, Ching-chia; Iwasaki, Wataru

    2016-01-01

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives. PMID:27383682

  9. Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Canon, Shane

    2011-10-12

    DOE JGI's Zhong Wang, chair of the High-performance Computing session, gives a brief introduction before Berkeley Lab's Shane Canon talks about "Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  10. Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Canon, Shane [LBNL]

    2016-07-12

    DOE JGI's Zhong Wang, chair of the High-performance Computing session, gives a brief introduction before Berkeley Lab's Shane Canon talks about "Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  11. Metagenomic approaches to identifying infectious agents.

    PubMed

    Höper, D; Mettenleiter, T C; Beer, M

    2016-04-01

    Since the advent of next-generation sequencing (NGS) technologies, the untargeted screening of samples from outbreaks for pathogen identification using metagenomics has become technically and economically feasible. However, various aspects need to be considered in order to exploit the full potential of NGS for virus discovery. Here, the authors summarise those aspects of the main steps that have a significant impact, from sample selection through sample handling and processing, as well as sequencing and finally data analysis, with a special emphasis on existing pitfalls.

  12. Metagenomics: an inexhaustible access to nature's diversity.

    PubMed

    Langer, Martin; Gabor, Esther M; Liebeton, Klaus; Meurer, Guido; Niehaus, Frank; Schulze, Renate; Eck, Jürgen; Lorenz, Patrick

    2006-01-01

    The chemical industry has an enormous need for innovation. To save resources, energy and time, currently more and more established chemical processes are being switched to biotechnological routes. This requires white biotechnology to discover and develop novel enzymes, biocatalysts and applications. Due to a limitation in the cultivability of microbes living in certain habitats, technologies have to be established which give access to the enormous resource of uncultivated microbial diversity. Metagenomics promises to provide new and diverse enzymes and biocatalysts as well as bioactive molecules and has the potential to make industrial biotechnology an economic, sustainable success.

  13. Oral Metagenomic Biomarkers in Rheumatoid Arthritis

    DTIC Science & Technology

    2016-09-01

    Award number: W81XWH-15-1-0320. Title: Oral Metagenomic Biomarkers in Rheumatoid Arthritis. Principal investigator: Edward K Chan. ...significant difference in the oral microbiome at the subspecies level of individuals with rheumatoid arthritis (RA). The goal is to test the...

  14. [Research in metagenomics and its applications in translational medicine].

    PubMed

    Jiahuan, Chen; Zheng, Sun; Xiaojun, Wang; Xiaoquan, Su; Kang, Ning

    2015-07-01

    Humans are born with microbiota, which accompany us throughout our life-span. There is an important symbiotic relationship between us and the microbial communities, thus microbial communities are of great importance to our health. All genomic information within this microbiota is referred to as "metagenomics" (also referred to as "human's second genome"). The analysis of high throughput metagenomic data generated from biomedical experiments would provide new approaches for translational research, and it has several applications in clinics. With the help of next generation sequencing technology and the emerging metagenomic approach (analysis of all genomic information in microbiota as a whole), we can overcome the pitfalls of the tedious traditional method of isolating and cultivating single microbial species. The metagenomic approach can also help us to analyze the whole microbial community efficiently and offer deep insights into human-microbe relationships as well as new ideas on many biomedical problems. In this review, we summarize frontiers in metagenomic research, including new concepts and methods. Then, we focus on the applications of metagenomic research in medical research and clinical practice in recent years, which clearly show the importance of metagenomic research in the field of translational medicine.

  15. New insights into the archaeal diversity of a hypersaline microbial mat obtained by a metagenomic approach.

    PubMed

    López-López, A; Richter, M; Peña, A; Tamames, J; Rosselló-Móra, R

    2013-05-01

    A metagenomic approach was carried out in order to study the genetic pool of a hypersaline microbial mat, paying more attention to the archaeal community and, specifically, to the putatively methanogenic members. The main aim of the work was to expand the knowledge of a likely ecologically important archaeal lineage, candidate division MSBL1, which is probably involved in methanogenesis at very high salinities. The results obtained in this study were in accordance with our previous report on the bacterial diversity encountered by using a number of molecular techniques, but remarkable differences were found in the archaeal diversity retrieval by each of the procedures used (metagenomics and 16S rRNA-based methods). The lack of synteny for most of the metagenomic fragments with known genomes, together with the low degree of similarity of the annotated open reading frames (ORFs) with the sequences in the databases, reflected the high degree of novelty in the mat community studied. Linking the sequenced clones with representatives of division MSBL1 was not possible because of the lack of additional information concerning this archaeal group in the public gene repositories. However, given the high abundance of representatives of this division in the 16S rRNA clone libraries and the low identity of the archaeal clones with known genomes, it was hypothesized that some of them could arise from MSBL1 genomes. In addition, other prokaryotic groups known to be relevant in organic matter mineralization at high salinities were detected. Copyright © 2013 Elsevier GmbH. All rights reserved.

  16. International Standards for Genomes, Transcriptomes, and Metagenomes

    PubMed Central

    Mason, Christopher E.; Afshinnekoo, Ebrahim; Tighe, Scott; Wu, Shixiu; Levy, Shawn

    2017-01-01

    Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single-cells, RNA profiling, and metagenomics (across multiple genomes). Technical artifacts and contamination can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous. Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data. Here we review current standards and their applications in genomics, including whole genomes, transcriptomes, mixed genomic samples (metagenomes), and the modified bases within each (epigenomes and epitranscriptomes). These standards, tools, and metrics are critical for quantifying the accuracy of NGS methods, which will be essential for robust approaches in clinical genomics and precision medicine. PMID:28337071

  17. Back to the future of soil metagenomics

    SciTech Connect

    Nesme, Joseph; Achouak, Wafa; Agathos, Spiros N.; Bailey, Mark; Baldrian, Petr; Brunel, Dominique; Frostegard, Asa; Heulin, Thierry; Jansson, Janet K.; Jurkevitch, Edouard; Kruus, Kristiina L.; Kowalchuk, George A.; Lagares, Antonio; Lappin-Scott, Hilary M.; Lemanceau, Philippe; Le Paslier, Denis; Mandic-Mulec, Ines; Murrell, J. Colin; Myrold, David D.; Nalin, Renaud; Nannipieri, Paolo; Neufeld, Josh D.; O'Gara, Fergal; Parnell, John J.; Puhler, Alfred; Pylro, Victor; Roesch, Luiz F. W.; Schloter, Michael; Schleper, Christa; Sczyrba, Alexander; Sessitsch, Angela; Sjoling, Sara; Sorensen, Jan; Sorensen, Soren J.; Tebbe, Christoph C.; Topp, Edward; Tsiamis, George; van Elsas, Jan Dirk; van Keulen, Geertje; Widmer, Franco; Wagner, Michael; Zhang, Tong; Zhang, Xiaojun; Zhao, Liping; Zhu, Yong -Guan; Vogel, Timothy M.; Simonet, Pascal

    2016-02-10

    Here, direct extraction and characterization of microbial community DNA through PCR amplicon surveys and metagenomics has revolutionized the study of environmental microbiology and microbial ecology. In particular, metagenomic analysis of nucleic acids provides direct access to the genomes of the “uncultivated majority.” Accelerated by advances in sequencing technology, microbiologists have discovered more novel phyla, classes, genera, and genes from microorganisms in the first decade and a half of the twenty-first century than since these “many very little living animalcules” were first discovered by van Leeuwenhoek (Table 1). The unsurpassed diversity of soils promises continued exploration of a range of industrial, agricultural, and environmental functions. The ability to explore soil microbial communities with increasing capacity offers the highest promise for answering many outstanding who, what, where, when, why, and with whom questions such as: Which microorganisms are linked to which soil habitats? How do microbial abundances change with changing edaphic conditions? How do microbial assemblages interact and influence one another synergistically or antagonistically? What is the full extent of soil microbial diversity, both functionally and phylogenetically? What are the dynamics of microbial communities in space and time? How sensitive are microbial communities to a changing climate? What is the role of horizontal gene transfer in the stability of microbial communities? Do highly diverse microbial communities confer resistance and resilience in soils?

  18. Back to the future of soil metagenomics

    DOE PAGES

    Nesme, Joseph; Achouak, Wafa; Agathos, Spiros N.; ...

    2016-02-10

    Here, direct extraction and characterization of microbial community DNA through PCR amplicon surveys and metagenomics has revolutionized the study of environmental microbiology and microbial ecology. In particular, metagenomic analysis of nucleic acids provides direct access to the genomes of the “uncultivated majority.” Accelerated by advances in sequencing technology, microbiologists have discovered more novel phyla, classes, genera, and genes from microorganisms in the first decade and a half of the twenty-first century than since these “many very little living animalcules” were first discovered by van Leeuwenhoek (Table 1). The unsurpassed diversity of soils promises continued exploration of a range of industrial, agricultural, and environmental functions. The ability to explore soil microbial communities with increasing capacity offers the highest promise for answering many outstanding who, what, where, when, why, and with whom questions such as: Which microorganisms are linked to which soil habitats? How do microbial abundances change with changing edaphic conditions? How do microbial assemblages interact and influence one another synergistically or antagonistically? What is the full extent of soil microbial diversity, both functionally and phylogenetically? What are the dynamics of microbial communities in space and time? How sensitive are microbial communities to a changing climate? What is the role of horizontal gene transfer in the stability of microbial communities? Do highly diverse microbial communities confer resistance and resilience in soils?

  19. Metagenomic characterization of ambulances across the USA.

    PubMed

    O'Hara, Niamh B; Reed, Harry J; Afshinnekoo, Ebrahim; Harvin, Donell; Caplan, Nora; Rosen, Gail; Frye, Brook; Woloszynek, Stephen; Ounit, Rachid; Levy, Shawn; Butler, Erin; Mason, Christopher E

    2017-09-22

    Microbial communities in our built environments have great influence on human health and disease. A variety of built environments have been characterized using a metagenomics-based approach, including some healthcare settings. However, there has been no study to date that has used this approach in pre-hospital settings, such as ambulances, an important first point-of-contact between patients and hospitals. We sequenced 398 samples from 137 ambulances across the USA using shotgun sequencing. We analyzed these data to explore the microbial ecology of ambulances including characterizing microbial community composition, nosocomial pathogens, patterns of diversity, presence of functional pathways and antimicrobial resistance, and potential spatial and environmental factors that may contribute to community composition. We found that the top 10 most abundant species are either common built environment microbes, microbes associated with the human microbiome (e.g., skin), or species associated with nosocomial infections. We also found widespread evidence of antimicrobial resistance markers (hits in ~90% of samples). We identified six factors that may influence the microbial ecology of ambulances including ambulance surfaces, geographical-related factors (including region, longitude, and latitude), and weather-related factors (including temperature and precipitation). While the vast majority of microbial species classified were beneficial, we also found widespread evidence of species associated with nosocomial infections and antimicrobial resistance markers. This study indicates that metagenomics may be useful to characterize the microbial ecology of pre-hospital ambulance settings and that more rigorous testing and cleaning of ambulances may be warranted.

  20. International Standards for Genomes, Transcriptomes, and Metagenomes.

    PubMed

    Mason, Christopher E; Afshinnekoo, Ebrahim; Tighe, Scott; Wu, Shixiu; Levy, Shawn

    2017-04-01

    Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single-cells, RNA profiling, and metagenomics (across multiple genomes). Technical artifacts and contamination can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous. Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data. Here we review current standards and their applications in genomics, including whole genomes, transcriptomes, mixed genomic samples (metagenomes), and the modified bases within each (epigenomes and epitranscriptomes). These standards, tools, and metrics are critical for quantifying the accuracy of NGS methods, which will be essential for robust approaches in clinical genomics and precision medicine.

  1. Fizzy. Feature subset selection for metagenomics

    SciTech Connect

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin; Rosen, Gail L.

    2015-11-04

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- and β-diversity. Feature subset selection – a sub-field of machine learning – can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
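
    As an illustration of what an information-theoretic selection criterion can look like, the following is a minimal sketch that ranks features by their empirical mutual information with the class label. It does not use Fizzy's interface or the BIOM format, and it is not necessarily one of the criteria Fizzy implements; the function names, OTU names, and toy presence/absence data are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(table, labels, k):
    """Return the names of the k features sharing the most information with labels."""
    scores = {name: mutual_information(vals, labels) for name, vals in table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical presence/absence of three OTUs across six samples.
otus = {
    "otu_a": [1, 1, 1, 0, 0, 0],  # tracks the label exactly
    "otu_b": [1, 0, 1, 0, 1, 0],  # weakly related to the label
    "otu_c": [1, 1, 1, 1, 1, 1],  # constant, carries no information
}
labels = ["young", "young", "young", "old", "old", "old"]
top = rank_features(otus, labels, 2)
```

    Ranking each feature independently ignores redundancy between features; criteria that also penalize redundancy exist, but they are beyond this sketch.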

  2. The cystic fibrosis lower airways microbial metagenome

    PubMed Central

    Moran Losada, Patricia; Chouvarine, Philippe; Dorda, Marie; Hedtfeld, Silke; Mielke, Samira; Schulz, Angela; Wiehlmann, Lutz

    2016-01-01

    Chronic airway infections determine most morbidity in people with cystic fibrosis (CF). Herein, we present unbiased quantitative data about the frequency and abundance of DNA viruses, archaea, bacteria, moulds and fungi in CF lower airways. Induced sputa were collected on several occasions from children, adolescents and adults with CF. Deep sputum metagenome sequencing identified, on average, approximately 10 DNA viruses or fungi and several hundred bacterial taxa. The metagenome of a CF patient was typically found to be made up of an individual signature of multiple, lowly abundant species superimposed by few disease-associated pathogens, such as Pseudomonas aeruginosa and Staphylococcus aureus, as major components. The host-associated signatures ranged from inconspicuous polymicrobial communities in healthy subjects to low-complexity microbiomes dominated by the typical CF pathogens in patients with advanced lung disease. The DNA virus community in CF lungs mainly consisted of phages and occasionally of human pathogens, such as adeno- and herpesviruses. The S. aureus and P. aeruginosa populations were composed of one major and numerous minor clone types. The rare clones constitute a low copy genetic resource that could rapidly expand as a response to habitat alterations, such as antimicrobial chemotherapy or invasion of novel microbes. PMID:27730195

  3. The cystic fibrosis lower airways microbial metagenome.

    PubMed

    Moran Losada, Patricia; Chouvarine, Philippe; Dorda, Marie; Hedtfeld, Silke; Mielke, Samira; Schulz, Angela; Wiehlmann, Lutz; Tümmler, Burkhard

    2016-04-01

    Chronic airway infections determine most morbidity in people with cystic fibrosis (CF). Herein, we present unbiased quantitative data about the frequency and abundance of DNA viruses, archaea, bacteria, moulds and fungi in CF lower airways. Induced sputa were collected on several occasions from children, adolescents and adults with CF. Deep sputum metagenome sequencing identified, on average, approximately 10 DNA viruses or fungi and several hundred bacterial taxa. The metagenome of a CF patient was typically found to be made up of an individual signature of multiple, lowly abundant species superimposed by few disease-associated pathogens, such as Pseudomonas aeruginosa and Staphylococcus aureus, as major components. The host-associated signatures ranged from inconspicuous polymicrobial communities in healthy subjects to low-complexity microbiomes dominated by the typical CF pathogens in patients with advanced lung disease. The DNA virus community in CF lungs mainly consisted of phages and occasionally of human pathogens, such as adeno- and herpesviruses. The S. aureus and P. aeruginosa populations were composed of one major and numerous minor clone types. The rare clones constitute a low copy genetic resource that could rapidly expand as a response to habitat alterations, such as antimicrobial chemotherapy or invasion of novel microbes.

  4. Fizzy. Feature subset selection for metagenomics

    DOE PAGES

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin; ...

    2015-11-04

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- and β-diversity. Feature subset selection – a sub-field of machine learning – can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.

  5. Identifying personal microbiomes using metagenomic codes.

    PubMed

    Franzosa, Eric A; Huang, Katherine; Meadow, James F; Gevers, Dirk; Lemon, Katherine P; Bohannan, Brendan J M; Huttenhower, Curtis

    2015-06-02

    Community composition within the human microbiome varies across individuals, but it remains unknown if this variation is sufficient to uniquely identify individuals within large populations or stable enough to identify them over time. We investigated this by developing a hitting set-based coding algorithm and applying it to the Human Microbiome Project population. Our approach defined body site-specific metagenomic codes: sets of microbial taxa or genes prioritized to uniquely and stably identify individuals. Codes capturing strain variation in clade-specific marker genes were able to distinguish among hundreds of individuals at an initial sampling time point. In comparisons with follow-up samples collected 30-300 days later, ∼30% of individuals could still be uniquely pinpointed using metagenomic codes from a typical body site; coincidental (false positive) matches were rare. Codes based on the gut microbiome were exceptionally stable and pinpointed >80% of individuals. The failure of a code to match its owner at a later time point was largely explained by the loss of specific microbial strains (at current limits of detection) and was only weakly associated with the length of the sampling interval. In addition to highlighting patterns of temporal variation in the ecology of the human microbiome, this work demonstrates the feasibility of microbiome-based identifiability, a result with important ethical implications for microbiome study design. The datasets and code used in this work are available for download from huttenhower.sph.harvard.edu/idability.
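
    The idea of a hitting set-based code can be sketched with a simple greedy heuristic: pick features carried by one individual until every other individual lacks at least one of them. This is only an illustration under assumed inputs, not the authors' algorithm; in particular it ignores the prioritization for temporal stability described in the abstract, and the function name, subject names, and feature names are hypothetical.

```python
def greedy_code(target, population):
    """Greedily build a set of the target's features that no one else fully shares.

    population maps each individual to the set of features (taxa, marker
    genes, ...) detected in that individual. Returns a list of features,
    or None when some other individual carries every feature of the target.
    """
    own = sorted(population[target])  # sorted for deterministic tie-breaking
    remaining = {name for name in population if name != target}
    code = []
    while remaining:
        if not own:
            return None
        # Choose the feature that is absent from the most remaining individuals.
        best = max(own, key=lambda f: sum(f not in population[o] for o in remaining))
        excluded = {o for o in remaining if best not in population[o]}
        if not excluded:
            return None  # the remaining individuals share all of the target's features
        code.append(best)
        remaining -= excluded
    return code


# Hypothetical strain-level features for three subjects.
population = {
    "subject_1": {"s1", "s2", "s3"},
    "subject_2": {"s1", "s2"},
    "subject_3": {"s2", "s3", "s4"},
}
code_1 = greedy_code("subject_1", population)
code_2 = greedy_code("subject_2", population)
```

    In this toy population subject_2 cannot be given a code, because subject_1 carries every feature that subject_2 does.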

  6. [Metagenomics-based detection of swine viruses].

    PubMed

    Han, Wen; Luo, Yuzi; Zhao, Bibo; Sun, Yuan; Li, Su; Qiu, Huaji

    2013-02-04

    A great variety of viruses exists in the environment and in animals, and some of them are unknown. Many unknown viruses are barely detectable by conventional virus isolation and PCR assays. To develop a technology platform for detecting unknown viruses, we established a method based on viral metagenomics in combination with novel molecular diagnostics. The method consists of removal of host nucleic acid, random PCR amplification, large-scale sequencing, and bioinformatics analysis. It was applied to classical swine fever virus (CSFV)-infected cells and to a tissue sample from a pig infected with porcine circovirus type 2 (PCV2). We amplified 13.7% of the sequences of the CSFV genome and 47.2% of those of the PCV2 genome, respectively. Moreover, we amplified 16.4% of the sequences of the simian parainfluenza virus type 5 genome from a cell culture containing an unknown virus using the developed method. In addition, using the developed method combined with high-throughput sequencing, we detected 1.1% virus sequences, including those of CSFV, PCV2, torque teno sus virus (TTSuV), porcine bocavirus (PBoV) and human adenovirus type 6 (Ad6), in 7 clinical swine samples with unknown causative agents. The developed metagenomics-based method showed good sensitivity for detection of both DNA and RNA viruses from diverse swine samples and has potential for universal detection of known and unknown viruses. It might facilitate the diagnosis of emerging viral diseases.

  7. Fizzy: feature subset selection for metagenomics.

    PubMed

    Ditzler, Gregory; Morrison, J Calvin; Lan, Yemin; Rosen, Gail L

    2015-11-04

    Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- and β-diversity. Feature subset selection--a sub-field of machine learning--can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. We have developed a new Python command line tool, which is compatible with the widely adopted BIOM format, for microbial ecologists that implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.

  8. Genovo: De Novo Assembly for Metagenomes

    NASA Astrophysics Data System (ADS)

    Laserson, Jonathan; Jojic, Vladimir; Koller, Daphne

    Next-generation sequencing technologies produce a large number of noisy reads from the DNA in a sample. Metagenomics and population sequencing aim to recover the genomic sequences of the species in the sample, which could be of high diversity. Methods geared towards single sequence reconstruction are not sensitive enough when applied in this setting. We introduce a generative probabilistic model of read generation from environmental samples and present Genovo, a novel de novo sequence assembler that discovers likely sequence reconstructions under the model. A Chinese restaurant process prior accounts for the unknown number of genomes in the sample. Inference is made by applying a series of hill-climbing steps iteratively until convergence. We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo's reconstructions cover more bases and recover more genes than the other methods, and yield a higher assembly score.
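
    To illustrate how a Chinese restaurant process (CRP) prior leaves the number of groups open, the sketch below draws a random partition of items (think of reads being assigned to an unknown number of genomes). It shows the prior only and is not Genovo's model or inference procedure; the function name, the concentration parameter `alpha`, and the seed are assumptions made for the example.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability
    counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha), so the number of clusters is not fixed in advance.
    """
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items currently in cluster k
    assignments = []  # assignments[i] = cluster label of item i
    for i in range(n_items):
        r = rng.uniform(0, i + alpha)
        acc = 0.0
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[k] += 1
                assignments.append(k)
                break
        else:
            # r fell in the final interval of width alpha: open a new cluster
            counts.append(1)
            assignments.append(len(counts) - 1)
    return assignments, counts


labels, sizes = crp_assignments(n_items=50, alpha=1.0, seed=42)
```

    Larger values of `alpha` tend to produce more clusters, since a new cluster is opened with probability proportional to `alpha`.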

  9. Size Does Matter: Application-driven Approaches for Soil Metagenomics

    PubMed Central

    Kakirde, Kavita S.; Parsley, Larissa C.; Liles, Mark R.

    2010-01-01

    Metagenomic analyses can provide extensive information on the structure, composition, and predicted gene functions of diverse environmental microbial assemblages. Each environment presents its own unique challenges to metagenomic investigation and requires a specifically designed approach to accommodate physicochemical and biotic factors unique to each environment that can pose technical hurdles and/or bias the metagenomic analyses. In particular, soils harbor an exceptional diversity of prokaryotes that are largely undescribed beyond the level of ribotype and are a potentially vast resource for natural product discovery. The successful application of a soil metagenomic approach depends on selecting the appropriate DNA extraction, purification, and if necessary, cloning methods for the intended downstream analyses. The most important technical considerations in a metagenomic study include obtaining a sufficient yield of high-purity DNA representing the targeted microorganisms within an environmental sample or enrichment and (if required) constructing a metagenomic library in a suitable vector and host. Size does matter in the context of the average insert size within a clone library or the sequence read length for a high-throughput sequencing approach. It is also imperative to select the appropriate metagenomic screening strategy to address the specific question(s) of interest, which should drive the selection of methods used in the earlier stages of a metagenomic project (e.g., DNA size, to clone or not to clone). Here, we present both the promising and problematic nature of soil metagenomics and discuss the factors that should be considered when selecting soil sampling, DNA extraction, purification, and cloning methods to implement based on the ultimate study objectives. PMID:21076656

  10. Evaluation of the Cow Rumen Metagenome; Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Sczyrba, Alex [DOE JGI]

    2016-07-12

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  11. Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Li, Weizhong [San Diego Supercomputer Center]

    2016-07-12

    San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  12. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Sakakibara, Yasumbumi [Keio University]

    2016-07-12

    Keio University's Yasumbumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  13. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Sakakibara, Yasumbumi

    2011-10-13

    Keio University's Yasumbumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  14. Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Li, Weizhong

    2011-10-12

    San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  15. Evaluation of the Cow Rumen Metagenome; Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Sczyrba, Alex

    2011-10-13

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  16. Enhancing metagenomics investigations of microbial interactions with biofilm technology.

    PubMed

    McLean, Robert J C; Kakirde, Kavita S

    2013-11-11

    Investigations of microbial ecology and diversity have been greatly enhanced by the application of culture-independent techniques. One such approach, metagenomics, involves sample collections from soil, water, and other environments. Extracted nucleic acids from bulk environmental samples are sequenced and analyzed, which allows microbial interactions to be inferred on the basis of bioinformatics calculations. In most environments, microbial interactions occur predominately in surface-adherent, biofilm communities. In this review, we address metagenomics sampling and biofilm biology, and propose an experimental strategy whereby the resolving power of metagenomics can be enhanced by incorporating a biofilm-enrichment step during sample acquisition.

  17. Bioprospecting potential of the soil metagenome: novel enzymes and bioactivities.

    PubMed

    Lee, Myung Hwan; Lee, Seon-Woo

    2013-09-01

    The microbial diversity in soil ecosystems is higher than in any other microbial ecosystem. The majority of soil microorganisms has not been characterized, because the dominant members have not been readily culturable on standard cultivation media; therefore, the soil ecosystem is a great reservoir for the discovery of novel microbial enzymes and bioactivities. The soil metagenome, the collective microbial genome, could be cloned and sequenced directly from soils to search for novel microbial resources. This review summarizes the microbial diversity in soils and the efforts to search for microbial resources from the soil metagenome, with more emphasis on the potential of bioprospecting metagenomics and recent discoveries.

  18. Activity-Based Screening of Metagenomic Libraries for Hydrogenase Enzymes.

    PubMed

    Adam, Nicole; Perner, Mirjam

    2017-01-01

    Here we outline how to identify hydrogenase enzymes from metagenomic libraries through an activity-based screening approach. A metagenomic fosmid library is constructed in E. coli and the fosmids are transferred into a hydrogenase deletion mutant of Shewanella oneidensis (ΔhyaB) via triparental mating. If a fosmid exhibits hydrogen uptake activity, S. oneidensis' phenotype is restored and hydrogenase activity is indicated by a color change of the medium from yellow to colorless. This new method enables screening of 48 metagenomic fosmid clones in parallel.

  19. MGmapper: Reference based mapping and taxonomy annotation of metagenomics sequence reads.

    PubMed

    Petersen, Thomas Nordahl; Lukjancenko, Oksana; Thomsen, Martin Christen Frølund; Maddalena Sperotto, Maria; Lund, Ole; Møller Aarestrup, Frank; Sicheritz-Pontén, Thomas

    2017-01-01

    An increasing number of species and gene identification studies rely on next-generation sequence analysis of either single-isolate or metagenomics samples. Several methods are available to perform taxonomic annotations, and a previous metagenomics benchmark study has shown that a vast number of false positive species annotations are a problem unless thresholds or post-processing are applied to differentiate between correct and false annotations. MGmapper is a package to process raw next-generation sequence data and perform reference-based sequence assignment, followed by a post-processing analysis to produce reliable taxonomy annotation at species and strain level resolution. An in-vitro bacterial mock community sample comprising 8 genera, 11 species and 12 strains was previously used to benchmark metagenomics classification methods. After applying a post-processing filter, we obtained 100% correct taxonomy assignments at species and genus level. A sensitivity and precision of 75% were obtained for strain level annotations. A comparison between MGmapper and Kraken at species level shows that MGmapper assigns taxonomy at species level using 84.8% of the sequence reads, compared to 70.5% for Kraken, and both methods identified all species with no false positives. Extensive read count statistics are provided in plain text and Excel sheets for both rejected and accepted taxonomy annotations. The use of custom databases is possible for the command-line version of MGmapper, and the complete pipeline is freely available as a Bitbucket package (https://bitbucket.org/genomicepidemiology/mgmapper). A web-version (https://cge.cbs.dtu.dk/services/MGmapper) provides the basic functionality for analysis of small fastq datasets.

  20. Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

    PubMed Central

    Pasolli, Edoardo; Truong, Duy Tin; Malik, Faizan; Waldron, Levi

    2016-01-01

    Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis. The software framework, microbiome profiles, and metadata for thousands of samples are publicly

  1. Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights.

    PubMed

    Pasolli, Edoardo; Truong, Duy Tin; Malik, Faizan; Waldron, Levi; Segata, Nicola

    2016-07-01

    Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis. The software framework, microbiome profiles, and metadata for thousands of samples are publicly

  2. Tales from the crypt and coral reef: the successes and challenges of identifying new herpesviruses using metagenomics

    PubMed Central

    Houldcroft, Charlotte J.; Breuer, Judith

    2015-01-01

    Herpesviruses are ubiquitous double-stranded DNA viruses infecting many animals, with the capacity to cause disease in both immunocompetent and immunocompromised hosts. Different herpesviruses have different cell tropisms, and have been detected in a diverse range of tissues and sample types. Metagenomics—encompassing viromics—analyses the nucleic acid of a tissue or other sample in an unbiased manner, making few or no prior assumptions about which viruses may be present in a sample. This approach has successfully discovered a number of novel herpesviruses. Furthermore, metagenomic analysis can identify herpesviruses with high degrees of sequence divergence from known herpesviruses and does not rely upon culturing large quantities of viral material. Metagenomics has had success in two areas of herpesvirus sequencing: firstly, the discovery of novel exogenous and endogenous herpesviruses in primates, bats and cnidarians; and secondly, in characterizing large areas of the genomes of herpesviruses previously only known from small fragments, revealing unexpected diversity. This review will discuss the successes and challenges of using metagenomics to identify novel herpesviruses, and future directions within the field. PMID:25821447

  3. Parton fragmentation functions

    NASA Astrophysics Data System (ADS)

    Metz, A.; Vossen, A.

    2016-11-01

    The field of fragmentation functions of light quarks and gluons is reviewed. In addition to integrated fragmentation functions, attention is paid to the dependence of fragmentation functions on transverse momenta and on polarization degrees of freedom. Higher-twist and di-hadron fragmentation functions are considered as well. Moreover, the review covers both theoretical and experimental developments in hadron production in electron-positron annihilation, deep-inelastic lepton-nucleon scattering, and proton-proton collisions.

  4. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software.

    PubMed

    Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J; Chia, Burton K H; Denis, Bertrand; Froula, Jeff L; Wang, Zhong; Egan, Robert; Don Kang, Dongwan; Cook, Jeffrey J; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael D; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z; Cuevas, Daniel A; Edwards, Robert A; Saha, Surya; Piro, Vitor C; Renard, Bernhard Y; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C; Woyke, Tanja; Vorholt, Julia A; Schulze-Lefert, Paul; Rubin, Edward M; Darling, Aaron E; Rattei, Thomas; McHardy, Alice C

    2017-10-02

    Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.

  5. Streaming fragment assignment for real-time analysis of sequencing experiments.

    PubMed

    Roberts, Adam; Pachter, Lior

    2013-01-01

    We present eXpress, a software package for efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data and show that eXpress achieves greater efficiency than other quantification methods.

  6. Diversity of microbiota associated with symptomatic and non-symptomatic bacterial wilt-diseased banana plants determined using 16S rRNA metagenome sequencing.

    PubMed

    Suhaimi, Nurul Shamsinah Mohd; Goh, Share-Yuan; Ajam, Noni; Othman, Rofina Yasmin; Chan, Kok-Gan; Thong, Kwai Lin

    2017-08-21

    Banana is one of the most important fruits cultivated in Malaysia, and it provides many health benefits. However, bacterial wilt disease, which attacks bananas, inflicts major losses on the banana industry in Malaysia. To understand the complex interactions of the microbiota of bacterial wilt-diseased banana plants, we first determined the bacterial communities residing in the pseudostems of infected (symptomatic) and disease-free (non-symptomatic) banana plants. We characterized the associated microorganisms using targeted 16S rRNA metagenomic sequencing on the Illumina MiSeq platform. Taxonomic classifications revealed 17 and nine known bacterial phyla in the tissues of non-symptomatic and symptomatic plants, respectively. Cyanobacteria and Proteobacteria, which accounted for more than 99% of the 16S rRNA gene fragments, were the two most abundant phyla in both plants. The five major genera found in both plant samples were Ralstonia, Sphingomonas, Methylobacterium, Flavobacterium, and Pseudomonas. Ralstonia was more abundant in the symptomatic plant (59% of all genera) than in the non-symptomatic plant (only 36%). Our data revealed that 102 bacterial genera were assigned only to the non-symptomatic plant. Overall, this study indicated that a more diverse and abundant microbiota was associated with the non-symptomatic bacterial wilt-diseased banana plant than with the symptomatic plant. The higher diversity of endophytic microbiota in the non-symptomatic banana plant could be an indication of pathogen suppression, which delayed or prevented disease expression. This comparative study of the microbiota in the two plant conditions might provide caveats for potential biological control strategies.

  7. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification.

    PubMed

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-05-01

    Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.

  8. Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

    PubMed Central

    Borozan, Ivan; Watt, Stuart; Ferretti, Vincent

    2015-01-01

    Motivation: Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed in many bacterial and viral genomes. Here, we propose a classification model that exploits the complementary nature of alignment-based and alignment-free similarity measures with the aim to improve the accuracy with which DNA and protein sequences are characterized. Results: Our model classifies sequences using a combined sequence similarity score calculated by adaptively weighting the contribution of different sequence similarity measures. Weights are determined independently for each sequence in the test set and reflect the discriminatory ability of individual similarity measures in the training set. Because the similarity between some sequences is determined more accurately with one type of measure rather than another, our classifier allows different sets of weights to be associated with different sequences. Using five different similarity measures, we show that our model significantly improves the classification accuracy over the current composition- and alignment-based models, when predicting the taxonomic lineage for both short viral sequence fragments and complete viral sequences. We also show that our model can be used effectively for the classification of reads from a real metagenome dataset as well as protein sequences. Availability and implementation: All the datasets and the code used in this study are freely available at https://collaborators.oicr.on.ca/vferretti/borozan_csss/csss.html. Contact: ivan.borozan@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25573913
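
    The combined score described above can be illustrated with a minimal Python sketch. It assumes a plain weighted average over two hypothetical measures ("alignment" and "kmer"); the abstract does not say how the per-sequence weights are derived, so here they are simply supplied by hand, and the function names and numbers are illustrative rather than taken from the authors' code.

```python
def combined_score(similarities, weights):
    """Weighted average of several similarity measures for one candidate.

    similarities: measure name -> similarity of the query to this candidate
    weights:      measure name -> weight chosen for this particular query
    (Both dictionaries are hypothetical inputs for illustration.)
    """
    total_weight = sum(weights[m] for m in similarities)
    return sum(weights[m] * similarities[m] for m in similarities) / total_weight


def classify(candidate_scores, weights):
    """Return the candidate label with the highest combined score."""
    return max(
        candidate_scores,
        key=lambda label: combined_score(candidate_scores[label], weights),
    )


# Similarity of one query to two candidate taxa under two measures (made-up values).
candidate_scores = {
    "taxon_A": {"alignment": 0.9, "kmer": 0.2},
    "taxon_B": {"alignment": 0.4, "kmer": 0.8},
}

# Different queries may carry different weight sets, so the same raw
# similarities can lead to different labels.
weights_query_1 = {"alignment": 0.8, "kmer": 0.2}
weights_query_2 = {"alignment": 0.2, "kmer": 0.8}

label_1 = classify(candidate_scores, weights_query_1)
label_2 = classify(candidate_scores, weights_query_2)
print(label_1, label_2)
```

    With the alignment-heavy weights the query is assigned to taxon_A, and with the k-mer-heavy weights to taxon_B, which is the sense in which per-sequence weighting lets one measure dominate where it is more discriminative.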

  9. Fragmentation Analysis - Fundamental Processes

    DTIC Science & Technology

    Wausau quartzite and anorthosite of 3.0 to 3.5 inch size were fragmented in this device. An analysis of the fragment distribution results of the drop...disc-shaped specimens of Wausau quartzite, anorthosite, and Felch marble were then fragmented with the impact pendulum device. Computer programs were

  10. Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Stepanauskas, Ramunas

    2011-10-13

    DOE JGI's Tanja Woyke, chair of the Single Cells and Metagenomes session, delivers an introduction, followed by Bigelow Laboratory's Ramunas Stepanauskas on "Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  11. Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Stepanauskas, Ramunas [Bigelow Laboratory]

    2016-07-12

    DOE JGI's Tanja Woyke, chair of the Single Cells and Metagenomes session, delivers an introduction, followed by Bigelow Laboratory's Ramunas Stepanauskas on "Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  12. Metagenomic discovery of novel enzymes and biosurfactants in a slaughterhouse biofilm microbial community.

    PubMed

    Thies, Stephan; Rausch, Sonja Christina; Kovacic, Filip; Schmidt-Thaler, Alexandra; Wilhelm, Susanne; Rosenau, Frank; Daniel, Rolf; Streit, Wolfgang; Pietruszka, Jörg; Jaeger, Karl-Erich

    2016-06-08

    DNA derived from environmental samples is a rich source of novel bioactive molecules. The choice of the habitat to be sampled predefines the properties of the biomolecules to be discovered due to the physiological adaptation of the microbial community to the prevailing environmental conditions. We have constructed a metagenomic library in Escherichia coli DH10b with environmental DNA (eDNA) isolated from the microbial community of a slaughterhouse drain biofilm consisting mainly of species from the family Flavobacteriaceae. By functional screening of this library we have identified several lipases, proteases and two clones (SA343 and SA354) with biosurfactant and hemolytic activities. Sequence analysis of the respective eDNA fragments and subsequent structure homology modelling identified genes encoding putative N-acyl amino acid synthases with a unique two-domain organisation. The produced biosurfactants were identified by NMR spectroscopy as N-acyltyrosines with N-myristoyltyrosine as the predominant species. Critical micelle concentration and reduction of surface tension were similar to those of chemically synthesised N-myristoyltyrosine. Furthermore, we showed that the newly isolated N-acyltyrosines exhibit antibiotic activity against various bacteria. This is the first report describing the successful application of functional high-throughput screening assays for the identification of biosurfactant producing clones within a metagenomic library.

  13. False-positive results in metagenomic virus discovery: a strong case for follow-up diagnosis.

    PubMed

    Rosseel, T; Pardon, B; De Clercq, K; Ozhelvaci, O; Van Borm, S

    2014-08-01

    A viral metagenomic approach using virion enrichment, random amplification and next-generation sequencing was used to investigate an undiagnosed cluster of dairy cattle presenting with high persistent fever, unresponsive to anti-microbial and anti-inflammatory treatment, diarrhoea and redness of nose and teat. Serum and whole blood samples were taken in the predicted hyperviraemic state of an animal that a few days later presented with these clinical signs. Bioinformatics analysis of the resulting data from the DNA virus identification workflow (a total of 32 757 sequences with average read length 335 bases) initially demonstrated the presence of parvovirus-like sequences in the tested blood sample. Thorough follow-up using specific real-time RT-PCR assays targeting the detected sequence fragments confirmed the presence of these sequences in the original sample as well as in a sample of an additional animal, but a contamination with an identical genetic signature in negative extraction controls was demonstrated. Further investigation using an alternative extraction method identified a contamination of the originally used Qiagen extraction columns with parvovirus-like nucleic acids or virus particles. Although we did not find any relevant virus that could be associated with the disease, these observations clearly illustrate the importance of using a proper control strategy and follow-up diagnostic tests in any viral metagenomic study.

  14. Identification of a new gene encoding EPSPs with high glyphosate resistance from the metagenomic library.

    PubMed

    Jin, Dan; Lu, Wei; Ping, Shuzhen; Zhang, Wei; Chen, Jian; Dun, Baoqing; Ma, Ruiqiang; Zhao, Zhonglin; Sha, Jiying; Li, Liang; Yang, Zhirong; Chen, Ming; Lin, Min

    2007-10-01

    Glyphosate, a powerful nonselective herbicide, acts as an inhibitor of the activity of the enzyme 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) encoded by the aroA gene involved in aromatic amino acid biosynthesis. An Escherichia coli mutant AKM4188 was constructed by inserting a kanamycin cassette within the aroA coding sequence. The mutant strain is an aromatic amino acid auxotroph and fails to grow on M9 minimal media due to the inactive aroA. A DNA metagenomic library was constructed with samples from a glyphosate-polluted area and was screened by using the mutant AKM4188 as recipient. Three plasmid clones, which restored growth to the aroA mutant in M9 minimal media supplemented with chloramphenicol, kanamycin, and 50 mM glyphosate, were obtained from the DNA metagenomic library. One of them, which conferred glyphosate tolerance up to 150 mM, was further characterized. The cloned fragment encoded a polypeptide, designated RD, sharing high similarity with other Class II EPSPS proteins. A His-tagged RD fusion protein was produced in E. coli to characterize the enzymatic properties of the RD EPSP protein.

  15. Metagenomic Analysis of Apple Orchard Soil Reveals Antibiotic Resistance Genes Encoding Predicted Bifunctional Proteins

    PubMed Central

    Donato, Justin J.; Moe, Luke A.; Converse, Brandon J.; Smart, Keith D.; Berklein, Flora C.; McManus, Patricia S.; Handelsman, Jo

    2010-01-01

    To gain insight into the diversity and origins of antibiotic resistance genes, we identified resistance genes in the soil in an apple orchard using functional metagenomics, which involves inserting large fragments of foreign DNA into Escherichia coli and assaying the resulting clones for expressed functions. Among 13 antibiotic-resistant clones, we found two genes that encode bifunctional proteins. One predicted bifunctional protein confers resistance to ceftazidime and contains a natural fusion between a predicted transcriptional regulator and a β-lactamase. Sequence analysis of the entire metagenomic clone encoding the predicted bifunctional β-lactamase revealed a gene potentially involved in chloramphenicol resistance as well as a predicted transposase. A second clone that encodes a predicted bifunctional protein confers resistance to kanamycin and contains an aminoglycoside acetyltransferase domain fused to a second acetyltransferase domain that, based on nucleotide sequence, was predicted not to be involved in antibiotic resistance. This is the first report of a transcriptional regulator fused to a β-lactamase and of an aminoglycoside acetyltransferase fused to an acetyltransferase not involved in antibiotic resistance. PMID:20453147

  16. Metagenomic discovery of novel enzymes and biosurfactants in a slaughterhouse biofilm microbial community

    PubMed Central

    Thies, Stephan; Rausch, Sonja Christina; Kovacic, Filip; Schmidt-Thaler, Alexandra; Wilhelm, Susanne; Rosenau, Frank; Daniel, Rolf; Streit, Wolfgang; Pietruszka, Jörg; Jaeger, Karl-Erich

    2016-01-01

    DNA derived from environmental samples is a rich source of novel bioactive molecules. The choice of the habitat to be sampled predefines the properties of the biomolecules to be discovered due to the physiological adaptation of the microbial community to the prevailing environmental conditions. We have constructed a metagenomic library in Escherichia coli DH10b with environmental DNA (eDNA) isolated from the microbial community of a slaughterhouse drain biofilm consisting mainly of species from the family Flavobacteriaceae. By functional screening of this library we have identified several lipases, proteases and two clones (SA343 and SA354) with biosurfactant and hemolytic activities. Sequence analysis of the respective eDNA fragments and subsequent structure homology modelling identified genes encoding putative N-acyl amino acid synthases with a unique two-domain organisation. The produced biosurfactants were identified by NMR spectroscopy as N-acyltyrosines with N-myristoyltyrosine as the predominant species. Critical micelle concentration and reduction of surface tension were similar to those of chemically synthesised N-myristoyltyrosine. Furthermore, we showed that the newly isolated N-acyltyrosines exhibit antibiotic activity against various bacteria. This is the first report describing the successful application of functional high-throughput screening assays for the identification of biosurfactant producing clones within a metagenomic library. PMID:27271534

  17. Metagenomics of an Alkaline Hot Spring in Galicia (Spain): Microbial Diversity Analysis and Screening for Novel Lipolytic Enzymes.

    PubMed

    López-López, Olalla; Knapik, Kamila; Cerdán, Maria-Esperanza; González-Siso, María-Isabel

    2015-01-01

    A fosmid library was constructed with the metagenomic DNA from the water of the Lobios hot spring (76°C, pH = 8.2) located in Ourense (Spain). Metagenomic sequencing of the fosmid library allowed the assembly of 9722 contigs ranging in size from 500 to 56,677 bp and spanning ~18 Mbp. 23,207 ORFs (Open Reading Frames) were predicted from the assembly. Biodiversity was explored by taxonomic classification and it revealed that bacteria were predominant, while the archaea were less abundant. The six most abundant bacterial phyla were Deinococcus-Thermus, Proteobacteria, Firmicutes, Acidobacteria, Aquificae, and Chloroflexi. Within the archaeal superkingdom, the phylum Thaumarchaeota was predominant with the dominant species "Candidatus Caldiarchaeum subterraneum." Functional classification revealed the genes associated to one-carbon metabolism as the most abundant. Both taxonomic and functional classifications showed a mixture of different microbial metabolic patterns: aerobic and anaerobic, chemoorganotrophic and chemolithotrophic, autotrophic and heterotrophic. Remarkably, the presence of genes encoding enzymes with potential biotechnological interest, such as xylanases, galactosidases, proteases, and lipases, was also revealed in the metagenomic library. Functional screening of this library was subsequently done looking for genes encoding lipolytic enzymes. Six genes conferring lipolytic activity were identified and one was cloned and characterized. This gene was named LOB4Est and it was expressed in a yeast mesophilic host. LOB4Est codes for a novel esterase of family VIII, with sequence similarity to β-lactamases, but with unusual wide substrate specificity. When the enzyme was purified from the mesophilic host it showed half-life of 1 h and 43 min at 50°C, and maximal activity at 40°C and pH 7.5 with p-nitrophenyl-laurate as substrate. Interestingly, the enzyme retained more than 80% of maximal activity in a broad range of pH from 6.5 to 8.

  18. Metagenomics of an Alkaline Hot Spring in Galicia (Spain): Microbial Diversity Analysis and Screening for Novel Lipolytic Enzymes

    PubMed Central

    López-López, Olalla; Knapik, Kamila; Cerdán, Maria-Esperanza; González-Siso, María-Isabel

    2015-01-01

    A fosmid library was constructed with the metagenomic DNA from the water of the Lobios hot spring (76°C, pH = 8.2) located in Ourense (Spain). Metagenomic sequencing of the fosmid library allowed the assembly of 9722 contigs ranging in size from 500 to 56,677 bp and spanning ~18 Mbp. 23,207 ORFs (Open Reading Frames) were predicted from the assembly. Biodiversity was explored by taxonomic classification and it revealed that bacteria were predominant, while the archaea were less abundant. The six most abundant bacterial phyla were Deinococcus-Thermus, Proteobacteria, Firmicutes, Acidobacteria, Aquificae, and Chloroflexi. Within the archaeal superkingdom, the phylum Thaumarchaeota was predominant with the dominant species “Candidatus Caldiarchaeum subterraneum.” Functional classification revealed the genes associated to one-carbon metabolism as the most abundant. Both taxonomic and functional classifications showed a mixture of different microbial metabolic patterns: aerobic and anaerobic, chemoorganotrophic and chemolithotrophic, autotrophic and heterotrophic. Remarkably, the presence of genes encoding enzymes with potential biotechnological interest, such as xylanases, galactosidases, proteases, and lipases, was also revealed in the metagenomic library. Functional screening of this library was subsequently done looking for genes encoding lipolytic enzymes. Six genes conferring lipolytic activity were identified and one was cloned and characterized. This gene was named LOB4Est and it was expressed in a yeast mesophilic host. LOB4Est codes for a novel esterase of family VIII, with sequence similarity to β-lactamases, but with unusual wide substrate specificity. When the enzyme was purified from the mesophilic host it showed half-life of 1 h and 43 min at 50°C, and maximal activity at 40°C and pH 7.5 with p-nitrophenyl-laurate as substrate. Interestingly, the enzyme retained more than 80% of maximal activity in a broad range of pH from 6.5 to 8. PMID:26635759

  19. RNA Viral Metagenome of Whiteflies Leads to the Discovery and Characterization of a Whitefly-Transmitted Carlavirus in North America

    PubMed Central

    Rosario, Karyna; Capobianco, Heather; Ng, Terry Fei Fan; Breitbart, Mya; Polston, Jane E.

    2014-01-01

    Whiteflies from the Bemisia tabaci species complex have the ability to transmit a large number of plant viruses and are some of the most detrimental pests in agriculture. Although whiteflies are known to transmit both DNA and RNA viruses, most of the diversity has been recorded for the former, specifically for the Begomovirus genus. This study investigated the total diversity of DNA and RNA viruses found in whiteflies collected from a single site in Florida to evaluate if there are additional, previously undetected viral types within the B. tabaci vector. Metagenomic analysis of viral DNA extracted from the whiteflies only resulted in the detection of begomoviruses. In contrast, whiteflies contained sequences similar to RNA viruses from divergent groups, with a diversity that extends beyond currently described viruses. The metagenomic analysis of whiteflies also led to the first report of a whitefly-transmitted RNA virus similar to Cowpea mild mottle virus (CpMMV Florida) (genus Carlavirus) in North America. Further investigation resulted in the detection of CpMMV Florida in native and cultivated plants growing near the original field site of whitefly collection and determination of its experimental host range. Analysis of complete CpMMV Florida genomes recovered from whiteflies and plants suggests that the current classification criteria for carlaviruses need to be reevaluated. Overall, metagenomic analysis supports that DNA plant viruses carried by B. tabaci are dominated by begomoviruses, whereas significantly less is known about RNA viruses present in this damaging insect vector. PMID:24466220

  20. Comparative Metagenomics of Freshwater Microbial Communities

    SciTech Connect

    Hemme, Chris; Deng, Ye; Tu, Qichao; Fields, Matthew; Gentry, Terry; Wu, Liyou; Tringe, Susannah; Watson, David; He, Zhili; Hazen, Terry; Tiedje, James; Rubin, Eddy; Zhou, Jizhong

    2010-05-17

    Previous analyses of a microbial metagenome from uranium and nitric-acid contaminated groundwater (FW106) showed significant environmental effects resulting from the rapid introduction of multiple contaminants. Effects include a massive loss of species and strain biodiversity, accumulation of toxin resistance genes in the metagenome and lateral transfer of toxin resistance genes between community members. To better understand these results in an ecological context, a second metagenome from a pristine groundwater system located along the same geological strike was sequenced and analyzed (FW301). It is hypothesized that FW301 approximates the ancestral FW106 community based on phylogenetic profiles and common geological parameters; however, even if this is not the case, the datasets still permit comparisons between healthy and stressed groundwater ecosystems. Complex carbohydrate metabolism has been almost entirely lost in the stressed ecosystem. In contrast, the pristine system encodes a wide diversity of complex carbohydrate metabolism systems, suggesting that carbon turnover is very rapid and less leaky in the healthy groundwater system. FW301 encodes many (~160+) carbon monoxide dehydrogenase genes while FW106 encodes none. This result suggests that the community is frequently exposed to oxygen from aerated rainwater percolating into the subsurface, with a resulting high rate of carbon metabolism and CO production. When oxygen levels fall, the CO then serves as a major carbon source for the community. FW301 appears to be capable of CO2 fixation via the reductive carboxylase (reverse TCA) cycle and possibly acetogenesis; these activities are lacking in the heterotrophic FW106 system, which relies exclusively on respiration of nitrate and/or oxygen for energy production. FW301 encodes a complete B12 biosynthesis pathway at high abundance, suggesting the use of sodium gradients for energy production in the healthy groundwater community. Overall

  1. Quantifying environmental adaptation of metabolic pathways in metagenomics

    PubMed Central

    Gianoulis, Tara A.; Raes, Jeroen; Patel, Prianka V.; Bjornson, Robert; Korbel, Jan O.; Letunic, Ivica; Yamada, Takuji; Paccanaro, Alberto; Jensen, Lars J.; Snyder, Michael; Bork, Peer; Gerstein, Mark B.

    2009-01-01

    Recently, approaches have been developed to sample the genetic content of heterogeneous environments (metagenomics). However, by what means these sequences link distinct environmental conditions with specific biological processes is not well understood. Thus, a major challenge is how the usage of particular pathways and subnetworks reflects the adaptation of microbial communities across environments and habitats—i.e., how network dynamics relates to environmental features. Previous research has treated environments as discrete, somewhat simplified classes (e.g., terrestrial vs. marine), and searched for obvious metabolic differences among them (i.e., treating the analysis as a typical classification problem). However, environmental differences result from combinations of many factors, which often vary only slightly. Therefore, we introduce an approach that employs correlation and regression to relate multiple, continuously varying factors defining an environment to the extent of particular microbial pathways present in a geographic site. Moreover, rather than looking only at individual correlations (one-to-one), we adapted canonical correlation analysis and related techniques to define an ensemble of weighted pathways that maximally covaries with a combination of environmental variables (many-to-many), which we term a metabolic footprint. Applied to available aquatic datasets, we identified footprints predictive of their environment that can potentially be used as biosensors. For example, we show a strong multivariate correlation between the energy-conversion strategies of a community and multiple environmental gradients (e.g., temperature). Moreover, we identified covariation in amino acid transport and cofactor synthesis, suggesting that limiting amounts of cofactor can (partially) explain increased import of amino acids in nutrient-limited conditions. PMID:19164758
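
    As a rough illustration of the one-to-one baseline that the abstract contrasts with its many-to-many approach, the following Python sketch correlates a single environmental factor with the abundance of a single pathway across sites and fits a least-squares line. It does not implement canonical correlation analysis, and the site values, variable names and pathway are hypothetical placeholders rather than data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical values: one environmental factor and one pathway's
# relative abundance, measured at five sampling sites.
temperature_c = [4.0, 10.0, 16.0, 22.0, 28.0]
pathway_fraction = [0.10, 0.14, 0.17, 0.22, 0.25]

r = pearson(temperature_c, pathway_fraction)
slope, intercept = fit_line(temperature_c, pathway_fraction)
print(f"r = {r:.3f}, slope = {slope:.4f}, intercept = {intercept:.4f}")
```

    The many-to-many analysis in the abstract replaces the single factor and single pathway with weighted combinations of each, chosen so that the two combinations covary maximally.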

  2. Metagenome and Metatranscriptome Analyses Using Protein Family Profiles

    PubMed Central

    Zhong, Cuncong; Yooseph, Shibu

    2016-01-01

    Analyses of metagenome data (MG) and metatranscriptome data (MT) are often challenged by a paucity of complete reference genome sequences and the uneven/low sequencing depth of the constituent organisms in the microbial community, which respectively limit the power of reference-based alignment and de novo sequence assembly. These limitations make accurate protein family classification and abundance estimation challenging, which in turn hamper downstream analyses such as abundance profiling of metabolic pathways, identification of differentially encoded/expressed genes, and de novo reconstruction of complete gene and protein sequences from the protein family of interest. The profile hidden Markov model (HMM) framework enables the construction of very useful probabilistic models for protein families that allow for accurate modeling of position specific matches, insertions, and deletions. We present a novel homology detection algorithm that integrates banded Viterbi algorithm for profile HMM parsing with an iterative simultaneous alignment and assembly computational framework. The algorithm searches a given profile HMM of a protein family against a database of fragmentary MG/MT sequencing data and simultaneously assembles complete or near-complete gene and protein sequences of the protein family. The resulting program, HMM-GRASPx, demonstrates superior performance in aligning and assembling homologs when benchmarked on both simulated marine MG and real human saliva MG datasets. On real supragingival plaque and stool MG datasets that were generated from healthy individuals, HMM-GRASPx accurately estimates the abundances of the antimicrobial resistance (AMR) gene families and enables accurate characterization of the resistome profiles of these microbial communities. For real human oral microbiome MT datasets, using the HMM-GRASPx estimated transcript abundances significantly improves detection of differentially expressed (DE) genes. Finally, HMM-GRASPx was used to

  3. Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics

    PubMed Central

    Holmes, Ian; Harris, Keith; Quince, Christopher

    2012-01-01

    We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. This data can be represented as a frequency matrix giving the number of times each taxa is observed in each sample. The samples have different size, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods used previously to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct ‘metacommunities’, and, hence, determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for the fitting of DMM models using the ‘evidence framework’ (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. From the model evidence four clusters fit this data best. Two clusters were dominated by Bacteroides and were homogenous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the ‘Anna Karenina principle (AKP)’ applied to microbial communities: disturbed states having many more configurations than undisturbed. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a

  4. Dirichlet multinomial mixtures: generative models for microbial metagenomics.

    PubMed

    Holmes, Ian; Harris, Keith; Quince, Christopher

    2012-01-01

    We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. This data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples have different sizes, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods used previously to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct 'metacommunities', and, hence, determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for the fitting of DMM models using the 'evidence framework' (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from Obese and Lean twins. From the model evidence four clusters fit this data best. Two clusters were dominated by Bacteroides and were homogenous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, Obese twins were more likely to derive from the high variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the 'Anna Karenina principle (AKP)' applied to microbial communities: disturbed states having many more configurations than undisturbed. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable
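
    The generative process described in this abstract (choose a Dirichlet mixture component, draw a vector of taxa probabilities from it, then draw counts by multinomial sampling) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' fitting software: the function names, the two sets of hyperparameters, the mixture weights and the sample sizes are all hypothetical, and no inference (evidence framework, Laplace approximation) is attempted.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_community(mixture_weights, alphas, n_reads, rng):
    """Generate one sample: pick a component, draw taxa probabilities, draw counts."""
    # Step 1: choose which metacommunity (mixture component) the sample comes from.
    k = rng.choices(range(len(alphas)), weights=mixture_weights)[0]
    # Step 2: draw the community's taxa probabilities from that component's Dirichlet.
    p = sample_dirichlet(alphas[k], rng)
    # Step 3: multinomial sampling of n_reads observations given p.
    counts = [0] * len(p)
    for taxon in rng.choices(range(len(p)), weights=p, k=n_reads):
        counts[taxon] += 1
    return k, counts


rng = random.Random(0)
# Hypothetical hyperparameters for two metacommunities over four taxa.
alphas = [
    [50.0, 5.0, 1.0, 1.0],  # concentrated: samples resemble one another
    [0.5, 0.5, 0.5, 0.5],   # diffuse: samples vary widely in composition
]
weights = [0.6, 0.4]  # hypothetical mixture weights
# Samples of different sizes, as in the frequency matrix the abstract describes.
samples = [sample_community(weights, alphas, n_reads=n, rng=rng) for n in (200, 1000, 500)]
```

    In this sketch, small hyperparameter values give a high-variance component, loosely analogous to the more variable clusters mentioned in the abstract; the real model estimates such hyperparameters from data rather than fixing them by hand.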

  5. Selectable fragmentation warhead

    SciTech Connect

    Bryan, C.S.; Paisley, D.L.; Montoya, N.I.; Stahl, D.B.

    1992-12-31

    This report discusses a selectable fragmentation warhead which is capable of producing a predetermined number of fragments from a metal plate, and accelerating the fragments toward a target. A first explosive located adjacent to the plate is detonated at a selected number of points by laser-driven slapper detonators. In one embodiment, a smoother-disk and a second explosive, located adjacent to the first explosive, serve to increase acceleration of the fragments toward a target. The ability to produce a selected number of fragments allows for effective destruction of a chosen target.

  6. Selectable fragmentation warhead

    DOEpatents

    Bryan, Courtney S.; Paisley, Dennis L.; Montoya, Nelson I.; Stahl, David B.

    1993-01-01

    A selectable fragmentation warhead capable of producing a predetermined number of fragments from a metal plate, and accelerating the fragments toward a target. A first explosive located adjacent to the plate is detonated at a selected number of points by laser-driven slapper detonators. In one embodiment, a smoother-disk and a second explosive, located adjacent to the first explosive, serve to increase acceleration of the fragments toward a target. The ability to produce a selected number of fragments allows for effective destruction of a chosen target.

  7. Functional metagenomics for the investigation of antibiotic resistance.

    PubMed

    Mullany, Peter

    2014-04-01

    Antibiotic resistance is a major threat to human health and well-being. To effectively combat this problem we need to understand the range of different resistance genes that allow bacteria to resist antibiotics. To do this the whole microbiota needs to be investigated. As most bacteria cannot be cultivated in the laboratory, the reservoir of antibiotic resistance genes in the non-cultivatable majority remains relatively unexplored. Currently the only way to study antibiotic resistance in these organisms is to use metagenomic approaches. Furthermore, the only method that does not require any prior knowledge about the resistance genes is functional metagenomics, which involves expressing genes from metagenomic clones in surrogate hosts. In this review the methods and limitations of functional metagenomics to isolate new antibiotic resistance genes and the mobile genetic elements that mediate their spread are explored.

  8. Functional metagenomics for the investigation of antibiotic resistance

    PubMed Central

    Mullany, Peter

    2014-01-01

    Antibiotic resistance is a major threat to human health and well-being. To effectively combat this problem we need to understand the range of different resistance genes that allow bacteria to resist antibiotics. To do this the whole microbiota needs to be investigated. As most bacteria cannot be cultivated in the laboratory, the reservoir of antibiotic resistance genes in the non-cultivatable majority remains relatively unexplored. Currently the only way to study antibiotic resistance in these organisms is to use metagenomic approaches. Furthermore, the only method that does not require any prior knowledge about the resistance genes is functional metagenomics, which involves expressing genes from metagenomic clones in surrogate hosts. In this review the methods and limitations of functional metagenomics to isolate new antibiotic resistance genes and the mobile genetic elements that mediate their spread are explored. PMID:24556726

  9. Metagenomics - a guide from sampling to data analysis

    PubMed Central

    2012-01-01

    Metagenomics applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of entire communities of organisms. The field of metagenomics has been responsible for substantial advances in microbial ecology, evolution, and diversity over the past 5 to 10 years, and many research laboratories are actively engaged in it now. With the growing numbers of activities also comes a plethora of methodological knowledge and expertise that should guide future developments in the field. This review summarizes the current opinions in metagenomics, and provides practical guidance and advice on sample processing, sequencing technology, assembly, binning, annotation, experimental design, statistical analysis, data storage, and data sharing. As more metagenomic datasets are generated, the availability of standardized procedures and shared data storage and analysis becomes increasingly important to ensure that output of individual projects can be assessed and compared. PMID:22587947

  10. Exploring Metagenomics in the Laboratory of an Introductory Biology Course†

    PubMed Central

    Gibbens, Brian B.; Scott, Cheryl L.; Hoff, Courtney D.; Schottel, Janet L.

    2015-01-01

    Four laboratory modules were designed for introductory biology students to explore the field of metagenomics. Students collected microbes from environmental samples, extracted the DNA, and amplified 16S rRNA gene sequences using polymerase chain reaction (PCR). Students designed functional metagenomics screens to determine and compare antibiotic resistance profiles among the samples. Bioinformatics tools were used to generate and interpret phylogenetic trees and identify homologous genes. A pretest and posttest were used to assess learning gains, and the results indicated that these modules increased student performance by an average of 22%. Here we describe ways to engage students in metagenomics-related research and provide readers with ideas for how they can start developing metagenomics exercises for their own classrooms. PMID:25949755

  11. Activity screening of environmental metagenomic libraries reveals novel carboxylesterase families

    PubMed Central

    Popovic, Ana; Hai, Tran; Tchigvintsev, Anatoly; Hajighasemi, Mahbod; Nocek, Boguslaw; Khusnutdinova, Anna N.; Brown, Greg; Glinos, Julia; Flick, Robert; Skarina, Tatiana; Chernikova, Tatyana N.; Yim, Veronica; Brüls, Thomas; Paslier, Denis Le; Yakimov, Michail M.; Joachimiak, Andrzej; Ferrer, Manuel; Golyshina, Olga V.; Savchenko, Alexei; Golyshin, Peter N.; Yakunin, Alexander F.

    2017-01-01

    Metagenomics has made accessible an enormous reserve of global biochemical diversity. To tap into this vast resource of novel enzymes, we have screened over one million clones from metagenome DNA libraries derived from sixteen different environments for carboxylesterase activity and identified 714 positive hits. We have validated the esterase activity of 80 selected genes, which belong to 17 different protein families including unknown and cyclase-like proteins. Three metagenomic enzymes exhibited lipase activity, and seven proteins showed polyester depolymerization activity against polylactic acid and polycaprolactone. Detailed biochemical characterization of four new enzymes revealed their substrate preference, whereas their catalytic residues were identified using site-directed mutagenesis. The crystal structure of the metal-ion dependent esterase MGS0169 from the amidohydrolase superfamily revealed a novel active site with a bound unknown ligand. Thus, activity-centered metagenomics has revealed diverse enzymes and novel families of microbial carboxylesterases, whose activity could not have been predicted using bioinformatics tools. PMID:28272521

  12. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes bypassing the need for culturing individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic data-sets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two, publicly available, metagenomic datasets: a comparison of the gut microbiome of obese and lean twins; and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.

  13. Metagenomics of Glassy-winged Sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae)

    USDA-ARS's Scientific Manuscript database

    Three new insect-infecting viruses, three endosymbiotic bacteria, a fungus, and a bacterial phage were discovered using a metagenomics approach to identify unknown organisms that live in association with the sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae). The genetic composition of ...

  14. Metagenomic Profiling of a Microbial Assemblage Associated with the California Mussel: A Node in Networks of Carbon and Nitrogen Cycling

    PubMed Central

    Pfister, Catherine A.; Meyer, Folker; Antonopoulos, Dionysios A.

    2010-01-01

    Mussels are conspicuous and often abundant members of rocky shores and may constitute an important site for the nitrogen cycle due to their feeding and excretion activities. We used shotgun metagenomics of the microbial community associated with the surface of mussels (Mytilus californianus) on Tatoosh Island in Washington state to test whether there is a nitrogen-based microbial assemblage associated with mussels. Analyses of both tidepool mussels and those on emergent benches revealed a diverse community of Bacteria and Archaea with approximately 31 million bp from 6 mussels in each habitat. Using MG-RAST, 22.5–25.6% of fragments were identifiable using the SEED non-redundant database for proteins. Of those fragments that were identifiable through MG-RAST, the composition was dominated by Cyanobacteria and Alpha- and Gamma-proteobacteria. Microbial composition was highly similar between the tidepool and emergent bench mussels, suggesting similar functions across these different microhabitats. One percent of the proteins identified in each sample were related to nitrogen cycling. When normalized to protein discovery rate, the diversity and abundance of enzymes related to the nitrogen cycle in mussel-associated microbes are as great as or greater than those described for other marine metagenomes. In some instances, the nitrogen-utilizing profile of this assemblage was more concordant with soil metagenomes in the Midwestern U.S. than with open ocean systems. Carbon fixation and Calvin cycle enzymes further represented 0.65 and 1.26% of all proteins and their abundance was comparable to a number of open ocean marine metagenomes. In sum, the diversity and abundance of nitrogen and carbon cycle related enzymes in the microbes occupying the shells of Mytilus californianus suggest these mussels provide a node for microbial populations and thus biogeochemical processes. PMID:20463896

  15. Metagenomic analysis of the gut microbiota of the Timber Rattlesnake, Crotalus horridus.

    PubMed

    McLaughlin, Richard William; Cochran, Philip A; Dowd, Scot E

    2015-07-01

    Snakes are capable of surviving long periods without food. In this study we characterized the microbiota of a Timber Rattlesnake (Crotalus horridus), devoid of digesta, living in the wild. Pyrosequencing-based metagenomics were used to analyze phylogenetic and metabolic profiles with the aid of the MG-RAST server. Pyrosequencing of samples taken from the stomach, small intestine and colon yielded 691696, 957756 and 700419 high quality sequence reads. Taxonomic analysis of metagenomic reads indicated Eukarya was the most predominant domain, followed by bacteria and then viruses, for all three tissues. The most predominant phylum in the domain Bacteria was Proteobacteria for the tissues examined. Functional classifications by the subsystem database showed cluster-based subsystems were most predominant (10-15 %). Almost equally predominant (10-13 %) was carbohydrate metabolism. To identify bacteria in the colon at a finer taxonomic resolution, a 16S rRNA gene clone library was created. Proteobacteria was again found to be the most predominant phylum. The present study provides a baseline for understanding the microbial ecology of snakes living in the wild.

  16. PARTIE: a partition engine to separate metagenomic and amplicon projects in the Sequence Read Archive.

    PubMed

    Torres, Pedro J; Edwards, Robert A; McNair, Katelyn A

    2017-08-01

    The Sequence Read Archive (SRA) contains raw data from many different types of sequence projects. As of 2017, the SRA contained approximately ten petabases of DNA sequence (10^16 bp). Annotations of the data are provided by the submitter, and mining the data in the SRA is complicated by both the amount of data and the detail within those annotations. Here, we introduce PARTIE, a partition engine optimized to differentiate sequence read data into metagenomic (random) and amplicon (targeted) sequence data sets. PARTIE subsamples reads from the sequencing file and calculates four different statistics: k-mer frequency, 16S abundance, and prokaryotic- and viral-read abundance. These metrics are used to create a RandomForest decision tree to classify the sequencing data, and PARTIE provides mechanisms for both supervised and unsupervised classification. We demonstrate the accuracy of PARTIE for classifying SRA data, discuss the probable error rates in the SRA annotations and introduce a resource assessing SRA data. PARTIE and reclassified metagenome SRA entries are available from https://github.com/linsalrob/partie. Contact: redwards@mail.sdsu.edu. Supplementary data are available at Bioinformatics online.
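
    As a rough illustration of the first of these statistics, the sketch below subsamples reads and summarizes their k-mer frequencies. It is not PARTIE's code: the function names, the choice of summary (the fraction of distinct k-mers seen exactly once), the value of k and the subsample size are assumptions made for this example, and the 16S, prokaryotic, viral and random-forest steps are omitted. The working assumption is that targeted (amplicon) reads repeat the same region and so share many k-mers, whereas random shotgun reads share few.

```python
import random
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer made only of A, C, G, T across the given reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    return counts


def fraction_unique_kmers(reads, k, n_subsample, seed=0):
    """Subsample reads, then return the fraction of distinct k-mers seen exactly once."""
    rng = random.Random(seed)
    subsample = reads if len(reads) <= n_subsample else rng.sample(reads, n_subsample)
    counts = kmer_counts(subsample, k)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


# Toy reads (hypothetical): identical reads mimic a targeted region,
# unrelated reads mimic random shotgun fragments.
amplicon_like = ["ACGTAC"] * 3
shotgun_like = ["AAAC", "GGGT"]
amplicon_score = fraction_unique_kmers(amplicon_like, k=3, n_subsample=100)
shotgun_score = fraction_unique_kmers(shotgun_like, k=3, n_subsample=100)
```

    A classifier would take a score like this as one feature among several; on its own it is only a heuristic.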

  17. Characterization of the gut microbiota of Kawasaki disease patients by metagenomic analysis

    PubMed Central

    Kinumaki, Akiko; Sekizuka, Tsuyoshi; Hamada, Hiromichi; Kato, Kengo; Yamashita, Akifumi; Kuroda, Makoto

    2015-01-01

    Kawasaki disease (KD) is an acute febrile illness of early childhood. Previous reports have suggested that genetic disease susceptibility factors, together with a triggering infectious agent, could be involved in KD pathogenesis; however, the precise etiology of this disease remains unknown. Additionally, previous culture-based studies have suggested a possible role of intestinal microbiota in KD pathogenesis. In this study, we performed metagenomic analysis to comprehensively assess the longitudinal variation in the intestinal microbiota of 28 KD patients. Several notable bacterial genera were commonly extracted during the acute phase, whereas a relative increase in the number of Ruminococcus bacteria was observed during the non-acute phase of KD. The metagenomic analysis results based on bacterial species classification suggested that the number of sequencing reads with similarity to five Streptococcus spp. (S. pneumoniae, pseudopneumoniae, oralis, gordonii, and sanguinis), in addition to patient-derived Streptococcus isolates, markedly increased during the acute phase in most patients. Streptococci include a variety of pathogenic bacteria and probiotic bacteria that promote human health; therefore, this further species discrimination could comprehensively illuminate the KD-associated microbiota. The findings of this study suggest that KD-related Streptococci might be involved in the pathogenesis of this disease. PMID:26322033

  18. Identification and characterization of novel poly(DL-lactic acid) depolymerases from metagenome.

    PubMed

    Mayumi, Daisuke; Akutsu-Shigeno, Yukie; Uchiyama, Hiroo; Nomura, Nobuhiko; Nakajima-Kambe, Toshiaki

    2008-07-01

    Many poly(lactic acid) (PLA)-degrading microorganisms have been isolated from the natural environment by culture-based methods, but no study has examined unculturable PLA-degrading microorganisms. In this study, we constructed a metagenomic library consisting of the DNA extracted from PLA disks buried in compost. We identified three PLA-degrading genes encoding lipase or hydrolase. The purified enzymes degraded not only PLA, but also various aliphatic polyesters, tributyrin, and p-nitrophenyl esters. From their substrate specificities, the PLA depolymerases were classified as esterases rather than lipases. Among the PLA depolymerases, PlaM4 exhibited thermophilic properties; that is, it showed the highest activity at 70 degrees C and was stable even after incubation for 1 h at 50 degrees C. PlaM4 had absorption and degradation activities for solid PLA at 60 degrees C, which indicates that the enzyme can effectively degrade PLA in a high-temperature environment. On the other hand, the enzyme classification based on amino acid sequences showed that the other PLA depolymerases, PlaM7 and PlaM9, were not classified into known lipases or esterases. This is the first report on the identification and characterization of PLA depolymerase from a metagenome.

  19. Spectral classification

    NASA Astrophysics Data System (ADS)

    Jaschek, C.

    Taxonomic classification of astronomically observed stellar objects is described in terms of spectral properties. Stars receive a classification containing a letter, number, and a Roman numeral, which relates the star to other stars of higher or lower Roman numerals. The citation indicates the stellar chromatic emission in relation to the wavelengths of other stars. Standards are chosen from the available objects detected. Various classification schemes such as the MK, HD, and the Barbier-Chalonge-Divan systems are defined, including examples of indexing differences. Details delineating the separations between classifications are discussed with reference to the information content in spectral and in photometric classification schemes. The parameters usually used for classification include the temperature, luminosity, reddening, binarity, rotation, magnetic field, and elemental abundance or composition. The inclusion of recently discovered extended wavelength characteristics in nominal classifications is outlined, together with techniques involved in automated classification.

  20. Metagenomic Sequencing of an In Vitro-Simulated Microbial Community

    SciTech Connect

    Morgan, Jenna L.; Darling, Aaron E.; Eisen, Jonathan A.

    2009-12-01

    Background: Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool for cataloging microbial life. This culture-independent approach involves collecting samples that include microbes in them, extracting DNA from the samples, and sequencing the DNA. A sample may contain many different microorganisms, macroorganisms, and even free-floating environmental DNA. A fundamental challenge in metagenomics has been estimating the abundance of organisms in a sample based on the frequency with which the organism's DNA was observed in reads generated via DNA sequencing. Methodology/Principal Findings: We created mixtures of ten microbial species for which genome sequences are known. Each mixture contained an equal number of cells of each species. We then extracted DNA from the mixtures, sequenced the DNA, and measured the frequency with which genomic regions from each organism were observed in the sequenced DNA. We found that the observed frequency of reads mapping to each organism did not reflect the equal numbers of cells that were known to be included in each mixture. The relative organism abundances varied significantly depending on the DNA extraction and sequencing protocol utilized. Conclusions/Significance: We describe a new data resource for measuring the accuracy of metagenomic binning methods, created by in vitro-simulation of a metagenomic community. Our in vitro simulation can be used to complement previous in silico benchmark studies. In constructing a synthetic community and sequencing its metagenome, we encountered several sources of observation bias that likely affect most metagenomic experiments to date and present challenges for comparative metagenomic studies. DNA preparation methods have a particularly profound effect in our study, implying that samples prepared with different

  1. Exploration of Metagenome Assemblies with an Interactive Visualization Tool

    SciTech Connect

    Cantor, Michael; Nordberg, Henrik; Smirnova, Tatyana; Andersen, Evan; Tringe, Susannah; Hess, Matthias; Dubchak, Inna

    2014-07-09

    Metagenomics, one of the fastest growing areas of modern genomic science, is the genetic profiling of the entire community of microbial organisms present in an environmental sample. Elviz is a web-based tool for the interactive exploration of metagenome assemblies. Elviz can be used with publicly available data sets from the Joint Genome Institute or with custom user-loaded assemblies. Elviz is available at genome.jgi.doe.gov/viz

  2. Biocatalysts and their small molecule products from metagenomic studies

    PubMed Central

    Iqbal, Hala A.; Feng, Zhiyang; Brady, Sean F.

    2012-01-01

    The vast majority of bacteria present in environmental samples have never been cultured and therefore they have not been available to exploit their ability to produce useful biocatalysts or collections of biocatalysts that can biosynthesize interesting small molecules. Metagenomic libraries constructed using DNA extracted directly from natural bacterial communities offer access to the genetic information present in the genomes of these as yet uncultured bacteria. This review highlights recent efforts to recover both discrete enzymes and small molecules from metagenomic libraries. PMID:22455793

  3. [Metagenomics and biodiversity of sphagnum bogs].

    PubMed

    Rusin, L Yu

    2016-01-01

    The biodiversity of sphagnum bogs is among the richest and least studied, while these ecosystems are among the top ones in ecological, conservation, and economic value. Recent studies focused on the prokaryotic consortia associated with sphagnum mosses, and revealed the factors that maintain sustainability and productivity of bog ecosystems. High-throughput sequencing technologies provided insight into functional diversity of moss microbial communities (microbiomes), and helped to identify the biochemical pathways and gene families that facilitate the spectrum of adaptive strategies and largely foster the very successful colonization of the Northern hemisphere by sphagnum mosses. Rich and valuable information obtained on microbiomes of peat bogs sets off the paucity of evidence on their eukaryotic diversity. Prospects and expectations of reliable assessment of taxonomic profiles, relative abundance of taxa, and hidden biodiversity of microscopic eukaryotes in sphagnum bog ecosystems are briefly outlined in the context of today's metagenomics.

  4. Identifying a healthy oral microbiome through metagenomics.

    PubMed

    Alcaraz, L D; Belda-Ferre, P; Cabrera-Rubio, R; Romero, H; Simón-Soro, A; Pignatelli, M; Mira, A

    2012-07-01

    We present the results of an exploratory study of the bacterial communities from the human oral cavity showing the advantages of pyrosequencing complex samples. Over 1.6 million reads from the metagenomes of eight dental plaque samples were taxonomically assigned through a binning procedure. We performed clustering analysis to discern if there were associations between non-caries and caries conditions in the community composition. Our results show a given bacterial consortium associated with cariogenic and non-cariogenic conditions, in agreement with the existence of a healthy oral microbiome and giving support to the idea of dental caries being a polymicrobial disease. The data are coherent with those previously reported in the literature by 16S rRNA amplification, thus giving the chance to link gene functions with taxonomy in further studies involving larger sample numbers.

  5. Developing a metagenomic view of xenobiotic metabolism

    PubMed Central

    Haiser, Henry J.; Turnbaugh, Peter J.

    2012-01-01

    The microbes residing in and on the human body influence human physiology in many ways, particularly through their impact on the metabolism of xenobiotic compounds, including therapeutic drugs, antibiotics, and diet-derived bioactive compounds. Despite the importance of these interactions and the many possibilities for intervention, microbial xenobiotic metabolism remains a largely underexplored component of pharmacology. Here, we discuss the emerging evidence for both direct and indirect effects of the human gut microbiota on xenobiotic metabolism, and the initial links that have been made between specific compounds, diverse members of this complex community, and the microbial genes responsible. Furthermore, we highlight the many parallels to the now well-established field of environmental bioremediation, and the vast potential to leverage emerging metagenomic tools to shed new light on these important microbial biotransformations. PMID:22902524

  6. Expanding the Marine Virosphere Using Metagenomics

    PubMed Central

    Mizuno, Carolina Megumi; Rodriguez-Valera, Francisco; Kimes, Nikole E.; Ghai, Rohit

    2013-01-01

    Viruses infecting prokaryotic cells (phages) are the most abundant entities of the biosphere and contain a largely uncharted wealth of genomic diversity. They play a critical role in the biology of their hosts and in ecosystem functioning at large. The classical approaches studying phages require isolation from a pure culture of the host. Direct sequencing approaches have been hampered by the small amounts of phage DNA present in most natural habitats and the difficulty in applying meta-omic approaches, such as annotation of small reads and assembly. Serendipitously, it has been discovered that cellular metagenomes of highly productive ocean waters (the deep chlorophyll maximum) contain significant amounts of viral DNA derived from cells undergoing the lytic cycle. We have taken advantage of this phenomenon to retrieve metagenomic fosmids containing viral DNA from a Mediterranean deep chlorophyll maximum sample. This method allowed description of complete genomes of 208 new marine phages. The diversity of these genomes was remarkable, contributing 21 genomic groups of tailed bacteriophages of which 10 are completely new. Sequence based methods have allowed host assignment to many of them. These predicted hosts represent a wide variety of important marine prokaryotic microbes like members of SAR11 and SAR116 clades, Cyanobacteria and also the newly described low GC Actinobacteria. A metavirome constructed from the same habitat showed that many of the new phage genomes were abundantly represented. Furthermore, other available metaviromes also indicated that some of the new phages are globally distributed in low to medium latitude ocean waters. The availability of many genomes from the same sample allows a direct approach to viral population genomics confirming the remarkable mosaicism of phage genomes. PMID:24348267

  7. Expanding the marine virosphere using metagenomics.

    PubMed

    Mizuno, Carolina Megumi; Rodriguez-Valera, Francisco; Kimes, Nikole E; Ghai, Rohit

    2013-01-01

    Viruses infecting prokaryotic cells (phages) are the most abundant entities of the biosphere and contain a largely uncharted wealth of genomic diversity. They play a critical role in the biology of their hosts and in ecosystem functioning at large. The classical approaches studying phages require isolation from a pure culture of the host. Direct sequencing approaches have been hampered by the small amounts of phage DNA present in most natural habitats and the difficulty in applying meta-omic approaches, such as annotation of small reads and assembly. Serendipitously, it has been discovered that cellular metagenomes of highly productive ocean waters (the deep chlorophyll maximum) contain significant amounts of viral DNA derived from cells undergoing the lytic cycle. We have taken advantage of this phenomenon to retrieve metagenomic fosmids containing viral DNA from a Mediterranean deep chlorophyll maximum sample. This method allowed description of complete genomes of 208 new marine phages. The diversity of these genomes was remarkable, contributing 21 genomic groups of tailed bacteriophages of which 10 are completely new. Sequence based methods have allowed host assignment to many of them. These predicted hosts represent a wide variety of important marine prokaryotic microbes like members of SAR11 and SAR116 clades, Cyanobacteria and also the newly described low GC Actinobacteria. A metavirome constructed from the same habitat showed that many of the new phage genomes were abundantly represented. Furthermore, other available metaviromes also indicated that some of the new phages are globally distributed in low to medium latitude ocean waters. The availability of many genomes from the same sample allows a direct approach to viral population genomics confirming the remarkable mosaicism of phage genomes.

  8. MetaProx: the database of metagenomic proximons

    PubMed Central

    Vey, Gregory; Charles, Trevor C.

    2014-01-01

    MetaProx is the database of metagenomic proximons: a searchable repository of proximon objects conceived with two specific goals. The first objective is to accelerate research involving metagenomic functional interactions by providing a database of metagenomic operon candidates. Proximons represent a special subset of directons (series of contiguous co-directional genes) where each member gene is in close proximity to its neighbours with respect to intergenic distance. As a result, proximons represent significant operon candidates where some subset of proximons is the set of true metagenomic operons. Proximons are well suited for the inference of metagenomic functional networks because predicted functional linkages do not rely on homology-dependent information that is frequently unavailable in metagenomic scenarios. The second objective is to explore representations for semistructured biological data that can offer an alternative to the traditional relational database approach. In particular, we use a serialized object implementation and advocate a Data as Data policy where the same serialized objects can be used at all levels (database, search tool and saved user file) without conversion or the use of human-readable markups. MetaProx currently includes 4 210 818 proximons consisting of 8 926 993 total member genes. Database URL: http://metaprox.uwaterloo.ca PMID:25288655
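
    The definition given in this abstract (a proximon is a run of contiguous, co-directional genes in which each gene lies close to its neighbours) can be sketched as a single pass over genes sorted by position. This is an illustration under stated assumptions, not MetaProx's implementation: the `Gene` record, the `find_proximons` function and the `max_gap` threshold are hypothetical, and the abstract does not state the intergenic distance cutoff actually used.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # first base of the gene on the contig
    end: int     # last base of the gene on the contig
    strand: str  # "+" or "-"


def find_proximons(genes: List[Gene], max_gap: int) -> List[List[Gene]]:
    """Group neighbouring same-strand genes whose intergenic gap is <= max_gap.

    max_gap is a hypothetical threshold chosen by the caller.
    Only runs of at least two genes are reported.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    proximons: List[List[Gene]] = []
    current: List[Gene] = []
    for gene in ordered:
        if current:
            prev = current[-1]
            same_strand = gene.strand == prev.strand
            close = gene.start - prev.end <= max_gap
            if same_strand and close:
                current.append(gene)  # extend the current run
                continue
            if len(current) >= 2:
                proximons.append(current)  # close the finished run
        current = [gene]  # start a new run at this gene
    if len(current) >= 2:
        proximons.append(current)
    return proximons


# Toy contig (hypothetical coordinates).
contig = [
    Gene("a", 1, 100, "+"),
    Gene("b", 110, 200, "+"),   # 10 bp after a, same strand
    Gene("c", 500, 600, "+"),   # 300 bp after b: too far
    Gene("d", 610, 700, "-"),   # strand change ends any run
    Gene("e", 705, 800, "-"),   # 5 bp after d, same strand
]
result = [[g.name for g in run] for run in find_proximons(contig, max_gap=20)]
```

    On the toy contig this yields two candidate groups, a with b and d with e; gene c is left out because it is too far from b and on a different strand from d.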

  9. Messages from the first International Conference on Clinical Metagenomics (ICCMg).

    PubMed

    Ruppé, Etienne; Greub, Gilbert; Schrenzel, Jacques

    2017-02-01

    Metagenomics is now entering clinical microbiology, and an increasing number of diagnostic laboratories are proposing the sequencing and annotation of bacterial genomes and/or the analysis of clinical samples by direct or PCR-based metagenomics with a short time to results. In this context, the first International Conference on Clinical Metagenomics (ICCMg) was held in Geneva in October 2016, and several key aspects were discussed, including: i) the need for improved resolution, ii) the importance of interpretation given the common occurrence of sequence contaminants, iii) the need for improved bioinformatic pipelines, iv) the bottleneck of DNA extraction, v) the importance of gold standards, vi) the need to further reduce time to results, vii) how to improve data sharing, viii) the applications of bacterial genomics and clinical metagenomics in better adapting therapeutics and ix) the impact of metagenomics and new sequencing technologies in discovering new microbes. Further efforts in terms of reduced turnaround time, improved quality and lower costs are, however, warranted to fully translate metagenomics into clinical applications.

  10. Beyond the bounds of orthology: functional inference from metagenomic context.

    PubMed

    Vey, Gregory; Moreno-Hagelsieb, Gabriel

    2010-07-01

    The effectiveness of the computational inference of function by genomic context is bounded by the diversity of known microbial genomes. Although metagenomes offer access to previously inaccessible organisms, their fragmentary nature prevents the conventional establishment of orthologous relationships required for reliably predicting functional interactions. We introduce a protocol for the prediction of functional interactions using data sources without information about orthologous relationships. To illustrate this process, we use the Sargasso Sea metagenome to construct a functional interaction network for the Escherichia coli K12 genome. We identify two reliability metrics, target intergenic distance and source interaction count, and apply them to selectively filter the predictions retained to construct the network of functional interactions. The resulting network contains 2297 nodes with 10 072 edges with a positive predictive value of 0.80. The metagenome yielded 8423 functional interactions beyond those found using only the genomic orthologs as a data source. This amounted to a 134% increase in the total number of functional interactions that are predicted by combining the metagenome and the genomic orthologs versus the genomic orthologs alone. In the absence of detectable orthologous relationships it remains feasible to derive a reliable set of predicted functional interactions. This offers a strategy for harnessing other metagenomes and homologs in general. Because metagenomes allow access to previously unreachable microorganisms, this will result in expanding the universe of known functional interactions thus furthering our understanding of functional organization.

  11. Elviz - exploration of metagenome assemblies with an interactive visualization tool.

    PubMed

    Cantor, Michael; Nordberg, Henrik; Smirnova, Tatyana; Hess, Matthias; Tringe, Susannah; Dubchak, Inna

    2015-04-28

    Metagenomics, the sequencing of DNA collected from an entire microbial community, enables the study of natural microbial consortia in their native habitats. Metagenomics studies produce huge volumes of data, including both the sequences themselves and metadata describing their abundance, assembly, predicted functional characteristics and environmental parameters. The ability to explore these data visually is critically important to meaningful biological interpretation. Current genomics applications cannot effectively integrate sequence data, assembly metadata, and annotation to support both genome and community-level inquiry. Elviz (Environmental Laboratory Visualization) is an interactive web-based tool for the visual exploration of assembled metagenomes and their complex metadata. Elviz allows scientists to navigate metagenome assemblies across multiple dimensions and scales, plotting parameters such as GC content, relative abundance, phylogenetic affiliation and assembled contig length. Furthermore Elviz enables interactive exploration using real-time plot navigation, search, filters, axis selection, and the ability to drill from a whole-community profile down to individual gene annotations. Thus scientists engage in a rapid feedback loop of visual pattern identification, hypothesis generation, and hypothesis testing. Compared to the current alternative of generating a succession of static figures, Elviz can greatly accelerate the speed of metagenome analysis. Elviz can be used to explore both user-submitted datasets and numerous metagenome studies publicly available at the Joint Genome Institute (JGI). Elviz is freely available at http://genome.jgi.doe.gov/viz and runs on most current web-browsers.

  12. Computational prediction of CRISPR cassettes in gut metagenome samples from Chinese type-2 diabetic patients and healthy controls.

    PubMed

    Mangericao, Tatiana C; Peng, Zhanhao; Zhang, Xuegong

    2016-01-11

    CRISPR has become a hot topic as a powerful technique for genome editing in humans and other higher organisms. The original CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats coupled with CRISPR-associated proteins) is an important adaptive defence system for prokaryotes that provides resistance against invading elements such as viruses and plasmids. A CRISPR cassette contains short nucleotide sequences called spacers. These unique regions retain a history of the interactions between prokaryotes and their invaders in individual strains and ecosystems. One important ecosystem in the human body is the human gut, a rich habitat populated by a great diversity of microorganisms. Gut microbiomes are important for human physiology and health. Metagenome sequencing has been widely applied for studying the gut microbiomes. Most efforts in metagenome studies have focused on profiling taxa compositions and gene catalogues and identifying their associations with human health. Less attention has been paid to the analysis of the ecosystems of microbiomes themselves, especially their CRISPR composition. We conducted a preliminary analysis of CRISPR sequences in a human gut metagenomic data set from Chinese individuals, comprising type-2 diabetes patients and healthy controls. Applying an available CRISPR-identification algorithm, PILER-CR, we identified 3169 CRISPR cassettes in the data, from which we constructed a set of 1302 unique repeat sequences and 36,709 spacers. A more extensive analysis was made for the CRISPR repeats: these repeats were submitted to a more comprehensive clustering and classification using the web server tool CRISPRmap. All repeats were compared with known CRISPRs in the database CRISPRdb. A total of 784 repeats had matches in the database, and the remaining 518 repeats from our set are potentially novel ones. The computational analysis of CRISPR composition based on contigs of metagenome sequencing data is feasible. It provides an efficient...

  13. EBI metagenomics in 2016 - an expanding and evolving resource for the analysis and archiving of metagenomic data

    PubMed Central

    Mitchell, Alex; Bucchini, Francois; Cochrane, Guy; Denise, Hubert; Hoopen, Petra ten; Fraser, Matthew; Pesseat, Sebastien; Potter, Simon; Scheremetjew, Maxim; Sterk, Peter; Finn, Robert D.

    2016-01-01

    EBI metagenomics (https://www.ebi.ac.uk/metagenomics/) is a freely available hub for the analysis and archiving of metagenomic and metatranscriptomic data. Over the last 2 years, the resource has undergone rapid growth, with an increase of over five-fold in the number of processed samples and consequently represents one of the largest resources of analysed shotgun metagenomes. Here, we report the status of the resource in 2016 and give an overview of new developments. In particular, we describe updates to data content, a complete overhaul of the analysis pipeline, streamlining of data presentation via the website and the development of a new web based tool to compare functional analyses of sequence runs within a study. We also highlight two of the higher profile projects that have been analysed using the resource in the last year: the oceanographic projects Ocean Sampling Day and Tara Oceans. PMID:26582919

  14. Introduction to Metagenomics at DOE JGI (Opening Remarks for the Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Kyrpides, Nikos

    2011-10-12

    After a quick introduction by DOE JGI Director Eddy Rubin, DOE JGI's Nikos Kyrpides delivers the opening remarks at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011

  15. Metagenomics, metaMicrobesOnline and Kbase Data Integration (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Dehal, Paramvir

    2011-10-12

    Berkeley Lab's Paramvir Dehal on "Managing and Storing large Datasets in MicrobesOnline, metaMicrobesOnline and the DOE Knowledgebase" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  16. Introduction to Metagenomics at DOE JGI (Opening Remarks for the Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Kyrpides, Nikos [DOE JGI]

    2016-07-12

    After a quick introduction by DOE JGI Director Eddy Rubin, DOE JGI's Nikos Kyrpides delivers the opening remarks at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011

  17. Metagenomics, metaMicrobesOnline and Kbase Data Integration (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Dehal, Paramvir [LBNL]

    2016-07-12

    Berkeley Lab's Paramvir Dehal on "Managing and Storing large Datasets in MicrobesOnline, metaMicrobesOnline and the DOE Knowledgebase" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  18. Universality of fragment shapes

    PubMed Central

    Domokos, Gábor; Kun, Ferenc; Sipos, András Árpád; Szabó, Tímea

    2015-01-01

    The shape of fragments generated by the breakup of solids is central to a wide variety of problems ranging from the geomorphic evolution of boulders to the accumulation of space debris orbiting Earth. Although the statistics of the mass of fragments have been found to show a universal scaling behavior, the comprehensive characterization of fragment shapes has remained a fundamental challenge. We performed a thorough experimental study of the problem, fragmenting various types of materials by slowly proceeding weathering and by rapid breakup due to explosion and hammering. We demonstrate that the shape of fragments obeys an astonishing universality, having the same generic evolution with the fragment size irrespective of material details and loading conditions. There exists a cutoff size below which fragments have an isotropic shape; however, as the size increases, an exponential convergence is obtained to a unique elongated form. We show that a discrete stochastic model of fragmentation reproduces both the size and shape of fragments by tuning only a single parameter, which strengthens the general validity of the scaling laws. The dependence of the probability of the crack plane orientation on the linear extension of fragments proved to be essential for the shape selection mechanism. PMID:25772300

  19. Universality of fragment shapes

    NASA Astrophysics Data System (ADS)

    Domokos, Gábor; Kun, Ferenc; Sipos, András Árpád; Szabó, Tímea

    2015-03-01

    The shape of fragments generated by the breakup of solids is central to a wide variety of problems ranging from the geomorphic evolution of boulders to the accumulation of space debris orbiting Earth. Although the statistics of the mass of fragments have been found to show a universal scaling behavior, the comprehensive characterization of fragment shapes has remained a fundamental challenge. We performed a thorough experimental study of the problem, fragmenting various types of materials by slowly proceeding weathering and by rapid breakup due to explosion and hammering. We demonstrate that the shape of fragments obeys an astonishing universality, having the same generic evolution with the fragment size irrespective of material details and loading conditions. There exists a cutoff size below which fragments have an isotropic shape; however, as the size increases, an exponential convergence is obtained to a unique elongated form. We show that a discrete stochastic model of fragmentation reproduces both the size and shape of fragments by tuning only a single parameter, which strengthens the general validity of the scaling laws. The dependence of the probability of the crack plane orientation on the linear extension of fragments proved to be essential for the shape selection mechanism.

  20. A base composition analysis of natural patterns for the preprocessing of metagenome sequences

    PubMed Central

    2013-01-01

    Background On the pretext that sequence reads and contigs often exhibit the same kinds of base usage that is also observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs) which are permutations of bacterial restriction sites and the base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase the efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Results Our method is able to differentiate organisms and their reads or contigs. The framework shows how to successfully determine the relatedness between these reads or contigs by comparison of base composition. In particular, we show that two types of organismal-sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By the application of one of the four possible spectrum sets, encompassing all known restriction sites, we provide the evidence to claim that each set has a different ability to differentiate sequence data. Furthermore, we show that the spectrum set selection having relevance to one organism, but not to the others of the data set, will greatly improve performance of sequence differentiation even if the fragment size of the read, contig or sequence is not lengthy. Conclusions We show the proof of concept of our method by its application to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny where it can be used to infer evolutionary distances between organisms based on the notion that related organisms

  1. Identifying areas of relative change in forest fragmentation in New Hampshire between 1990 and 2000

    Treesearch

    Tonya Lister; Andrew Lister; William McWilliams; Rachel Riemann

    2007-01-01

    Forest fragmentation potentially can impact many facets of natural ecosystems. Numerous methods have been employed to assess static forest fragmentation. Few studies, however, have analyzed changes in forest fragmentation over time. In this study, we developed new classifications from Landsat imagery data acquired in 1990 and 2000 for New Hampshire, assessed...

  2. Assessment of diversity indices for the characterization of the soil prokaryotic community by metagenomic analysis

    NASA Astrophysics Data System (ADS)

    Chernov, T. I.; Tkhakakhova, A. K.; Kutovaya, O. V.

    2015-04-01

    The diversity indices used in ecology for assessing the metagenomes of soil prokaryotic communities at different phylogenetic levels were compared. The following indices were considered: the number of detected taxa and the Shannon, Menhinick, Margalef, Simpson, Chao1, and ACE indices. The diversity analysis of the prokaryotic communities in the upper horizons of a typical chernozem (Haplic Chernozem (Pachic)), a dark chestnut soil (Haplic Kastanozem (Chromic)), and an extremely arid desert soil (Endosalic Calcisol (Yermic)) was based on the analysis of 16S rRNA genes. The Menhinick, Margalef, Chao1, and ACE indices gave similar results for the classification of the communities according to their diversity levels; the Simpson index gave good results only for the high-level taxa (phyla); the best results were obtained with the Shannon index. In general, all the indices used showed a decrease in the diversity of the soil prokaryotes in the following sequence: chernozem > dark chestnut soil > extremely arid desert soil.
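    Several of the indices compared above have short closed-form definitions, sketched below from a vector of per-taxon read counts. This is an illustration under stated assumptions rather than the authors' procedure: the function name `diversity_indices` is hypothetical, Simpson is given in its Gini-Simpson form (1 minus the sum of squared proportions), Chao1 uses the bias-corrected estimator, and ACE is omitted for brevity. The abstract does not say which variants were used.

```python
import math
from typing import Dict, Sequence


def diversity_indices(counts: Sequence[int]) -> Dict[str, float]:
    """Compute common diversity indices from per-taxon counts.

    S = number of observed taxa, N = total number of individuals (reads).
    """
    observed = [c for c in counts if c > 0]
    n = sum(observed)
    s = len(observed)
    if n < 2:
        raise ValueError("need at least two individuals (Margalef uses ln N)")

    props = [c / n for c in observed]
    singletons = sum(1 for c in observed if c == 1)  # taxa seen exactly once
    doubletons = sum(1 for c in observed if c == 2)  # taxa seen exactly twice

    return {
        "richness": float(s),
        "shannon": -sum(p * math.log(p) for p in props),
        "simpson": 1.0 - sum(p * p for p in props),   # Gini-Simpson form
        "menhinick": s / math.sqrt(n),
        "margalef": (s - 1) / math.log(n),
        # Bias-corrected Chao1; avoids division by zero when no doubletons
        "chao1": s + singletons * (singletons - 1) / (2.0 * (doubletons + 1)),
    }


if __name__ == "__main__":
    for name, value in diversity_indices([5, 1, 1, 2]).items():
        print(f"{name}: {value:.4f}")
```

    Richness-based indices (Menhinick, Margalef, Chao1) depend only on the number of taxa and the rare-taxon counts, whereas Shannon and Simpson also weight evenness, which is one plausible reason the two groups can rank communities differently.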

  3. Fragmentation properties of metals

    SciTech Connect

    Grady, D.E.; Kipp, M.E.

    1996-06-01

    In the present study we are developing an experimental fracture material property test method specific to dynamic fragmentation. Spherical test samples of the metals of interest are subjected to controlled impulsive stress loads by acceleration to high velocities with a light-gas launcher facility and subsequent normal impact on thin plates. Motion, deformation and fragmentation of the test samples are diagnosed with multiple flash radiography methods. The impact plate materials are selected to be transparent to the x-ray method so that only test metal material is imaged. Through a systematic series of such tests, both strain-to-failure and fragmentation resistance properties are determined through this experimental method. Fragmentation property data for several steels, copper, aluminum, tantalum and titanium have been obtained to date. Aspects of the dynamic data have been analyzed with computational methods to achieve a better understanding of the processes leading to failure and fragmentation, and to test an existing computational fragmentation model.

  4. Going deeper: metagenome of a hadopelagic microbial community.

    PubMed

    Eloe, Emiley A; Fadrosh, Douglas W; Novotny, Mark; Zeigler Allen, Lisa; Kim, Maria; Lombardo, Mary-Jane; Yee-Greenbaum, Joyclyn; Yooseph, Shibu; Allen, Eric E; Lasken, Roger; Williamson, Shannon J; Bartlett, Douglas H

    2011-01-01

    The paucity of sequence data from pelagic deep-ocean microbial assemblages has severely restricted molecular exploration of the largest biome on Earth. In this study, an analysis is presented of a large-scale 454-pyrosequencing metagenomic dataset from a hadopelagic environment from 6,000 m depth within the Puerto Rico Trench (PRT). A total of 145 Mbp of assembled sequence data was generated and compared to two pelagic deep ocean metagenomes and two representative surface seawater datasets from the Sargasso Sea. In a number of instances, all three deep metagenomes displayed similar trends, but were most magnified in the PRT, including enrichment in functions for two-component signal transduction mechanisms and transcriptional regulation. Overrepresented transporters in the PRT metagenome included outer membrane porins, diverse cation transporters, and di- and tri-carboxylate transporters that matched well with the prevailing catabolic processes such as butanoate, glyoxylate and dicarboxylate metabolism. A surprisingly high abundance of sulfatases for the degradation of sulfated polysaccharides were also present in the PRT. The most dramatic adaptational feature of the PRT microbes appears to be heavy metal resistance, as reflected in the large numbers of transporters present for their removal. As a complement to the metagenome approach, single-cell genomic techniques were utilized to generate partial whole-genome sequence data from four uncultivated cells from members of the dominant phyla within the PRT, Alphaproteobacteria, Gammaproteobacteria, Bacteroidetes and Planctomycetes. The single-cell sequence data provided genomic context for many of the highly abundant functional attributes identified from the PRT metagenome, as well as recruiting heavily the PRT metagenomic sequence data compared to 172 available reference marine genomes. Through these multifaceted sequence approaches, new insights have been provided into the unique functional attributes present in

  5. Metagenomic analysis of kimchi, a traditional Korean fermented food.

    PubMed

    Jung, Ji Young; Lee, Se Hee; Kim, Jeong Myeong; Park, Moon Su; Bae, Jin-Woo; Hahn, Yoonsoo; Madsen, Eugene L; Jeon, Che Ok

    2011-04-01

    Kimchi, a traditional food in the Korean culture, is made from vegetables by fermentation. In this study, metagenomic approaches were used to monitor changes in bacterial populations, metabolic potential, and overall genetic features of the microbial community during the 29-day fermentation process. Metagenomic DNA was extracted from kimchi samples obtained periodically and was sequenced using a 454 GS FLX Titanium system, which yielded a total of 701,556 reads, with an average read length of 438 bp. Phylogenetic analysis based on 16S rRNA genes from the metagenome indicated that the kimchi microbiome was dominated by members of three genera: Leuconostoc, Lactobacillus, and Weissella. Assignment of metagenomic sequences to SEED categories of the Metagenome Rapid Annotation using Subsystem Technology (MG-RAST) server revealed a genetic profile characteristic of heterotrophic lactic acid fermentation of carbohydrates, which was supported by the detection of mannitol, lactate, acetate, and ethanol as fermentation products. When the metagenomic reads were mapped onto the database of completed genomes, the Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 and Lactobacillus sakei subsp. sakei 23K genomes were highly represented. These same two genera were confirmed to be important in kimchi fermentation when the majority of kimchi metagenomic sequences showed very high identity to Leuconostoc mesenteroides and Lactobacillus genes. Besides microbial genome sequences, a surprisingly large number of phage DNA sequences were identified from the cellular fractions, possibly indicating that a high proportion of cells were infected by bacteriophages during fermentation. Overall, these results provide insights into the kimchi microbial community and also shed light on fermentation processes carried out broadly by complex microbial communities.

  6. Going Deeper: Metagenome of a Hadopelagic Microbial Community

    PubMed Central

    Eloe, Emiley A.; Fadrosh, Douglas W.; Novotny, Mark; Zeigler Allen, Lisa; Kim, Maria; Lombardo, Mary-Jane; Yee-Greenbaum, Joyclyn; Yooseph, Shibu; Allen, Eric E.; Lasken, Roger; Williamson, Shannon J.; Bartlett, Douglas H.

    2011-01-01

    The paucity of sequence data from pelagic deep-ocean microbial assemblages has severely restricted molecular exploration of the largest biome on Earth. In this study, an analysis is presented of a large-scale 454-pyrosequencing metagenomic dataset from a hadopelagic environment from 6,000 m depth within the Puerto Rico Trench (PRT). A total of 145 Mbp of assembled sequence data was generated and compared to two pelagic deep ocean metagenomes and two representative surface seawater datasets from the Sargasso Sea. In a number of instances, all three deep metagenomes displayed similar trends, but were most magnified in the PRT, including enrichment in functions for two-component signal transduction mechanisms and transcriptional regulation. Overrepresented transporters in the PRT metagenome included outer membrane porins, diverse cation transporters, and di- and tri-carboxylate transporters that matched well with the prevailing catabolic processes such as butanoate, glyoxylate and dicarboxylate metabolism. A surprisingly high abundance of sulfatases for the degradation of sulfated polysaccharides were also present in the PRT. The most dramatic adaptational feature of the PRT microbes appears to be heavy metal resistance, as reflected in the large numbers of transporters present for their removal. As a complement to the metagenome approach, single-cell genomic techniques were utilized to generate partial whole-genome sequence data from four uncultivated cells from members of the dominant phyla within the PRT, Alphaproteobacteria, Gammaproteobacteria, Bacteroidetes and Planctomycetes. The single-cell sequence data provided genomic context for many of the highly abundant functional attributes identified from the PRT metagenome, as well as recruiting heavily the PRT metagenomic sequence data compared to 172 available reference marine genomes. Through these multifaceted sequence approaches, new insights have been provided into the unique functional attributes present in

  7. Environmental Metagenomics: The Data Assembly and Data Analysis Perspectives

    NASA Astrophysics Data System (ADS)

    Kumar, Vinay; Maitra, S. S.; Shukla, Rohit Nandan

    2015-01-01

    Novel gene finding is one of the emerging fields in environmental research. In past decades, research focused mainly on the discovery of microorganisms capable of degrading a particular compound. Many methods are available in the literature for the cultivation and screening of these novel microorganisms. All of these methods are efficient for screening microbes that can be cultivated in the laboratory. Microorganisms that live in extreme conditions such as hot springs, frozen glaciers and acid mine drainage cannot be cultivated in the laboratory because of incomplete knowledge about their growth requirements, such as temperature and nutrients, and their mutual dependence on each other. The microbes that can be cultivated correspond to less than 1 % of the total microbes present on Earth. The remaining 99 %, the uncultivated majority, remains inaccessible. Metagenomics transcends the culture requirements of microbes. In metagenomics, DNA is extracted directly from environmental samples such as soil, seawater and acid mine drainage, followed by construction and screening of a metagenomic library. With ongoing research, a huge amount of metagenomic data is accumulating. Understanding these data is an essential step in extracting novel genes of industrial importance. Various bioinformatics tools have been designed to analyze and annotate the data produced from the metagenome. The bioinformatic requirements of metagenomic data analysis differ in theory and practice. This paper reviews the tools that are available for metagenomic data analysis and the capabilities of such tools: what they can do and their web availability.

  8. The single-species metagenome: subtyping Staphylococcus aureus core genome sequences from shotgun metagenomic data

    PubMed Central

    Li, Ben; Petit III, Robert A.; Qin, Zhaohui S.; Darrow, Lyndsey

    2016-01-01

    In this study we developed a genome-based method for detecting Staphylococcus aureus subtypes from metagenome shotgun sequence data. We used a binomial mixture model and the coverage counts at >100,000 known S. aureus SNP (single nucleotide polymorphism) sites derived from prior comparative genomic analysis to estimate the proportion of 40 subtypes in metagenome samples. We were able to obtain >87% sensitivity and >94% specificity at 0.025X coverage for S. aureus. We found that 321 and 149 metagenome samples from the Human Microbiome Project and metaSUB analysis of the New York City subway, respectively, contained S. aureus at genome coverage >0.025. In both projects, CC8 and CC30 were the most common S. aureus clonal complexes encountered. We found evidence that the subtype composition at different body sites of the same individual were more similar than random sampling and more limited evidence that certain body sites were enriched for particular subtypes. One surprising finding was the apparent high frequency of CC398, a lineage often associated with livestock, in samples from the tongue dorsum. Epidemiologic analysis of the HMP subject population suggested that high BMI (body mass index) and health insurance are possibly associated with S. aureus carriage but there was limited power to identify factors linked to carriage of even the most common subtype. In the NYC subway data, we found a small signal of geographic distance affecting subtype clustering but other unknown factors influence taxonomic distribution of the species around the city. PMID:27781166
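    To give a sense of scale for the 0.025X coverage figure above, the sketch below estimates how many of 100,000 SNP sites would be touched by at least one read. It assumes reads land uniformly so that per-site depth is approximately Poisson with mean equal to the coverage; this is a simplifying assumption introduced here, not the binomial mixture model the authors describe, and the function name is hypothetical.

```python
import math


def expected_covered_sites(n_sites: int, coverage: float) -> float:
    """Expected number of sites with depth >= 1 under Poisson(coverage) depth.

    P(depth >= 1) = 1 - exp(-coverage) for a Poisson-distributed depth.
    """
    if n_sites < 0 or coverage < 0:
        raise ValueError("n_sites and coverage must be non-negative")
    return n_sites * (1.0 - math.exp(-coverage))


if __name__ == "__main__":
    # 100,000 SNP sites at 0.025X mean coverage
    print(round(expected_covered_sites(100_000, 0.025)))
```

    Under this assumption only about 2.5% of the SNP sites are observed at all, which suggests why a model pooling evidence across many sites is needed at such low coverage.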

  9. Phylogenetic Analysis of a Spontaneous Cocoa Bean Fermentation Metagenome Reveals New Insights into Its Bacterial and Fungal Community Diversity

    PubMed Central

    Illeghems, Koen; De Vuyst, Luc; Papalexandratou, Zoi; Weckx, Stefan

    2012-01-01

    This is the first report on the phylogenetic analysis of the community diversity of a single spontaneous cocoa bean box fermentation sample through a metagenomic approach involving 454 pyrosequencing. Several sequence-based and composition-based taxonomic profiling tools were used and evaluated to avoid software-dependent results and their outcome was validated by comparison with previously obtained culture-dependent and culture-independent data. Overall, this approach revealed a wider bacterial (mainly γ-Proteobacteria) and fungal diversity than previously found. Further, the use of a combination of different classification methods, in a software-independent way, helped to understand the actual composition of the microbial ecosystem under study. In addition, bacteriophage-related sequences were found. The bacterial diversity depended partially on the methods used, as composition-based methods predicted a wider diversity than sequence-based methods, and as classification methods based solely on phylogenetic marker genes predicted a more restricted diversity compared with methods that took all reads into account. The metagenomic sequencing analysis identified Hanseniaspora uvarum, Hanseniaspora opuntiae, Saccharomyces cerevisiae, Lactobacillus fermentum, and Acetobacter pasteurianus as the prevailing species. Also, the presence of occasional members of the cocoa bean fermentation process was revealed (such as Erwinia tasmaniensis, Lactobacillus brevis, Lactobacillus casei, Lactobacillus rhamnosus, Lactococcus lactis, Leuconostoc mesenteroides, and Oenococcus oeni). Furthermore, the sequence reads associated with viral communities were of a restricted diversity, dominated by Myoviridae and Siphoviridae, and reflecting Lactobacillus as the dominant host. To conclude, an accurate overview of all members of a cocoa bean fermentation process sample was revealed, indicating the superiority of metagenomic sequencing over previously used techniques. PMID:22666442

  10. Fragment Hazard Investigation Program

    DTIC Science & Technology

    1978-10-01

    [Table-of-contents excerpt] Ballistic Density (k) ... Ejection Angle (a... Ejection Velocity (V) ... Development of Empirical Relation... Fragment Weight Versus Gamma for Test QD-155-08 ... Fragment Range Versus Ejection Angle as a Function of

  11. Fragments and Coherence

    ERIC Educational Resources Information Center

    Watson, Anne

    2008-01-01

    Can teachers contact the inner coherence of mathematics while working in a context fragmented by always-new objectives, criteria, and initiatives? How, more importantly, can learners experience the inner coherence of mathematics while working in a context fragmented by testing, modular curricula, short-term learning objectives, and lessons that…

  12. Fragmentation of fullerenes

    NASA Astrophysics Data System (ADS)

    Chancey, Ryan T.; Oddershede, Lene; Harris, Frank E.; Sabin, John R.

    2003-04-01

    We have performed classical molecular-dynamics simulations of the fragmentation collisions of neutral fullerenes (C24, C60, C100, and C240) with a hard wall. The interactions between the carbon atoms are modeled by a Tersoff potential and the position of each carbon atom at each time step is calculated using a sixth-order predictor-corrector method. The statistical distribution of the fragments depends on impact energy. At low energies, the fragment distribution appears symmetric, with both the large and small fragment distributions well fitted by an exponential function of the same exponent, the value of which decreases with impact energy. At intermediate energies, the distribution of the smallest fragments can be fitted equally well by a power law or an exponential function. At high impact energies, the entire fragmentation pattern is well described by a single exponential function, the exponent increasing with energy. The observed tendencies in fragment distributions as well as the obtained exponents are in agreement with experimental observations. The fragmentation behavior of the four investigated fullerenes is very similar, and it is noted that C60 appears to be the most stable.
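
    The comparison between exponential and power-law descriptions of a fragment distribution mentioned above can be sketched with two log-linear least-squares fits. This is only an illustrative sketch, not the authors' procedure: the function names `linear_fit` and `fit_fragment_distribution` are hypothetical, and the input is assumed to be a list of fragment sizes with strictly positive counts.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, residual sum of squares).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def fit_fragment_distribution(sizes, counts):
    """Fit n(s) ~ exp(-b*s) and n(s) ~ s**(-tau) in log space.

    Hypothetical helper: each entry maps a model name to
    (fitted exponent, residual sum of squares in log space).
    """
    log_counts = [math.log(c) for c in counts]
    # Exponential: ln n = a - b * s, so the exponent b is minus the slope.
    exp_slope, _, exp_rss = linear_fit(sizes, log_counts)
    # Power law: ln n = a - tau * ln s, so tau is minus the slope.
    log_sizes = [math.log(s) for s in sizes]
    pow_slope, _, pow_rss = linear_fit(log_sizes, log_counts)
    return {
        "exponential": (-exp_slope, exp_rss),
        "power_law": (-pow_slope, pow_rss),
    }
```

    Comparing the two residuals gives a rough indication of which functional form describes a given size range better; when they are similar, as the abstract reports for the smallest fragments at intermediate energies, neither form is clearly preferred.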

  13. Fragmentation of Shells

    NASA Astrophysics Data System (ADS)

    Wittel, F.; Kun, F.; Herrmann, H. J.; Kröplin, B. H.

    2004-07-01

    We present a theoretical and experimental study of the fragmentation of closed thin shells made of a disordered brittle material. Experiments were performed on brown and white hen egg shells under two different loading conditions: impact with a hard wall and explosion by a combustible mixture. Both give rise to power law fragment size distributions. A three-dimensional discrete element model of shells is worked out. Based on simulations of the model, we give evidence that power law fragment mass distributions arise due to an underlying phase transition which proved to be abrupt for explosion and continuous for impact. We demonstrate that the fragmentation of closed shells defines a new universality class of fragmentation phenomena.

  14. An introduction to the analysis of shotgun metagenomic data.

    PubMed

    Sharpton, Thomas J

    2014-01-01

    Environmental DNA sequencing has revealed the expansive biodiversity of microorganisms and clarified the relationship between host-associated microbial communities and host phenotype. Shotgun metagenomic DNA sequencing is a relatively new and powerful environmental sequencing approach that provides insight into community biodiversity and function. However, the analysis of metagenomic sequences is complicated by the complex structure of the data. Fortunately, new tools and data resources have been developed to circumvent these complexities and allow researchers to determine which microbes are present in the community and what they might be doing. This review describes the analytical strategies and specific tools that can be applied to metagenomic data and the considerations and caveats associated with their use. Specifically, it documents how metagenomes can be analyzed to quantify community structure and diversity, assemble novel genomes, identify new taxa and genes, and determine which metabolic pathways are encoded in the community. It also discusses several methods that can be used to compare metagenomes to identify taxa and functions that differentiate communities.

  15. Application of metagenomics in the human gut microbiome

    PubMed Central

    Wang, Wei-Lin; Xu, Shao-Yan; Ren, Zhi-Gang; Tao, Liang; Jiang, Jian-Wen; Zheng, Shu-Sen

    2015-01-01

    There are more than 1000 microbial species living in the complex human intestine. The gut microbial community plays an important role in protecting the host against pathogenic microbes, modulating immunity, regulating metabolic processes, and is even regarded as an endocrine organ. However, traditional culture methods are very limited for identifying microbes. With the application of molecular biologic technology in the field of the intestinal microbiome, especially metagenomic sequencing of the next-generation sequencing technology, progress has been made in the study of the human intestinal microbiome. Metagenomics can be used to study intestinal microbiome diversity and dysbiosis, as well as its relationship to health and disease. Moreover, functional metagenomics can identify novel functional genes, microbial pathways, antibiotic resistance genes, functional dysbiosis of the intestinal microbiome, and determine interactions and co-evolution between microbiota and host, though there are still some limitations. Metatranscriptomics, metaproteomics and metabolomics represent enormous complements to the understanding of the human gut microbiome. This review aims to demonstrate that metagenomics can be a powerful tool in studying the human gut microbiome with encouraging prospects. The limitations of metagenomics to be overcome are also discussed. Metatranscriptomics, metaproteomics and metabolomics in relation to the study of the human gut microbiome are also briefly discussed. PMID:25624713

  16. Application of metagenomics in the human gut microbiome.

    PubMed

    Wang, Wei-Lin; Xu, Shao-Yan; Ren, Zhi-Gang; Tao, Liang; Jiang, Jian-Wen; Zheng, Shu-Sen

    2015-01-21

    There are more than 1000 microbial species living in the complex human intestine. The gut microbial community plays an important role in protecting the host against pathogenic microbes, modulating immunity, regulating metabolic processes, and is even regarded as an endocrine organ. However, traditional culture methods are very limited for identifying microbes. With the application of molecular biologic technology in the field of the intestinal microbiome, especially metagenomic sequencing of the next-generation sequencing technology, progress has been made in the study of the human intestinal microbiome. Metagenomics can be used to study intestinal microbiome diversity and dysbiosis, as well as its relationship to health and disease. Moreover, functional metagenomics can identify novel functional genes, microbial pathways, antibiotic resistance genes, functional dysbiosis of the intestinal microbiome, and determine interactions and co-evolution between microbiota and host, though there are still some limitations. Metatranscriptomics, metaproteomics and metabolomics represent enormous complements to the understanding of the human gut microbiome. This review aims to demonstrate that metagenomics can be a powerful tool in studying the human gut microbiome with encouraging prospects. The limitations of metagenomics to be overcome are also discussed. Metatranscriptomics, metaproteomics and metabolomics in relation to the study of the human gut microbiome are also briefly discussed.

  17. Accurate genome relative abundance estimation based on shotgun metagenomic reads.

    PubMed

    Xia, Li C; Cram, Jacob A; Chen, Ting; Fuhrman, Jed A; Sun, Fengzhu

    2011-01-01

    Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or variants of them, and often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the data-sets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
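
    The general idea of a maximum likelihood mixture model with ambiguous read assignments and a genome-size correction can be sketched with a small expectation-maximization loop. This is a minimal illustration under stated assumptions, not the GRAMMy implementation: the function name `estimate_relative_abundance` and its inputs (per-read likelihoods `read_probs` and a `genome_lengths` table) are hypothetical, and reads are assumed to be sampled uniformly along each genome.

```python
def estimate_relative_abundance(read_probs, genome_lengths, n_iter=200, tol=1e-10):
    """EM sketch for genome relative abundance (illustrative only).

    read_probs: list of dicts, one per read, mapping genome name to
        P(read | genome); genomes a read cannot come from are omitted.
    genome_lengths: dict mapping genome name to genome length in bp.
    Returns a dict mapping genome name to estimated relative abundance.
    """
    genomes = sorted(genome_lengths)
    # Mixing proportions: expected fraction of reads from each genome.
    pi = {g: 1.0 / len(genomes) for g in genomes}
    for _ in range(n_iter):
        # E-step: split each read across its candidate genomes.
        counts = {g: 0.0 for g in genomes}
        for probs in read_probs:
            weights = {g: pi[g] * p for g, p in probs.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # read is incompatible with every genome
            for g, w in weights.items():
                counts[g] += w / total
        assigned = sum(counts.values())
        if assigned == 0.0:
            raise ValueError("no read could be assigned to any genome")
        # M-step: re-estimate the mixing proportions.
        new_pi = {g: counts[g] / assigned for g in genomes}
        delta = max(abs(new_pi[g] - pi[g]) for g in genomes)
        pi = new_pi
        if delta < tol:
            break
    # Longer genomes yield more reads per cell, so divide the read
    # fractions by genome length and renormalize.
    scaled = {g: pi[g] / genome_lengths[g] for g in genomes}
    norm = sum(scaled.values())
    return {g: scaled[g] / norm for g in genomes}
```

    For example, with 60 reads unique to a 2 Mbp genome, 30 reads unique to a 1 Mbp genome and 10 reads equally compatible with both, the read fractions converge to 2/3 and 1/3, and the length correction yields equal relative abundances for the two genomes.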

  18. Metagenomic analysis of permafrost microbial community response to thaw

    SciTech Connect

    Mackelprang, R.; Waldrop, M.P.; DeAngelis, K.M.; David, M.M.; Chavarria, K.L.; Blazewicz, S.J.; Rubin, E.M.; Jansson, J.K.

    2011-07-01

    We employed deep metagenomic sequencing to determine the impact of thaw on microbial phylogenetic and functional genes and related these data to measurements of methane emissions. Metagenomics, the direct sequencing of DNA from the environment, allows for the examination of whole biochemical pathways and associated processes, as opposed to individual pieces of the metabolic puzzle. Our metagenome analyses revealed that during transition from a frozen to a thawed state there were rapid shifts in many microbial, phylogenetic and functional gene abundances and pathways. After one week of incubation at 5°C, permafrost metagenomes converged to be more similar to each other than while they were frozen. We found that multiple genes involved in cycling of carbon and nitrogen shifted rapidly during thaw. We also constructed the first draft genome from a complex soil metagenome, which corresponded to a novel methanogen. Methane previously accumulated in permafrost was released during thaw and subsequently consumed by methanotrophic bacteria. Together these data point towards the importance of rapid cycling of methane and nitrogen in thawing permafrost.

  19. Metagenomic Insights into Transferable Antibiotic Resistance in Oral Bacteria.

    PubMed

    Sukumar, S; Roberts, A P; Martin, F E; Adler, C J

    2016-08-01

    Antibiotic resistance is considered one of the greatest threats to global public health. Resistance is often conferred by the presence of antibiotic resistance genes (ARGs), which are readily found in the oral microbiome. In-depth genetic analyses of the oral microbiome through metagenomic techniques reveal a broad distribution of ARGs (including novel ARGs) in individuals not recently exposed to antibiotics, including humans in isolated indigenous populations. This has resulted in a paradigm shift from focusing on the carriage of antibiotic resistance in pathogenic bacteria to a broader concept of an oral resistome, which includes all resistance genes in the microbiome. Metagenomics is beginning to demonstrate the role of the oral resistome and horizontal gene transfer within and between commensals in the absence of selective pressure, such as an antibiotic. At the chairside, metagenomic data reinforce our need to adhere to current antibiotic guidelines to minimize the spread of resistance, as such data reveal the extent of ARGs without exposure to antimicrobials and the ecologic changes created in the oral microbiome by even a single dose of antibiotics. The aim of this review is to discuss the role of metagenomics in the investigation of the oral resistome, including the transmission of antibiotic resistance in the oral microbiome. Future perspectives, including clinical implications of the findings from metagenomic investigations of oral ARGs, are also considered. © International & American Associations for Dental Research 2016.

  20. An introduction to the analysis of shotgun metagenomic data

    PubMed Central

    Sharpton, Thomas J.

    2014-01-01

    Environmental DNA sequencing has revealed the expansive biodiversity of microorganisms and clarified the relationship between host-associated microbial communities and host phenotype. Shotgun metagenomic DNA sequencing is a relatively new and powerful environmental sequencing approach that provides insight into community biodiversity and function. However, the analysis of metagenomic sequences is complicated by the complex structure of the data. Fortunately, new tools and data resources have been developed to circumvent these complexities and allow researchers to determine which microbes are present in the community and what they might be doing. This review describes the analytical strategies and specific tools that can be applied to metagenomic data and the considerations and caveats associated with their use. Specifically, it documents how metagenomes can be analyzed to quantify community structure and diversity, assemble novel genomes, identify new taxa and genes, and determine which metabolic pathways are encoded in the community. It also discusses several methods that can be used to compare metagenomes to identify taxa and functions that differentiate communities. PMID:24982662

  1. Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes

    PubMed Central

    King, Paula; Pham, Long K.; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T.; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca

    2016-01-01

    We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens and determine whether they are atypical to the hospital airborne metagenome. Air samples were collected from eight locations which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the most abundant 50 microbial orders generated three major nodes which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic for an otherwise consistent hospital air microbial metagenomic profile. PMID:27482891

  2. Structure, fluctuation and magnitude of a natural grassland soil metagenome

    PubMed Central

    Delmont, Tom O; Prestat, Emmanuel; Keegan, Kevin P; Faubladier, Michael; Robe, Patrick; Clark, Ian M; Pelletier, Eric; Hirsch, Penny R; Meyer, Folker; Gilbert, Jack A; Le Paslier, Denis; Simonet, Pascal; Vogel, Timothy M

    2012-01-01

    The soil ecosystem is critical for human health, affecting aspects of the environment from key agricultural and edaphic parameters to critical influence on climate change. Soil has more unknown biodiversity than any other ecosystem. We have applied diverse DNA extraction methods coupled with high throughput pyrosequencing to explore 4.88 × 10⁹ bp of metagenomic sequence data from the longest continually studied soil environment (Park Grass experiment at Rothamsted Research in the UK). Results emphasize important DNA extraction biases and unexpectedly low seasonal and vertical soil metagenomic functional class variations. Clustering-based subsystems and carbohydrate metabolism had the largest quantity of annotated reads assigned although <50% of reads were assigned at an E value cutoff of 10⁻⁵. In addition, with the more detailed subsystems, cAMP signaling in bacteria (3.24±0.27% of the annotated reads) and the Ton and Tol transport systems (1.69±0.11%) were relatively highly represented. The most highly represented genome from the database was that for a Bradyrhizobium species. The metagenomic variance created by integrating natural and methodological fluctuations represents a global picture of the Rothamsted soil metagenome that can be used for specific questions and future inter-environmental metagenomic comparisons. However, only 1% of annotated sequences correspond to already sequenced genomes at 96% similarity and E values of <10⁻⁵; thus, considerable genomic reconstruction efforts still have to be performed. PMID:22297556

  3. Exploratory experimentation and scientific practice: metagenomics and the proteorhodopsin case.

    PubMed

    O'Malley, Maureen A

    2007-01-01

    Exploratory experimentation and high-throughput molecular biology appear to have considerable affinity for each other. Included in the latter category is metagenomics, which is the DNA-based study of diverse microbial communities from a vast range of non-laboratory environments. Metagenomics has already made numerous discoveries and these have led to reinterpretations of fundamental concepts of microbial organization, evolution, and ecology. The most outstanding success story of metagenomics to date involves the discovery of a rhodopsin gene, named proteorhodopsin, in marine bacteria that were never suspected to have any photobiological capacities. A discussion of this finding and its detailed investigation illuminates the relationship between exploratory experimentation and metagenomics. Specifically, the proteorhodopsin story indicates that a dichotomous interpretation of theory-driven and exploratory experimentation is insufficient and that an interactive understanding of these two types of experimentation can be usefully supplemented by another category, "natural history experimentation". Further reflection on the context of metagenomics suggests the necessity of thinking more historically about exploratory and other forms of experimentation.

  4. Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes.

    PubMed

    King, Paula; Pham, Long K; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca; Forsyth, R Allyn

    2016-01-01

    We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens and determine whether they are atypical to the hospital airborne metagenome. Air samples were collected from eight locations which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the most abundant 50 microbial orders generated three major nodes which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic for an otherwise consistent hospital air microbial metagenomic profile.

  5. Comparative metagenome analysis of an Alaskan glacier.

    PubMed

    Choudhari, Sulbha; Lohia, Ruchi; Grigoriev, Andrey

    2014-04-01

    The temperature in the Arctic region has been increasing in the recent past accompanied by melting of its glaciers. We took a snapshot of the current microbial inhabitation of an Alaskan glacier (which can be considered as one of the simplest possible ecosystems) by using metagenomic sequencing of 16S rRNA recovered from ice/snow samples. Somewhat contrary to our expectations and earlier estimates, a rich and diverse microbial population of more than 2,500 species was revealed including several species of Archaea that have been identified for the first time in the glaciers of the Northern hemisphere. The most prominent bacterial groups found were Proteobacteria, Bacteroidetes, and Firmicutes. Firmicutes were not reported in large numbers in a previously studied Alpine glacier but were dominant in an Antarctic subglacial lake. Representatives of Cyanobacteria, Actinobacteria and Planctomycetes were among the most numerous, likely reflecting the dependence of the ecosystem on the energy obtained through photosynthesis and close links with the microbial community of the soil. Principal component analysis (PCA) of nucleotide word frequency revealed distinct sequence clusters for different taxonomic groups in the Alaskan glacier community and separate clusters for the glacial communities from other regions of the world. Comparative analysis of the community composition and bacterial diversity present in the Byron glacier in Alaska with other environments showed larger overlap with an Arctic soil than with a high Arctic lake, indicating patterns of community exchange and suggesting that these bacteria may play an important role in soil development during glacial retreat.
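
    The nucleotide word frequency vectors that feed a PCA of this kind can be sketched as follows. The abstract does not state the word length or normalization used, so the choice of k = 4, the relative-frequency normalization, and the function name `kmer_frequencies` are assumptions made purely for illustration.

```python
from itertools import product


def kmer_frequencies(seq, k=4):
    """Return the relative frequency of every DNA word of length k.

    The vector has 4**k entries in lexicographic order (AAAA, AAAC, ...).
    Words containing N or other ambiguity codes are skipped.
    """
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(words, 0)
    seq = seq.upper()
    # Slide a window of width k along the sequence.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    # An all-zero vector is returned when no valid word was found.
    return [counts[w] / total if total else 0.0 for w in words]
```

    Stacking one such vector per sequence gives a matrix with 4**k columns, which is the usual input for a principal component analysis in any numerical library.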

  6. The oral metagenome in health and disease

    PubMed Central

    Belda-Ferre, Pedro; Alcaraz, Luis David; Cabrera-Rubio, Raúl; Romero, Héctor; Simón-Soro, Aurea; Pignatelli, Miguel; Mira, Alex

    2012-01-01

    The oral cavity of humans is inhabited by hundreds of bacterial species and some of them have a key role in the development of oral diseases, mainly dental caries and periodontitis. We describe for the first time the metagenome of the human oral cavity under healthy and diseased conditions, with a focus on supragingival dental plaque and cavities. Direct pyrosequencing of eight samples with different oral-health status produced 1 Gbp of sequence without the biases imposed by PCR or cloning. These data show that cavities are not dominated by Streptococcus mutans (the species originally identified as the etiological agent of dental caries) but are in fact a complex community formed by tens of bacterial species, in agreement with the view that caries is a polymicrobial disease. The analysis of the reads indicated that the oral cavity is functionally a different environment from the gut, with many functional categories enriched in one of the two environments and depleted in the other. Individuals who had never suffered from dental caries showed an over-representation of several functional categories, like genes for antimicrobial peptides and quorum sensing. In addition, they did not have mutans streptococci but displayed high recruitment of other species. Several isolates belonging to these dominant bacteria in healthy individuals were cultured and shown to inhibit the growth of cariogenic bacteria, suggesting the use of these commensal bacterial strains as probiotics to promote oral health and prevent dental caries. PMID:21716308

  7. Metagenomic scaffolds enable combinatorial lignin transformation

    PubMed Central

    Strachan, Cameron R.; Singh, Rahul; VanInsberghe, David; Ievdokymenko, Kateryna; Budwill, Karen; Mohn, William W.; Eltis, Lindsay D.; Hallam, Steven J.

    2014-01-01

    Engineering the microbial transformation of lignocellulosic biomass is essential to developing modern biorefining processes that alleviate reliance on petroleum-derived energy and chemicals. Many current bioprocess streams depend on the genetic tractability of Escherichia coli with a primary emphasis on engineering cellulose/hemicellulose catabolism, small molecule production, and resistance to product inhibition. Conversely, bioprocess streams for lignin transformation remain embryonic, with relatively few environmental strains or enzymes implicated. Here we develop a biosensor responsive to monoaromatic lignin transformation products compatible with functional screening in E. coli. We use this biosensor to retrieve metagenomic scaffolds sourced from coal bed bacterial communities conferring an array of lignin transformation phenotypes that synergize in combination. Transposon mutagenesis and comparative sequence analysis of active clones identified genes encoding six functional classes mediating lignin transformation phenotypes that appear to be rearrayed in nature via horizontal gene transfer. Lignin transformation activity was then demonstrated for one of the predicted gene products encoding a multicopper oxidase to validate the screen. These results illuminate cellular and community-wide networks acting on aromatic polymers and expand the toolkit for engineering recombinant lignin transformation based on ecological design principles. PMID:24982175

  8. Metagenomic analysis of phosphorus removing sludge communities

    SciTech Connect

    Garcia Martin, Hector; Ivanova, Natalia; Kunin, Victor; Warnecke, Falk; Barry, Kerrie; McHardy, Alice C.; Yeates, Christine; He, Shaomei; Salamov, Asaf; Szeto, Ernest; Dalin, Eileen; Putnam, Nik; Shapiro, Harris J.; Pangilinan, Jasmyn L.; Rigoutsos, Isidore; Kyrpides, Nikos C.; Blackall, Linda Louise; McMahon, Katherine D.; Hugenholtz, Philip

    2006-02-01

    Enhanced Biological Phosphorus Removal (EBPR) is not well understood at the metabolic level despite being one of the best-studied microbially-mediated industrial processes due to its ecological and economic relevance. Here we present a metagenomic analysis of two lab-scale EBPR sludges dominated by the uncultured bacterium, "Candidatus Accumulibacter phosphatis." This analysis resolves several controversies in EBPR metabolic models and provides hypotheses explaining the dominance of A. phosphatis in this habitat, its lifestyle outside EBPR and probable cultivation requirements. Comparison of the same species from different EBPR sludges highlights recent evolutionary dynamics in the A. phosphatis genome that could be linked to mechanisms for environmental adaptation. In spite of an apparent lack of phylogenetic overlap in the flanking communities of the two sludges studied, common functional themes were found, at least one of them complementary to the inferred metabolism of the dominant organism. The present study provides a much-needed blueprint for a systems-level understanding of EBPR and illustrates that metagenomics enables detailed, often novel, insights into even well-studied biological systems.

  9. Metagenomic analysis of stressed coral holobionts.

    PubMed

    Vega Thurber, Rebecca; Willner-Hall, Dana; Rodriguez-Mueller, Beltran; Desnues, Christelle; Edwards, Robert A; Angly, Florent; Dinsdale, Elizabeth; Kelly, Linda; Rohwer, Forest

    2009-08-01

    The coral holobiont is the community of metazoans, protists and microbes associated with scleractinian corals. Disruptions in these associations have been correlated with coral disease, but little is known about the series of events involved in the shift from mutualism to pathogenesis. To evaluate structural and functional changes in coral microbial communities, Porites compressa was exposed to four stressors: increased temperature, elevated nutrients, dissolved organic carbon loading and reduced pH. Microbial metagenomic samples were collected and pyrosequenced. Functional gene analysis demonstrated that stressors increased the abundance of microbial genes involved in virulence, stress resistance, sulfur and nitrogen metabolism, motility and chemotaxis, fatty acid and lipid utilization, and secondary metabolism. Relative changes in taxonomy also demonstrated that coral-associated microbiota (Archaea, Bacteria, protists) shifted from a healthy-associated coral community (e.g. Cyanobacteria, Proteobacteria and the zooxanthellae Symbiodinium) to a community (e.g. Bacteroidetes, Fusobacteria and Fungi) of microbes often found on diseased corals. Additionally, low-abundance Vibrio spp. were found to significantly alter microbiome metabolism, suggesting that the contribution of just a few members of a community can profoundly shift the health status of the coral holobiont.

  10. Opaque rock fragments

    SciTech Connect

    Abhijit, B.; Molinaroli, E.; Olsen, J.

    1987-05-01

    The authors describe a new, rare, but petrogenetically significant variety of rock fragments from Holocene detrital sediments. Approximately 50% of the opaque heavy mineral concentrates from Holocene siliciclastic sands are polymineralic Fe-Ti oxide particles, i.e., they are opaque rock fragments. About 40% to 70% of these rock fragments show intergrowth of hm + il, mt + il, and mt + hm +/- il. Modal analysis of 23,282 opaque particles in 117 polished thin sections of granitic and metamorphic parent rocks and their daughter sands from semi-arid and humid climates shows the following relative abundances. The data show that opaque rock fragments are more common in sands from igneous source rocks and that hm + il fragments are more durable. They assume that equilibrium conditions existed in parent rocks during the growth of these paired minerals, and that the Ti/Fe ratio did not change during oxidation of mt to hm. Geothermometric determinations using electron probe microanalysis of opaque rock fragments in sand samples from Lake Erie and the Adriatic Sea suggest that these rock fragments may have equilibrated at approximately 900°C and 525°C, respectively.

  11. Fragmentation trees reloaded.

    PubMed

    Böcker, Sebastian; Dührkop, Kai

    2016-01-01

    Untargeted metabolomics commonly uses liquid chromatography mass spectrometry to measure abundances of metabolites; subsequent tandem mass spectrometry is used to derive information about individual compounds. One of the bottlenecks in this experimental setup is the interpretation of fragmentation spectra to accurately and efficiently identify compounds. Fragmentation trees have become a powerful tool for the interpretation of tandem mass spectrometry data of small molecules. These trees are determined from the data using combinatorial optimization, and aim at explaining the experimental data via fragmentation cascades. Fragmentation tree computation does not require spectral or structural databases. To obtain biochemically meaningful trees, one needs an elaborate optimization function (scoring). We present a new scoring for computing fragmentation trees, transforming the combinatorial optimization into a Maximum A Posteriori estimator. We demonstrate the superiority of the new scoring for two tasks: both for the de novo identification of molecular formulas of unknown compounds and for searching a database for structurally similar compounds, our method, SIRIUS 3, performs significantly better than the previous version of our method, as well as other methods for this task. SIRIUS 3 can be a part of an untargeted metabolomics workflow, allowing researchers to investigate unknowns using automated computational methods. Graphical abstract: We present a new scoring for computing fragmentation trees from tandem mass spectrometry data based on Bayesian statistics. The best scoring fragmentation tree most likely explains the molecular formula of the measured parent ion.

  12. Fragmentation of monoclonal antibodies

    PubMed Central

    Vlasak, Josef

    2011-01-01

    Fragmentation is a degradation pathway ubiquitously observed in proteins despite the remarkable stability of the peptide bond; proteins differ only by how much and where cleavage occurs. The goal of this review is to summarize reports regarding the non-enzymatic fragmentation of the peptide backbone of monoclonal antibodies (mAbs). The sites in the polypeptide chain susceptible to fragmentation are determined by a multitude of factors. Insights are provided on the intimate chemical mechanisms that can make some bonds prone to cleavage due to the presence of specific side-chains. In addition to primary structure, the secondary, tertiary and quaternary structures have a significant impact in modulating the distribution of cleavage sites by altering local flexibility, accessibility to solvent or bringing in close proximity side chains that are remote in sequence. This review focuses on cleavage sites observed in the constant regions of mAbs, with special emphasis on hinge fragmentation. The mechanisms responsible for backbone cleavage are strongly dependent on pH and can be catalyzed by metals or radicals. The distribution of cleavage sites is different under acidic compared to basic conditions, with fragmentation rates exhibiting a minimum in the pH range 5–6; therefore, the overall fragmentation pattern observed for a mAb is a complex result of structural and solvent conditions. A critical review of the techniques used to monitor fragmentation is also presented; usually a compromise has to be made between a highly sensitive method with good fragment separation and the capability to identify the cleavage site. The effect of fragmentation on the function of a mAb must be evaluated on a case-by-case basis depending on whether cleavage sites are observed in the variable or constant regions, and on the mechanism of action of the molecule. PMID:21487244

  13. Recovery of a Medieval Brucella melitensis Genome Using Shotgun Metagenomics

    PubMed Central

    Kay, Gemma L.; Sergeant, Martin J.; Giuffra, Valentina; Bandiera, Pasquale; Milanese, Marco; Bramanti, Barbara

    2014-01-01

    ABSTRACT Shotgun metagenomics provides a powerful assumption-free approach to the recovery of pathogen genomes from contemporary and historical material. We sequenced the metagenome of a calcified nodule from the skeleton of a 14th-century middle-aged male excavated from the medieval Sardinian settlement of Geridu. We obtained 6.5-fold coverage of a Brucella melitensis genome. Sequence reads from this genome showed signatures typical of ancient or aged DNA. Despite the relatively low coverage, we were able to use information from single-nucleotide polymorphisms to place the medieval pathogen genome within a clade of B. melitensis strains that included the well-studied Ether strain and two other recent Italian isolates. We confirmed this placement using information from deletions and IS711 insertions. We conclude that metagenomics stands ready to document past and present infections, shedding light on the emergence, evolution, and spread of microbial pathogens. PMID:25028426

  14. Application of metagenomics in understanding oral health and disease.

    PubMed

    Xu, Ping; Gunsolley, John

    2014-04-01

    Oral diseases including periodontal disease and caries are some of the most prevalent infectious diseases in humans. Different microbial species cohabitate and form a polymicrobial biofilm called dental plaque in the oral cavity. Metagenomics using next generation sequencing technologies has produced bacterial profiles and genomic profiles to study the relationships between microbial diversity, genetic variation, and oral diseases. Several oral metagenomic studies have examined the oral microbiome of periodontal disease and caries. Gene annotations in these studies support the association of specific genes or metabolic pathways with oral health and with specific diseases. The roles of pathogenic species and functions of specific genes in oral disease development have been recognized by metagenomic analysis. A model is proposed in which three levels of interactions occur in the oral microbiome that determines oral health or disease.

  15. Is metagenomics resolving identification of functions in microbial communities?

    PubMed Central

    Chistoserdova, Ludmila

    2014-01-01

    We are coming up on the tenth anniversary of the broad use of the method involving whole metagenome shotgun sequencing, referred to as metagenomics. The application of this approach has definitely revolutionized microbiology and the related fields, including the realization of the importance of the human microbiome. As such, metagenomics has already provided a novel outlook on the complexity and dynamics of microbial communities that are an important part of the biosphere of the planet. Accumulation of massive amounts of sequence data also caused a surge in the development of bioinformatics tools specially designed to provide pipelines for data analysis and visualization. However, a critical outlook into the field is required to appreciate what could be and what has currently been gained from the massive sequence databases that are being generated with ever-increasing speed. PMID:23945370

  16. Is metagenomics resolving identification of functions in microbial communities?

    PubMed

    Chistoserdova, Ludmila

    2014-01-01

    We are coming up on the tenth anniversary of the broad use of the method involving whole metagenome shotgun sequencing, referred to as metagenomics. The application of this approach has definitely revolutionized microbiology and the related fields, including the realization of the importance of the human microbiome. As such, metagenomics has already provided a novel outlook on the complexity and dynamics of microbial communities that are an important part of the biosphere of the planet. Accumulation of massive amounts of sequence data also caused a surge in the development of bioinformatics tools specially designed to provide pipelines for data analysis and visualization. However, a critical outlook into the field is required to appreciate what could be and what has currently been gained from the massive sequence databases that are being generated with ever-increasing speed. © 2013 The Author. Microbial Biotechnology published by John Wiley & Sons Ltd and Society for Applied Microbiology.

  17. New dimensions of the virus world discovered through metagenomics.

    PubMed

    Kristensen, David M; Mushegian, Arcady R; Dolja, Valerian V; Koonin, Eugene V

    2010-01-01

    Metagenomic analysis of viruses suggests novel patterns of evolution, changes the existing ideas of the composition of the virus world and reveals novel groups of viruses and virus-like agents. The gene composition of the marine DNA virome is dramatically different from that of known bacteriophages. The virome is dominated by rare genes, many of which might be contained within virus-like entities such as gene transfer agents. Analysis of marine metagenomes thought to consist mostly of bacterial genes revealed a variety of sequences homologous to conserved genes of eukaryotic nucleocytoplasmic large DNA viruses, resulting in the discovery of diverse members of previously undersampled groups and suggesting the existence of new classes of virus-like agents. Unexpectedly, metagenomics of marine RNA viruses showed that representatives of only one superfamily of eukaryotic viruses, the picorna-like viruses, dominate the RNA virome.

  18. Recovering complete and draft population genomes from metagenome datasets

    DOE PAGES

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.

  19. Ray Meta: scalable de novo metagenome assembly and profiling

    PubMed Central

    2012-01-01

    Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three billion read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net. PMID:23259615
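
    The idea of profiling with uniquely-colored k-mers mentioned in this abstract can be illustrated with a minimal sketch. It is not Ray Communities' implementation: the function names (`build_unique_colors`, `profile`), the toy sequences, and the choice k = 4 are assumptions made for illustration only. A k-mer is treated as "uniquely colored" when it occurs in exactly one reference genome, and reads are profiled by counting such k-mers per genome.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def build_unique_colors(genomes, k):
    """Map each k-mer found in exactly one reference genome to that genome's name."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    # Keep only k-mers owned by a single genome (the "uniquely colored" ones).
    return {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}

def profile(reads, colors, k):
    """Count uniquely colored k-mers observed in the reads, per genome."""
    counts = defaultdict(int)
    for read in reads:
        for km in kmers(read, k):
            if km in colors:
                counts[colors[km]] += 1
    return dict(counts)

# Toy references (hypothetical sequences, not real genomes).
genomes = {
    "genomeA": "ACGTACGTTTGA",
    "genomeB": "ACGTACGGGCCA",
}
colors = build_unique_colors(genomes, k=4)
reads = ["CGTTTGA", "CGGGCC", "ACGTAC"]
result = profile(reads, colors, k=4)
print(result)
```

    In this toy case the third read contains only k-mers shared by both references, so it contributes nothing to either count; that is the point of restricting the profile to uniquely colored k-mers.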

  20. Application of metagenomics in understanding oral health and disease

    PubMed Central

    Xu, Ping; Gunsolley, John

    2014-01-01

    Oral diseases including periodontal disease and caries are some of the most prevalent infectious diseases in humans. Different microbial species cohabitate and form a polymicrobial biofilm called dental plaque in the oral cavity. Metagenomics using next generation sequencing technologies has produced bacterial profiles and genomic profiles to study the relationships between microbial diversity, genetic variation, and oral diseases. Several oral metagenomic studies have examined the oral microbiome of periodontal disease and caries. Gene annotations in these studies support the association of specific genes or metabolic pathways with oral health and with specific diseases. The roles of pathogenic species and functions of specific genes in oral disease development have been recognized by metagenomic analysis. A model is proposed in which three levels of interactions occur in the oral microbiome that determines oral health or disease. PMID:24642489

  1. Optimized DNA extraction and metagenomic sequencing of airborne microbial communities.

    PubMed

    Jiang, Wenjun; Liang, Peng; Wang, Buying; Fang, Jianhuo; Lang, Jidong; Tian, Geng; Jiang, Jingkun; Zhu, Ting F

    2015-05-01

    Metagenomic sequencing has been widely used for the study of microbial communities from various environments such as soil, ocean, sediment and fresh water. Nonetheless, metagenomic sequencing of microbial communities in the air remains technically challenging, partly owing to the limited mass of collectable atmospheric particulate matter and the low biological content it contains. Here we present an optimized protocol for extracting up to tens of nanograms of airborne microbial genomic DNA from collected particulate matter. With an improved sequencing library preparation protocol, this quantity is sufficient for downstream applications, such as metagenomic sequencing for sampling various genes from the airborne microbial community. The described protocol takes ∼12 h of bench time over 2-3 d, and it can be performed with standard molecular biology equipment in the laboratory. A modified version of this protocol may also be used for genomic DNA extraction from other environmental samples of limited mass or low biological content.

  2. Recovering complete and draft population genomes from metagenome datasets

    SciTech Connect

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.
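
    As a rough illustration of why covarying coverage across multiple datasets helps binning, the sketch below groups contigs whose per-sample coverage profiles are strongly correlated. It does not reproduce any of the methods reviewed in the paper: the greedy grouping rule, the 0.95 Pearson threshold, the function names, and the coverage values are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length coverage profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has no defined correlation; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0

def bin_by_covariation(coverage, threshold=0.95):
    """Greedy binning: put a contig in the first bin whose seed contig
    (the bin's first member) correlates with it at or above the threshold."""
    bins = []
    for contig, prof in coverage.items():
        for members in bins:
            if pearson(coverage[members[0]], prof) >= threshold:
                members.append(contig)
                break
        else:
            bins.append([contig])
    return bins

# Hypothetical mean coverage of four contigs in four samples.
coverage = {
    "contig1": [10.0, 40.0, 5.0, 20.0],
    "contig2": [12.0, 44.0, 6.0, 23.0],
    "contig3": [30.0, 3.0, 25.0, 8.0],
    "contig4": [28.0, 4.0, 27.0, 9.0],
}
bins = bin_by_covariation(coverage)
print(bins)
```

    Contigs from the same population are expected to rise and fall together across samples, which is why a single sample (one coverage value per contig) carries much less binning signal than several.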

  3. The potential of viral metagenomics in blood transfusion safety.

    PubMed

    Sauvage, V; Gomez, J; Boizeau, L; Laperche, S

    2017-09-01

    Thanks to the advent of high throughput sequencing in the last ten years, it is now possible via metagenomics to define the spectrum of the microbial sequences present in human blood samples. Therefore, metagenomic sequencing appears to be a promising approach for the identification and global surveillance of new, emerging and/or unexpected viruses that could impair blood transfusion safety. However, despite considerable advantages compared to the traditional methods of pathogen identification, this non-targeted approach presents several drawbacks, including a lack of sensitivity and sequence contaminant issues. With further improvements, especially to increase sensitivity, metagenomic sequencing should in the near future become an additional diagnostic tool in the infectious disease field, especially in blood transfusion safety. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  4. New dimensions of the virus world discovered through metagenomics

    PubMed Central

    Kristensen, David M.; Mushegian, Arcady R.; Dolja, Valerian V.; Koonin, Eugene V.

    2012-01-01

    Metagenomic analysis of viruses suggests novel patterns of evolution, changes the existing ideas of the composition of the virus world and reveals novel groups of viruses and virus-like agents. The gene composition of the marine DNA virome is dramatically different from that of known bacteriophages. The virome is dominated by rare genes, many of which might be contained within virus-like entities such as gene transfer agents. Analysis of marine metagenomes thought to consist mostly of bacterial genes revealed a variety of sequences homologous to conserved genes of eukaryotic nucleocytoplasmic large DNA viruses, resulting in the discovery of diverse members of previously undersampled groups and suggesting the existence of new classes of virus-like agents. Unexpectedly, metagenomics of marine RNA viruses showed that representatives of only one superfamily of eukaryotic viruses, the picorna-like viruses, dominate the RNA virome. PMID:19942437

  5. Recovering complete and draft population genomes from metagenome datasets.

    PubMed

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.

  6. Universality in Fragmentation

    NASA Astrophysics Data System (ADS)

    Åström, J. A.; Holian, B. L.; Timonen, J.

    2000-04-01

    Fragmentation of a two-dimensional brittle solid by impact and "explosion," and a fluid by "explosion" are all shown to become critical. The critical points appear at a nonzero impact velocity, and at infinite explosion duration, respectively. Within the critical regimes, the fragment-size distributions satisfy a scaling form qualitatively similar to that of the cluster-size distribution of percolation, but they belong to another universality class. Energy balance arguments give a correlation length exponent that is exactly one-half of its percolation value. A single crack dominates fragmentation in the slow-fracture limit, as expected.

  7. Recovery of Uranium Fragments

    NASA Astrophysics Data System (ADS)

    James, H. R.; McElrue, D. H.; Winter, R. E.

    2002-07-01

    We describe a theory for calculating the penetration of fragments into foam. Comparisons with regular projectiles show that the drag term is similar in value to the analogous term in aerodynamics. This, plus the simple model used to describe porosity, enables the theory to be used in predicting the levels of stress present when uranium fragments are arrested in foam catchers. Consequently the theory can be used to assist in the design of catchers which will not distort uranium fragments travelling at 1-3 km/s. The theory is tested against experiments using some current designs.

  8. Selection in Coastal Synechococcus (Cyanobacteria) Populations Evaluated from Environmental Metagenomes

    PubMed Central

    Tai, Vera; Poon, Art F. Y.; Paulsen, Ian T.; Palenik, Brian

    2011-01-01

    Environmental metagenomics provides snippets of genomic sequences from all organisms in an environmental sample and is an unprecedented resource of information for investigating microbial population genetics. Current analytical methods, however, are poorly equipped to handle metagenomic data, particularly of short, unlinked sequences. A custom analytical pipeline was developed to calculate dN/dS ratios, a common metric to evaluate the role of selection in the evolution of a gene, from environmental metagenomes sequenced using 454 technology of flow-sorted populations of marine Synechococcus, the dominant cyanobacteria in coastal environments. The large majority of genes (98%) have evolved under purifying selection (dN/dS<1). The metagenome sequence coverage of the reference genomes was not uniform and genes that were highly represented in the environment (i.e., high read coverage) tended to be more evolutionarily conserved. Of the genes that may have evolved under positive selection (dN/dS>1), 77 out of 83 (93%) were hypothetical. Notable among annotated genes, ribosomal protein L35 appears to be under positive selection in one Synechococcus population. Other annotated genes, in particular a possible porin, a large-conductance mechanosensitive channel, an ATP binding component of an ABC transporter, and a homologue of a pilus retraction protein had regions of the gene with elevated dN/dS. With the increasing use of next-generation sequencing in metagenomic investigations of microbial diversity and ecology, analytical methods need to accommodate the peculiarities of these data streams. By developing a means to analyze population diversity data from these environmental metagenomes, we have provided the first insight into the role of selection in the evolution of Synechococcus, a globally significant primary producer. PMID:21931665
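
    The dN/dS thresholds quoted in this abstract can be made concrete with a short sketch. It only shows how a ratio, once estimated, maps onto the categories used above (dN/dS<1 purifying, dN/dS>1 positive); it does not reproduce the authors' pipeline for estimating dN and dS from short reads. The gene names, the dN and dS values, and the "neutral" and "undefined" labels are illustrative assumptions.

```python
def dn_ds(dn, ds):
    """Return dN/dS, or None when dS is zero and the ratio is undefined."""
    return None if ds == 0 else dn / ds

def selection_regime(ratio, tol=1e-9):
    """Map a dN/dS ratio onto the categories used in the abstract."""
    if ratio is None:
        return "undefined"
    if ratio < 1 - tol:
        return "purifying"   # dN/dS < 1
    if ratio > 1 + tol:
        return "positive"    # dN/dS > 1
    return "neutral"         # dN/dS approximately 1 (conventional reading)

# Hypothetical (dN, dS) estimates for three genes.
genes = {
    "geneA": (0.02, 0.40),
    "geneB": (0.30, 0.10),
    "geneC": (0.05, 0.0),
}
regimes = {name: selection_regime(dn_ds(dn, ds)) for name, (dn, ds) in genes.items()}
print(regimes)

# Check of the percentage reported in the abstract: 77 of 83 genes.
hypothetical_share = round(100 * 77 / 83)
print(hypothetical_share)
```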

  9. GHOSTX: A Fast Sequence Homology Search Tool for Functional Annotation of Metagenomic Data.

    PubMed

    Suzuki, Shuji; Ishida, Takashi; Ohue, Masahito; Kakuta, Masanori; Akiyama, Yutaka

    2017-01-01

    Metagenomic analysis based on whole genome shotgun sequencing data requires fast protein sequence homology searches for predicting the function of proteins coded on metagenome short reads. However, huge amounts of sequence data cause even general homology search analyses using BLASTX to become difficult in terms of computational cost. GHOSTX is a sequence homology search tool specifically developed for functional annotation of metagenome sequences. The tool is more than 160 times faster than BLASTX and has sufficient search sensitivity for metagenomic analysis. Using this tool, users can perform functional annotation of metagenomic data within a short time and infer metabolic pathways within an environment.

  10. THE ROLE OF WATERSHED CLASSIFICATION IN DIAGNOSING CAUSES OF BIOLOGICAL IMPAIRMENT

    EPA Science Inventory

    We compared classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme for two case studies involving 1) Lake Superior tributaries and 2) watersheds of riverine coastal wetlands ...

  11. THE ROLE OF WATERSHED CLASSIFICATION IN DIAGNOSING CAUSES OF BIOLOGICAL IMPAIRMENT

    EPA Science Inventory

    We compared classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme for two case studies involving 1) Lake Superior tributaries and 2) watersheds of riverine coastal wetlands ...

  12. Evaluating the Quantitative Capabilities of Metagenomic Analysis Software.

    PubMed

    Kerepesi, Csaba; Grolmusz, Vince

    2016-05-01

    DNA sequencing technologies are applied widely and frequently today to describe metagenomes, i.e., microbial communities in environmental or clinical samples, without the need for culturing them. These technologies usually return short (100-300 base-pairs long) DNA reads, and these reads are processed by metagenomic analysis software that assigns phylogenetic composition-information to the dataset. Here we evaluate three metagenomic analysis tools (AmphoraNet--a webserver implementation of AMPHORA2--, MG-RAST, and MEGAN5) for their capabilities of assigning quantitative phylogenetic information for the data, describing the frequency of appearance of the microorganisms of the same taxa in the sample. The difficulties of the task arise from the fact that longer genomes produce more reads from the same organism than shorter genomes, and some software assigns higher frequencies to species with longer genomes than to those with shorter ones. This phenomenon is called the "genome length bias." Dozens of complex artificial metagenome benchmarks can be found in the literature. Because of the complexity of those benchmarks, it is usually difficult to judge the resistance of a metagenomic tool to this "genome length bias." Therefore, we have made a simple benchmark for the evaluation of the "taxon-counting" in a metagenomic sample: we have taken the same number of copies of three full bacterial genomes of different lengths, broken them up randomly into short reads with an average length of 150 bp, and mixed the reads, creating our simple benchmark. Because of its simplicity, the benchmark is not supposed to serve as a mock metagenome, but if a tool fails on that simple task, it will surely fail on most real metagenomes. We applied the three tools to the benchmark. The ideal quantitative solution would assign the same proportion to the three bacterial taxa. We have found that AMPHORA2/AmphoraNet gave the most accurate results and the other two tools were under
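
    The "genome length bias" described in this abstract can be reproduced numerically with a simplified sketch. It assumes a fixed read length of exactly 150 bp rather than randomly placed reads of average length 150 bp, and the taxon names, genome lengths, and copy number are hypothetical. Equal copy numbers yield read shares proportional to genome length, and dividing read counts by genome length recovers the equal proportions the abstract calls the ideal quantitative solution.

```python
READ_LEN = 150  # simplification: fixed read length instead of an average of 150 bp

# Hypothetical genome lengths in bp; every taxon is present in the same copy number.
genome_lengths = {"taxonA": 1_500_000, "taxonB": 3_000_000, "taxonC": 6_000_000}
copies = 10

# Number of reads obtained by cutting every genome copy into READ_LEN pieces.
reads = {taxon: copies * (length // READ_LEN) for taxon, length in genome_lengths.items()}

# Naive taxon counting: share of reads per taxon (biased toward long genomes).
total_reads = sum(reads.values())
raw_share = {taxon: n / total_reads for taxon, n in reads.items()}

# Length-normalized counting: reads per base pair, rescaled to sum to 1.
per_bp = {taxon: reads[taxon] / genome_lengths[taxon] for taxon in reads}
norm_total = sum(per_bp.values())
normalized_share = {taxon: v / norm_total for taxon, v in per_bp.items()}

print(raw_share)
print(normalized_share)
```

    With these numbers the longest genome receives four times the read share of the shortest even though all three taxa are equally abundant, which is the failure mode the benchmark is designed to expose.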

  13. Whither or wither geomicrobiology in the era of 'community metagenomics'

    USGS Publications Warehouse

    Oremland, R.S.; Capone, D.G.; Stolz, J.F.; Fuhrman, J.

    2005-01-01

    Molecular techniques are valuable tools that can improve our understanding of the structure of microbial communities. They provide the ability to probe for life in all niches of the biosphere, perhaps even supplanting the need to cultivate microorganisms or to conduct ecophysiological investigations. However, an overemphasis and strict dependence on such large information-driven endeavours as environmental metagenomics could overwhelm the field, to the detriment of microbial ecology. We now call for more balanced, hypothesis-driven research efforts that couple metagenomics with classic approaches.

  14. Fragmentation in Biaxial Tension

    SciTech Connect

    Campbell, G H; Archbold, G C; Hurricane, O A; Miller, P L

    2006-06-13

    We have carried out an experiment that places a ductile stainless steel in a state of biaxial tension at a high rate of strain. The loading of the ductile metal spherical cap is performed by the detonation of a high explosive layer with a conforming geometry to expand the metal radially outwards. Simulations of the loading and expansion of the metal predict strain rates that compare well with experimental observations. A high percentage of the HE loaded material was recovered through a soft capture process and characterization of the recovered fragments provided high quality data, including uniform strain prior to failure and fragment size. These data were used with a modified fragmentation model to determine a fragmentation energy.

  15. DNA sequence analysis using hierarchical ART-based classification networks

    SciTech Connect

    LeBlanc, C.; Hruska, S.I.; Katholi, C.R.; Unnasch, T.R.

    1994-12-31

    Adaptive resonance theory (ART) describes a class of artificial neural network architectures that act as classification tools which self-organize, work in real-time, and require no retraining to classify novel sequences. We have adapted ART networks to provide support to scientists attempting to categorize tandem repeat DNA fragments from Onchocerca volvulus. In this approach, sequences of DNA fragments are presented to multiple ART-based networks which are linked together into two (or more) tiers; the first provides coarse sequence classification while the subsequent tiers refine the classifications as needed. The overall rating of the resulting classification of fragments is measured using statistical techniques based on those introduced to validate results from traditional phylogenetic analysis. Tests of the Hierarchical ART-based Classification Network, or HABclass network, indicate its value as a fast, easy-to-use classification tool which adapts to new data without retraining on previously classified data.

  16. Survey of (Meta)genomic Approaches for Understanding Microbial Community Dynamics.

    PubMed

    Sharma, Anukriti; Lal, Rup

    2017-03-01

    Advances in next-generation sequencing technologies have driven the rapid evolution of the fields of genomics and metagenomics in a short time and at nominal cost. While metagenomics and genomics can be separately used to reveal the culture-independent and culture-based microbial evolution, respectively, (meta)genomics together can be used to demonstrate results at population level revealing in-depth complex community interactions for specific ecotypes. The field of metagenomics which started with answering "who is out there?" based on 16S rRNA gene has evolved immensely with the precise organismal reconstruction at species/strain level from the deeply covered metagenome data outweighing the need to isolate bacteria of which 99% are de facto non-cultivable. In this review we have underlined the appeal of metagenomic-derived genomes in providing insights into the evolutionary patterns, growth dynamics, genome/gene-specific sweeps, and durability of environmental pressures. We have demonstrated the use of culture-based genomics and environmental shotgun metagenome data together to elucidate environment specific genome modulations via metagenomic recruitments in terms of gene loss/gain, accessory and core-genome extent. We further illustrated the benefit of (meta)genomics in the understanding of infectious diseases by deducing the relationship between human microbiota and clinical microbiology. This review summarizes the technological advances in the (meta)genomic strategies using the genome and metagenome datasets together to increase the resolution of microbial population studies.

  17. Diversity of Virophages in Metagenomic Data Sets

    PubMed Central

    Zhou, Jinglie; Zhang, Weijia; Yan, Shuling; Xiao, Jinzhou; Zhang, Yuanyuan; Li, Bailin; Pan, Yingjie

    2013-01-01

    Virophages, e.g., Sputnik, Mavirus, and Organic Lake virophage (OLV), are unusual parasites of giant double-stranded DNA (dsDNA) viruses, yet little is known about their diversity. Here, we describe the global distribution, abundance, and genetic diversity of virophages based on analyzing and mapping comprehensive metagenomic databases. The results reveal a distinct abundance and worldwide distribution of virophages, involving almost all geographical zones and a variety of unique environments. These environments ranged from deep ocean to inland, iced to hydrothermal lakes, and human gut- to animal-associated habitats. Four complete virophage genomic sequences (Yellowstone Lake virophages [YSLVs]) were obtained, as was one nearly complete sequence (Ace Lake Mavirus [ALM]). The genomes obtained were 27,849 bp long with 26 predicted open reading frames (ORFs) (YSLV1), 23,184 bp with 21 ORFs (YSLV2), 27,050 bp with 23 ORFs (YSLV3), 28,306 bp with 34 ORFs (YSLV4), and 17,767 bp with 22 ORFs (ALM). The homologous counterparts of five genes, including putative FtsK-HerA family DNA packaging ATPase and genes encoding DNA helicase/primase, cysteine protease, major capsid protein (MCP), and minor capsid protein (mCP), were present in all virophages studied thus far. They also shared a conserved gene cluster comprising the two core genes of MCP and mCP. Comparative genomic and phylogenetic analyses showed that YSLVs, having a closer relationship to each other than to the other virophages, were more closely related to OLV than to Sputnik but distantly related to Mavirus and ALM. These findings indicate that virophages appear to be widespread and genetically diverse, with at least 3 major lineages. PMID:23408616

  18. Diversity of virophages in metagenomic data sets.

    PubMed

    Zhou, Jinglie; Zhang, Weijia; Yan, Shuling; Xiao, Jinzhou; Zhang, Yuanyuan; Li, Bailin; Pan, Yingjie; Wang, Yongjie

    2013-04-01

    Virophages, e.g., Sputnik, Mavirus, and Organic Lake virophage (OLV), are unusual parasites of giant double-stranded DNA (dsDNA) viruses, yet little is known about their diversity. Here, we describe the global distribution, abundance, and genetic diversity of virophages based on analyzing and mapping comprehensive metagenomic databases. The results reveal a distinct abundance and worldwide distribution of virophages, involving almost all geographical zones and a variety of unique environments. These environments ranged from deep ocean to inland, iced to hydrothermal lakes, and human gut- to animal-associated habitats. Four complete virophage genomic sequences (Yellowstone Lake virophages [YSLVs]) were obtained, as was one nearly complete sequence (Ace Lake Mavirus [ALM]). The genomes obtained were 27,849 bp long with 26 predicted open reading frames (ORFs) (YSLV1), 23,184 bp with 21 ORFs (YSLV2), 27,050 bp with 23 ORFs (YSLV3), 28,306 bp with 34 ORFs (YSLV4), and 17,767 bp with 22 ORFs (ALM). The homologous counterparts of five genes, including putative FtsK-HerA family DNA packaging ATPase and genes encoding DNA helicase/primase, cysteine protease, major capsid protein (MCP), and minor capsid protein (mCP), were present in all virophages studied thus far. They also shared a conserved gene cluster comprising the two core genes of MCP and mCP. Comparative genomic and phylogenetic analyses showed that YSLVs, having a closer relationship to each other than to the other virophages, were more closely related to OLV than to Sputnik but distantly related to Mavirus and ALM. These findings indicate that virophages appear to be widespread and genetically diverse, with at least 3 major lineages.

  19. Biochemical Diversity of Carboxyl Esterases and Lipases from Lake Arreo (Spain): a Metagenomic Approach

    PubMed Central

    Martínez-Martínez, Mónica; Alcaide, María; Tchigvintsev, Anatoli; Reva, Oleg; Polaina, Julio; Bargiela, Rafael; Guazzaroni, María-Eugenia; Chicote, Álvaro; Canet, Albert; Valero, Francisco; Rico Eguizabal, Eugenio; Guerrero, María del Carmen; Yakunin, Alexander F.

    2013-01-01

    The esterases and lipases from the α/β hydrolase superfamily exhibit an enormous sequence diversity, fold plasticity, and activities. Here, we present the comprehensive sequence and biochemical analyses of seven distinct esterases and lipases from the metagenome of Lake Arreo, an evaporite karstic lake in Spain (42°46′N, 2°59′W; altitude, 655 m). Together with oligonucleotide usage patterns and BLASTP analysis, our study of esterases/lipases mined from Lake Arreo suggests that its sediment contains moderately halophilic and cold-adapted proteobacteria containing DNA fragments of distantly related plasmids or chromosomal genomic islands of plasmid and phage origins. This metagenome encodes esterases/lipases with broad substrate profiles (tested over a set of 101 structurally diverse esters) and habitat-specific characteristics, as they exhibit maximal activity at alkaline pH (8.0 to 8.5) and temperature of 16 to 40°C, and they are stimulated (1.5 to 2.2 times) by chloride ions (0.1 to 1.2 M), reflecting an adaptation to environmental conditions. Our work provides further insights into the potential significance of the Lake Arreo esterases/lipases for biotechnology processes (i.e., production of enantiomers and sugar esters), because these enzymes are salt tolerant and are active at low temperatures and against a broad range of substrates. As an example, the ability of a single protein to hydrolyze triacylglycerols, (non)halogenated alkyl and aryl esters, cinnamoyl and carbohydrate esters, lactones, and chiral epoxides to a similar extent was demonstrated. PMID:23542620

  20. The Systemic Imprint of Growth and Its Uses in Ecological (Meta)Genomics

    PubMed Central

    Vieira-Silva, Sara; Rocha, Eduardo P. C.

    2010-01-01

    Microbial minimal generation times range from a few minutes to several weeks. They are evolutionarily determined by variables such as environment stability, nutrient availability, and community diversity. Selection for fast growth adaptively imprints genomes, resulting in gene amplification, adapted chromosomal organization, and biased codon usage. We found that these growth-related traits in 214 species of bacteria and archaea are highly correlated, suggesting they all result from growth optimization. While modeling their association with maximal growth rates in view of synthetic biology applications, we observed that codon usage biases are better correlates of growth rates than any other trait, including rRNA copy number. Systematic deviations from our model reveal two distinct evolutionary processes. First, genome organization shows more evolutionary inertia than growth rates. This results in over-representation of growth-related traits in fast degrading genomes. Second, selection for these traits depends on optimal growth temperature: for similar generation times, purifying selection is stronger in psychrophiles, intermediate in mesophiles, and lower in thermophiles. Using this information, we created a predictor of maximal growth rate adapted to small genome fragments. We applied it to three metagenomic environmental samples to show that a transiently rich environment, such as the human gut, selects for fast growers; that a toxic environment, such as the acid mine biofilm, selects for low growth rates; and that a diverse environment, like the soil, shows all ranges of growth rates. We also demonstrate that microbial colonizers of babies' guts grow faster than stabilized human adult gut communities. In conclusion, we show that one can predict maximal growth rates from sequence data alone, and we propose that such information can be used to facilitate the manipulation of generation times. Our predictor allows inferring growth rates in the vast majority of uncultivable

  1. Metagenomic analysis of microbial community of an Amazonian geothermal spring in Peru.

    PubMed

    Paul, Sujay; Cortez, Yolanda; Vera, Nadia; Villena, Gretty K; Gutiérrez-Correa, Marcel

    2016-09-01

    Aguas Calientes (AC) is an isolated geothermal spring located deep in the Amazon rainforest (7°21'12″ S, 75°00'54″ W) of Peru. This geothermal spring is slightly acidic (pH 5.0-7.0), with temperatures varying from 45 to 90 °C, and is continually fed by plant litter, resulting in a relatively high total organic content (TOC). A pooled water sample was analyzed at the 16S rRNA V3-V4 hypervariable region by amplicon metagenome sequencing on the Illumina HiSeq platform. A total of 2,976,534 paired-end reads were generated and assigned to 5434 OTUs. All the resulting 16S rRNA fragments were then classified into 58 bacterial phyla and 2 archaeal phyla. Proteobacteria (88.06%) was the most highly represented phylum, followed by Thermi (6.43%), Firmicutes (3.41%), and Aquificae (1.10%). Crenarchaeota and Euryarchaeota were the only 2 archaeal phyla detected in this study, both at low abundance. Metagenomic sequences were deposited in the NCBI SRA database under accession number SRX1809286. Functional categorization of the assigned OTUs was performed using the PICRUSt tool. In COG analysis, "Amino acid transport and metabolism" (8.5%) was the most highly represented category, whereas among predicted KEGG pathways "Metabolism" (50.6%) was the most abundant. This is the first report of a high-resolution microbial phylogenetic profile of an Amazonian hot spring.

  2. Assessment of REPLI-g Multiple Displacement Whole Genome Amplification (WGA) Techniques for Metagenomic Applications

    PubMed Central

    Ahsanuddin, Sofia; Afshinnekoo, Ebrahim; Gandara, Jorge; Hakyemezoğlu, Mustafa; Bezdan, Daniela; Minot, Samuel; Greenfield, Nick; Mason, Christopher E.

    2017-01-01

    Amplification of minute quantities of DNA is a fundamental challenge in low-biomass metagenomic and microbiome studies because of potential biases in coverage, guanine-cytosine (GC) content, and altered species abundances. Whole genome amplification (WGA), although widely used, is notorious for introducing artifact sequences, either by amplifying laboratory contaminants or by nonrandom amplification of a sample’s DNA. In this study, we investigate the effect of REPLI-g multiple displacement amplification (MDA; Qiagen, Valencia, CA, USA) on sequencing data quality and species abundance detection in 8 paired metagenomic samples and 1 titrated, mixed control sample. We extracted and sequenced genomic DNA (gDNA) from 8 environmental samples and compared the quality of the sequencing data for the MDA and their corresponding non-MDA samples. The degree of REPLI-g MDA bias was evaluated by sequence metrics, species composition, and cross-validating observed species abundance and species diversity estimates using the One Codex and MetaPhlAn taxonomic classification tools. Here, we provide evidence of the overall efficacy of REPLI-g MDA on retaining sequencing data quality and species abundance measurements while providing increased yields of high-fidelity DNA. We find that species abundance estimates are largely consistent across samples, even with REPLI-g amplification, as demonstrated by the Spearman’s rank order coefficient (R2 > 0.8). However, REPLI-g MDA often produced fewer classified reads at the species, genera, and family level, resulting in decreased species diversity. We also observed some areas with the PCR “jackpot effect,” with varying input DNA values for the Metagenomics Research Group (MGRG) controls at specific genomic loci. We visualize this effect in whole genome coverage plots and with sequence composition analyses and note these caveats of the MDA method. Despite overall concordance of species abundance between the amplified and unamplified

  3. Assessment of REPLI-g Multiple Displacement Whole Genome Amplification (WGA) Techniques for Metagenomic Applications.

    PubMed

    Ahsanuddin, Sofia; Afshinnekoo, Ebrahim; Gandara, Jorge; Hakyemezoğlu, Mustafa; Bezdan, Daniela; Minot, Samuel; Greenfield, Nick; Mason, Christopher E

    2017-04-01

    Amplification of minute quantities of DNA is a fundamental challenge in low-biomass metagenomic and microbiome studies because of potential biases in coverage, guanine-cytosine (GC) content, and altered species abundances. Whole genome amplification (WGA), although widely used, is notorious for introducing artifact sequences, either by amplifying laboratory contaminants or by nonrandom amplification of a sample's DNA. In this study, we investigate the effect of REPLI-g multiple displacement amplification (MDA; Qiagen, Valencia, CA, USA) on sequencing data quality and species abundance detection in 8 paired metagenomic samples and 1 titrated, mixed control sample. We extracted and sequenced genomic DNA (gDNA) from 8 environmental samples and compared the quality of the sequencing data for the MDA and their corresponding non-MDA samples. The degree of REPLI-g MDA bias was evaluated by sequence metrics, species composition, and cross-validating observed species abundance and species diversity estimates using the One Codex and MetaPhlAn taxonomic classification tools. Here, we provide evidence of the overall efficacy of REPLI-g MDA on retaining sequencing data quality and species abundance measurements while providing increased yields of high-fidelity DNA. We find that species abundance estimates are largely consistent across samples, even with REPLI-g amplification, as demonstrated by the Spearman's rank order coefficient (R(2) > 0.8). However, REPLI-g MDA often produced fewer classified reads at the species, genera, and family level, resulting in decreased species diversity. We also observed some areas with the PCR "jackpot effect," with varying input DNA values for the Metagenomics Research Group (MGRG) controls at specific genomic loci. We visualize this effect in whole genome coverage plots and with sequence composition analyses and note these caveats of the MDA method. Despite overall concordance of species abundance between the amplified and unamplified samples

  4. Identification and characterization of a chitin deacetylase from a metagenomic library of deep-sea sediments of the Arctic Ocean.

    PubMed

    Liu, Jinlin; Jia, Zhijuan; Li, Sha; Li, Yan; You, Qiang; Zhang, Chunyan; Zheng, Xiaotong; Xiong, Guomei; Zhao, Jin; Qi, Chao; Yang, Jihong

    2016-09-15

    The chemical and biological compositions of deep-sea sediments are of interest for bioprospecting because their diversity remains underexplored. Its special geographical location and climate make the Arctic Ocean a unique ocean area containing an abundance of microbial resources. A metagenomic library of 2750 clones was constructed from deep-sea sediments of the Arctic Ocean, and the insert fragments of ten clones were sequenced. Results revealed several interesting genes, including a chitin deacetylase coding sequence, cdaYJ, which was identified and characterized. CdaYJ is homologous to some known chitin deacetylases and contains conserved chitin deacetylase active sites. The CdaYJ protein has a long N-terminus and a relatively short C-terminus. Phylogenetic analysis revealed that CdaYJ showed the highest homology to CDAs from Alphaproteobacteria. The cdaYJ gene was subcloned into the pET-28a vector and the recombinant CdaYJ (rCdaYJ) was expressed in Escherichia coli BL21 (DE3). rCdaYJ showed a molecular weight of 43 kDa and exhibited deacetylation activity with p-nitroacetanilide as substrate. The optimal pH and temperature of rCdaYJ were pH 7.4 and 28°C, respectively. The construction of a metagenomic library from Arctic deep-sea sediments provides an opportunity to examine the microbial communities and exploit valuable gene resources. CdaYJ showed its highest deacetylation activity under slightly alkaline and low-temperature conditions, and might be a candidate chitin deacetylase with industrial and pharmaceutical potential. Copyright © 2016 Elsevier B.V. All rights reserved.

  5. Genome and metagenome enabled analyses reveal new insight into the global biogeography and potential urea utilization in marine Thaumarchaeota.

    NASA Astrophysics Data System (ADS)

    Ahlgren, N.; Parada, A. E.; Fuhrman, J. A.

    2016-02-01

    Marine Thaumarchaea are an abundant, important group of marine microbial communities as they fix carbon, oxidize ammonium, and thus contribute to key N and C cycles in the oceans. From an enrichment culture, we have sequenced the complete genome of a new Thaumarchaeota strain, SPOT01. Analysis of this genome and other Thaumarchaeal genomes contributes new insight into its role in N cycling and clarifies the broader biogeography of marine Thaumarchaeal genera. Phylogenomics of Thaumarchaeota genomes reveal coherent separation into clusters roughly equivalent to the genus level, and SPOT01 represents a new genus of marine Thaumarchaea. Competitive fragment recruitment of globally distributed metagenomes from TARA, Ocean Sampling Day, and those generated from a station off California shows that the SPOT01 genus is often the most abundant genus, especially where total Thaumarchaea are most abundant in the overall community. The SPOT01 genome contains urease genes allowing it to use an alternative form of N. Genomic and metagenomic analysis also reveal that among planktonic genomes and populations, the urease genes in general are more frequently found in members of the SPOT01 genus and another genus dominant in deep waters, thus we predict these two genera contribute most significantly to urea utilization among marine Thaumarchaea. Recruitment also revealed broader biogeographic and ecological patterns of the putative genera. The SPOT01 genus was most abundant at colder temperatures (<16 C), reflective of its dominance at subpolar to polar latitudes (>45 degrees). The genus containing Nitrosopumilus maritimus had the highest temperature range, and the genus containing Candidatus Nitrosopelagicus brevis was typically most abundant at intermediate temperatures and intermediate latitudes ( 35-45 degrees). Together these genome and metagenome enabled analyses provide significant new insight into the ecology and biogeochemical contributions of marine archaea.

  6. Utility of Metagenomic Next-Generation Sequencing for Characterization of HIV and Human Pegivirus Diversity

    PubMed Central

    Naccache, Samia N.; Kabre, Beniwende; Federman, Scot; Mbanya, Dora; Kaptué, Lazare; Chiu, Charles Y.; Brennan, Catherine A.; Hackett, John

    2015-01-01

    Given the dynamic changes in HIV-1 complexity and diversity, next-generation sequencing (NGS) has the potential to revolutionize strategies for effective HIV global surveillance. In this study, we explore the utility of metagenomic NGS to characterize divergent strains of HIV-1 and to simultaneously screen for other co-infecting viruses. Thirty-five HIV-1-infected Cameroonian blood donor specimens with viral loads of >4.4 log10 copies/ml were selected to include a diverse representation of group M strains. Random-primed NGS libraries, prepared from plasma specimens, resulted in greater than 90% genome coverage for 88% of specimens. Correct subtype designations based on NGS were concordant with sub-region PCR data in 31 of 35 (89%) cases. Complete genomes were assembled for 25 strains, including circulating recombinant forms with relatively limited data available (7 CRF11_cpx, 2 CRF13_cpx, 1 CRF18_cpx, and 1 CRF37_cpx), as well as 9 unique recombinant forms. HPgV (formerly designated GBV-C) co-infection was detected in 9 of 35 (25%) specimens, of which eight specimens yielded complete genomes. The recovered HPgV genomes formed a diverse cluster with genotype 1 sequences previously reported from Ghana, Uganda, and Japan. The extensive genome coverage obtained by NGS improved accuracy and confidence in phylogenetic classification of the HIV-1 strains present in the study population relative to conventional sub-region PCR. In addition, these data demonstrate the potential for metagenomic analysis to be used for routine characterization of HIV-1 and identification of other viral co-infections. PMID:26599538

  7. Network construction and structure detection with metagenomic count data.

    PubMed

    Liu, Zhenqiu; Lin, Shili; Piantadosi, Steven

    2015-01-01

    The human microbiome plays a critical role in human health. Massive amounts of metagenomic data have been generated with advances in next-generation sequencing technologies that characterize microbial communities via direct isolation and sequencing. How to extract, analyze, and transform these vast amounts of data into useful knowledge is a great challenge to bioinformaticians. Microbial biodiversity research has focused primarily on taxa composition and abundance and less on the co-occurrences among different taxa. However, taxa co-occurrences and their relationships to environmental and clinical conditions are important because network structure may help to understand how microbial taxa function together. We propose a systematic robust approach for bacteria network construction and structure detection using metagenomic count data. Pairwise similarity/distance measures between taxa are proposed by adapting distance measures for samples in ecology. We also extend the sparse inverse covariance approach to a sparse inverse of a similarity matrix from count data for network construction. Our approach is efficient for large metagenomic count data with thousands of bacterial taxa. We evaluate our method with real and simulated data. Our method identifies true and biologically significant network structures efficiently. Network analysis is crucial for detecting subnetwork structures with metagenomic count data. We developed a software tool in MATLAB for network construction and biologically significant module detection. Software MetaNet can be downloaded from http://biostatistics.csmc.edu/MetaNet/.

  8. Metagenomics of the Mucosal Microbiota of European Eels

    PubMed Central

    Carda-Diéguez, Miguel; Ghai, Rohit; Rodriguez-Valera, Francisco

    2014-01-01

    European eels are an economically important and threatened species that are prone to rapid collapse in farm conditions. Using metagenomics, we show that the eel mucosal microbiota has specific features distinguishing it from the surrounding aquatic community. This is a first step in dissecting the resident microbiota of this critical barrier that may have implications for maintenance of healthy eel populations. PMID:25377710

  9. Metagenome Sequencing of the Greater Kudu (Tragelaphus strepsiceros) Rumen Microbiome.

    PubMed

    Dube, Anita N; Moyo, Freeman; Dhlamini, Zephaniah

    2015-08-13

    Ruminant herbivores utilize a symbiotic relationship with microorganisms in their rumen to exploit fibrous foods for nutrition. We report the metagenome sequences of the greater kudu (Tragelaphus strepsiceros) rumen digesta, revealing a diverse community of microbes and some novel hydrolytic enzymes. Copyright © 2015 Dube et al.

  10. Metagenome Sequencing of the Greater Kudu (Tragelaphus strepsiceros) Rumen Microbiome