Science.gov

Sample records for metagenome fragment classification

  1. Metagenome Fragment Classification Using N-Mer Frequency Profiles

    PubMed Central

    Rosen, Gail; Garbarine, Elaine; Caseiro, Diamantino; Polikar, Robi; Sokhansanj, Bahrad

    2008-01-01

    A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the massive amounts of metagenomic data, that is, all DNA present in an environmental sample. A major obstacle in metagenomics is the difficulty of obtaining accurate classifications from technologies that yield short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLAST's tied top scores. We demonstrate that this approach scales to identifying any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genus levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis shows that species-level accuracy reaches 90% for highly represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced. PMID:19956701
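    The core idea above, per-genome N-mer frequency profiles scored by a naive Bayes classifier, fits in a few lines. What follows is a minimal sketch, not the authors' implementation. It assumes uniform priors over genomes, add-one (Laplace) smoothing over all 4^N possible N-mers, and toy sequences. The function names are hypothetical.

```python
import math
from collections import Counter

def kmers(seq, n):
    """Yield every overlapping n-mer of seq made only of A, C, G, T."""
    for i in range(len(seq) - n + 1):
        word = seq[i:i + n]
        if set(word) <= set("ACGT"):
            yield word

def train_profiles(genomes, n, pseudocount=1.0):
    """Store n-mer counts and a smoothed denominator for each reference genome."""
    vocab_size = 4 ** n  # all possible n-mers over ACGT
    profiles = {}
    for name, seq in genomes.items():
        counts = Counter(kmers(seq.upper(), n))
        profiles[name] = (counts, sum(counts.values()) + pseudocount * vocab_size)
    return profiles

def classify(fragment, profiles, n, pseudocount=1.0):
    """Return the genome with the highest naive Bayes log-likelihood (uniform prior)."""
    words = list(kmers(fragment.upper(), n))

    def score(profile):
        counts, denom = profile
        # Each n-mer is treated as an independent draw from the genome's profile.
        return sum(math.log((counts[w] + pseudocount) / denom) for w in words)

    return max(profiles, key=lambda name: score(profiles[name]))
```

    Scores are sums of log-probabilities, so every N-mer in a fragment contributes one term. The abstract's setting would use a larger N and hundreds of reference genomes.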

  2. Metagenome fragment classification using N-mer frequency profiles.

    PubMed

    Rosen, Gail; Garbarine, Elaine; Caseiro, Diamantino; Polikar, Robi; Sokhansanj, Bahrad

    2008-01-01

    A vast amount of microbial sequencing data is being generated through large-scale projects in ecology, agriculture, and human health. Efficient high-throughput methods are needed to analyze the massive amounts of metagenomic data, that is, all DNA present in an environmental sample. A major obstacle in metagenomics is the difficulty of obtaining accurate classifications from technologies that yield short reads. We construct the unique N-mer frequency profiles of 635 microbial genomes publicly available as of February 2008. These profiles are used to train a naive Bayes classifier (NBC) that can be used to identify the genome of any fragment. We show that our method is comparable to BLAST for small 25 bp fragments but does not have the ambiguity of BLAST's tied top scores. We demonstrate that this approach scales to identifying any fragment from hundreds of genomes. It also performs quite well at the strain, species, and genus levels and achieves strain resolution despite classifying ubiquitous genomic fragments (gene and nongene regions). Cross-validation analysis shows that species-level accuracy reaches 90% for highly represented species containing an average of 8 strains. We demonstrate that such a tool can be used on the Sargasso Sea dataset, and our analysis shows that NBC can be further enhanced.

  3. Gene prediction in metagenomic fragments based on the SVM algorithm

    PubMed Central

    2013-01-01

    Background Metagenomic sequencing is becoming a powerful technology for exploring microorganisms from various environments, such as the human body, without isolation and cultivation. Accurately identifying genes in metagenomic fragments is one of the most fundamental issues. Results In this article, we present a novel gene prediction method for metagenomic fragments named MetaGUN, which is based on a machine learning approach, the support vector machine (SVM). It implements a three-stage strategy to predict genes. First, it classifies input fragments into phylogenetic groups by a k-mer based sequence binning method. Then, protein-coding sequences are identified for each group independently with SVM classifiers that integrate entropy density profiles (EDP) of codon usage, translation initiation site (TIS) scores and open reading frame (ORF) length as input patterns. Finally, the TISs are adjusted by employing a modified version of MetaTISA. To identify protein-coding sequences, MetaGUN builds a universal module and a novel module. The former is based on a set of representative species, while the latter is designed to find potentially functional DNA sequences with conserved domains. Conclusions Comparisons on artificial shotgun fragments with multiple current metagenomic gene finders show that MetaGUN gives better predictions at both the 3' and 5' ends of genes for fragments of various lengths. In particular, it makes the most reliable predictions among these methods. As an application, MetaGUN was used to predict genes for two samples of the human gut microbiome. It identifies thousands of additional genes with significant evidence. Further analysis indicates that MetaGUN tends to predict more potential novel genes than other current metagenomic gene finders. PMID:23735199
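    MetaGUN's SVM inputs include entropy density profiles (EDP) of codon usage. Below is a minimal sketch of one common EDP definition. It assumes s_i = -p_i ln(p_i) / H, where p_i are the 64 in-frame codon frequencies of an ORF and H is their Shannon entropy. The function name is hypothetical, and MetaGUN's exact feature construction may differ.

```python
import math
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)

def entropy_density_profile(orf):
    """64-dim vector s_i = -p_i * ln(p_i) / H over in-frame codon frequencies p_i."""
    seq = orf.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(c for c in codons if c in CODON_SET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 64
    terms = []
    for codon in CODONS:
        p = counts[codon] / total
        terms.append(-p * math.log(p) if p > 0 else 0.0)
    entropy = sum(terms)  # Shannon entropy H of the codon distribution
    if entropy == 0:
        return [0.0] * 64  # a single repeated codon: H = 0, profile undefined
    return [t / entropy for t in terms]
```

    The components of a non-degenerate profile always sum to 1. This normalizes away ORF length, which the abstract lists as a separate input.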

  4. Accurate phylogenetic classification of variable-length DNA fragments.

    PubMed

    McHardy, Alice Carolyn; Martín, Héctor García; Tsirigos, Aristotelis; Hugenholtz, Philip; Rigoutsos, Isidore

    2007-01-01

    Metagenome studies have retrieved vast amounts of sequence data from a variety of environments leading to new discoveries and insights into the uncultured microbial world. Except for very simple communities, the encountered diversity has made fragment assembly and the subsequent analysis a challenging problem. A taxonomic characterization of metagenomic fragments is required for a deeper understanding of shotgun-sequenced microbial communities, but success has mostly been limited to sequences containing phylogenetic marker genes. Here we present PhyloPythia, a composition-based classifier that combines higher-level generic clades from a set of 340 completed genomes with sample-derived population models. Extensive analyses on synthetic and real metagenome data sets showed that PhyloPythia allows the accurate classification of most sequence fragments across all considered taxonomic ranks, even for unknown organisms. The method requires no more than 100 kb of training sequence for the creation of accurate models of sample-specific populations and can assign fragments ≥1 kb with high specificity.

  5. Metagenomic Classification Using an Abstraction Augmented Markov Model

    PubMed Central

    Zhu, Xiujun (Sylvia)

    2016-01-01

    Abstract The abstraction augmented Markov model (AAMM) is an extension of a Markov model that can be used for the analysis of genetic sequences. It is developed using the frequencies of all possible consecutive words of the same length (p-mers). This article reviews the theory behind AAMM and applies it to metagenomic classification. PMID:26618474
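    For orientation, below is the plain fixed-order Markov model that AAMM extends, used as a classifier. This sketch does not implement AAMM itself; AAMM adds its abstraction step on top of a baseline like this. It assumes sequences over A, C, G, T, order-k contexts estimated from (k+1)-mer counts, and add-pseudocount smoothing. All names are hypothetical.

```python
import math
from collections import Counter

def train_markov(seq, k, pseudocount=1.0):
    """Collect (k+1)-mer counts and their k-mer context totals for an order-k chain."""
    seq = seq.upper()
    words = Counter(seq[i:i + k + 1] for i in range(len(seq) - k))
    contexts = Counter()
    for word, count in words.items():
        contexts[word[:k]] += count
    return words, contexts, k, pseudocount

def log_likelihood(seq, model):
    """Sum of log P(base | previous k bases), add-pseudocount smoothed over A/C/G/T."""
    words, contexts, k, pc = model
    seq = seq.upper()
    total = 0.0
    for i in range(len(seq) - k):
        word = seq[i:i + k + 1]
        total += math.log((words[word] + pc) / (contexts[word[:k]] + 4 * pc))
    return total

def classify(seq, models):
    """Pick the class whose Markov model gives seq the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(seq, models[name]))
```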

  6. Multi-Layer and Recursive Neural Networks for Metagenomic Classification.

    PubMed

    Ditzler, Gregory; Polikar, Robi; Rosen, Gail

    2015-09-01

    Recent advances in machine learning, specifically in deep learning with neural networks, have made a profound impact on fields such as natural language processing, image classification, and language modeling; however, the feasibility and potential benefits of these approaches for metagenomic data analysis have been largely under-explored. Deep learning exploits many layers of learning nonlinear feature representations, typically in an unsupervised fashion, and recent results have shown outstanding generalization performance on previously unseen data. Furthermore, some deep learning methods can also represent the structure in a data set. Consequently, deep learning and neural networks may prove to be an appropriate approach for metagenomic data. To determine whether such approaches are indeed appropriate for metagenomics, we experiment with two deep learning methods: i) a deep belief network, and ii) a recursive neural network, the latter of which provides a tree representing the structure of the data. We compare these approaches to the standard multi-layer perceptron, which has been well-established in the machine learning community as a powerful prediction algorithm, though it is largely missing from the metagenomics literature. We find that traditional neural networks can be quite powerful classifiers on metagenomic data compared to baseline methods, such as random forests. On the other hand, while the deep learning approaches did not improve classification accuracy, they do provide the ability to learn hierarchical representations of a data set that standard classification methods do not allow. Our goal in this effort is not to determine the best algorithm in terms of accuracy, as that depends on the specific application, but rather to highlight the benefits and drawbacks of each of the approaches we discuss and provide insight into how they can be improved for predictive metagenomic analysis.

  7. Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations.

    PubMed

    García-López, Rodrigo; Vázquez-Castellanos, Jorge Francisco; Moya, Andrés

    2015-01-01

    Metagenomic libraries consist of DNA fragments from diverse species, with varying genome size and abundance. High-throughput sequencing platforms produce large volumes of reads from these libraries, which may be assembled into contigs, ideally resembling the original larger genomic sequences. The uneven species distribution, along with the stochasticity in sample processing and sequencing bias, impacts the success of accurate sequence assembly. Several assemblers enable the processing of viral metagenomic data de novo, generally using overlap-layout-consensus or de Bruijn graph approaches for contig assembly. The success of viral genomic reconstruction in these datasets is limited by the degree of fragmentation of each genome in the sample, which depends on the sequencing effort and the genome length. Depending on ecological, biological, or procedural biases, some fragments have a higher prevalence, or coverage, in the assembly. Assemblers must also contend with challenges such as the formation of chimeric structures and intra-species variability. Diversity calculation relies on the classification of the sequences that comprise a metagenomic dataset. Whenever the corresponding genomic and taxonomic information is available, contigs matching the same species can be classified accordingly and the coverage of that species' genome can be calculated. This may be used to compare populations by estimating abundance and assessing species distribution. Nevertheless, coverage does not take into account the degree of fragmentation, or genome completeness, and is not necessarily representative of the actual species distribution in the samples. Furthermore, undetermined sequences are abundant in viral metagenomic datasets, resulting in several independent contigs that cannot be assigned by homology or genomic information. These may only be classified as different operational taxonomic units (OTUs), sometimes remaining inadvisably unrelated.
Thus

  8. Kraken: ultrafast metagenomic sequence classification using exact alignments

    PubMed Central

    2014-01-01

    Kraken is an ultrafast and highly accurate program for assigning taxonomic labels to metagenomic DNA sequences. Previous programs designed for this task have been relatively slow and computationally expensive, forcing researchers to use faster abundance estimation programs, which only classify small subsets of metagenomic data. Using exact alignment of k-mers, Kraken achieves classification accuracy comparable to the fastest BLAST program. In its fastest mode, Kraken classifies 100 base pair reads at a rate of over 4.1 million reads per minute, 909 times faster than Megablast and 11 times faster than the abundance estimation program MetaPhlAn. Kraken is available at http://ccb.jhu.edu/software/kraken/. PMID:24580807
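    Kraken's speed comes from exact k-mer lookups. Below is a simplified sketch of that scheme, assuming a toy taxonomy given as a child-to-parent dict. Each database k-mer maps to the lowest common ancestor (LCA) of all genomes that contain it. A read is assigned to the hit taxon whose root-to-leaf path collects the most k-mer hits. Minimizers, canonical k-mers and Kraken's own tie-breaking rules are omitted, and all names are hypothetical.

```python
from collections import Counter

def ancestors(taxon, parent):
    """Path from taxon up to the root (inclusive), using a child -> parent map."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path

def lca(a, b, parent):
    """Lowest common ancestor of two taxa."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("taxa share no common root")

def build_db(genomes, k, parent):
    """Map every k-mer to the LCA of all taxa whose genome contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = taxon if kmer not in db else lca(db[kmer], taxon, parent)
    return db

def classify(read, db, k, parent):
    """Score each hit taxon by the hits on its root-to-leaf path; return the best one."""
    hits = Counter(db[read[i:i + k]] for i in range(len(read) - k + 1)
                   if read[i:i + k] in db)
    if not hits:
        return None  # unclassified
    # Counter returns 0 for ancestors that received no direct hits.
    return max(hits, key=lambda t: sum(hits[a] for a in ancestors(t, parent)))
```

    A k-mer shared by two genera lands on their common ancestor. It therefore supports both lineages without pinning the read to either.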

  9. Fragmentation and Coverage Variation in Viral Metagenome Assemblies, and Their Effect in Diversity Calculations

    PubMed Central

    García-López, Rodrigo; Vázquez-Castellanos, Jorge Francisco; Moya, Andrés

    2015-01-01

    Metagenomic libraries consist of DNA fragments from diverse species, with varying genome size and abundance. High-throughput sequencing platforms produce large volumes of reads from these libraries, which may be assembled into contigs, ideally resembling the original larger genomic sequences. The uneven species distribution, along with the stochasticity in sample processing and sequencing bias, impacts the success of accurate sequence assembly. Several assemblers enable the processing of viral metagenomic data de novo, generally using overlap-layout-consensus or de Bruijn graph approaches for contig assembly. The success of viral genomic reconstruction in these datasets is limited by the degree of fragmentation of each genome in the sample, which depends on the sequencing effort and the genome length. Depending on ecological, biological, or procedural biases, some fragments have a higher prevalence, or coverage, in the assembly. Assemblers must also contend with challenges such as the formation of chimeric structures and intra-species variability. Diversity calculation relies on the classification of the sequences that comprise a metagenomic dataset. Whenever the corresponding genomic and taxonomic information is available, contigs matching the same species can be classified accordingly and the coverage of that species' genome can be calculated. This may be used to compare populations by estimating abundance and assessing species distribution. Nevertheless, coverage does not take into account the degree of fragmentation, or genome completeness, and is not necessarily representative of the actual species distribution in the samples. Furthermore, undetermined sequences are abundant in viral metagenomic datasets, resulting in several independent contigs that cannot be assigned by homology or genomic information. These may only be classified as different operational taxonomic units (OTUs), sometimes remaining inadvisably unrelated.
Thus

  10. What's in the mix: phylogenetic classification of metagenome sequence samples.

    PubMed

    McHardy, Alice C; Rigoutsos, Isidore

    2007-10-01

    Metagenomics is a novel field which deals with the sequencing and study of microbial organisms or viruses isolated directly from a particular environment. This has already provided a wealth of information and new insights into the inhabitants of various environmental niches. For a given sample, one would like to determine the phylogenetic provenance of the obtained fragments, the relative abundance of its different members, their metabolic capabilities, and the functional properties of the community as a whole. To this end, computational analyses are becoming increasingly indispensable tools. In this review, we focus on the problem of determining the phylogenetic identity of the sample fragments, a procedure known as 'binning'. This step is essential for the reconstruction of the metabolic capabilities of individual organisms or phylogenetic clades of a community, and the study of their interactions.

  11. Phylogeny, classification and metagenomic bioprospecting of microbial acetyl xylan esterases.

    PubMed

    Adesioye, Fiyinfoluwa A; Makhalanyane, Thulani P; Biely, Peter; Cowan, Don A

    2016-11-01

    Acetyl xylan esterases (AcXEs), also termed xylan deacetylases, are broad specificity Carbohydrate-Active Enzymes (CAZymes) that hydrolyse ester bonds to liberate acetic acid from acetylated hemicellulose (typically polymeric xylan and xylooligosaccharides). They belong to eight families within the Carbohydrate Esterase (CE) class of the CAZy database. AcXE classification is largely based on sequence-dependent phylogenetic relationships, supported in some instances with substrate specificity data. However, some sequence-based predictions of AcXE-encoding gene identity have proved to be functionally incorrect. Such ambiguities can lead to mis-assignment of genes and enzymes during sequence data-mining, reinforcing the necessity for the experimental confirmation of the functional properties of putative AcXE-encoding gene products. Although one-third of all characterized CEs within CAZy families 1-7 and 16 are AcXEs, there is a need to expand the sequence database in order to strengthen the link between AcXE gene sequence and specificity. Currently, most AcXEs are derived from a limited range of (mostly microbial) sources and have been identified via culture-based bioprospecting methods, restricting current knowledge of AcXEs to data from relatively few microbial species. More recently, the successful identification of AcXEs via genome and metagenome mining has emphasised the huge potential of culture-independent bioprospecting strategies. We note, however, that the functional metagenomics approach is still hampered by screening bottlenecks. The most relevant recent reviews of AcXEs have focused primarily on the biochemical and functional properties of these enzymes. In this review, we focus on AcXE phylogeny, classification and the future of metagenomic bioprospecting for novel AcXEs.

  12. Centrifuge: rapid and sensitive classification of metagenomic sequences

    PubMed Central

    Song, Li; Breitwieser, Florian P.

    2016-01-01

    Centrifuge is a novel microbial classification engine that enables rapid, accurate, and sensitive labeling of reads and quantification of species on desktop computers. The system uses an indexing scheme based on the Burrows-Wheeler transform (BWT) and the Ferragina-Manzini (FM) index, optimized specifically for the metagenomic classification problem. Centrifuge requires a relatively small index (4.2 GB for 4078 bacterial and 200 archaeal genomes) and classifies sequences at very high speed, allowing it to process the millions of reads from a typical high-throughput DNA sequencing run within a few minutes. Together, these advances enable timely and accurate analysis of large metagenomics data sets on conventional desktop computers. Because of its space-optimized indexing schemes, Centrifuge also makes it possible to index the entire NCBI nonredundant nucleotide sequence database (a total of 109 billion bases) with an index size of 69 GB, in contrast to k-mer-based indexing schemes, which require far more extensive space. PMID:27852649
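    Centrifuge's index is built on the BWT and the FM index. Below is a textbook sketch of those two pieces, assuming a small text terminated by a unique '$' sentinel. It builds the BWT naively by sorting rotations and uses FM-index backward search to count exact occurrences of a pattern. Centrifuge's real index adds sampling, compression and the metagenomics-specific optimizations the abstract mentions, none of which appear here. The function names are hypothetical.

```python
from collections import Counter

def bwt(text):
    """Burrows-Wheeler transform by sorting all rotations (fine for small inputs)."""
    if not text.endswith("$") or text.count("$") != 1:
        raise ValueError("text needs a single trailing '$' sentinel")
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)

def fm_index(last):
    """C[ch]: number of symbols smaller than ch; occ[ch][i]: count of ch in last[:i]."""
    counts = Counter(last)
    c_table, running = {}, 0
    for ch in sorted(counts):  # '$' sorts before A, C, G, T
        c_table[ch] = running
        running += counts[ch]
    occ = {ch: [0] * (len(last) + 1) for ch in counts}
    for i, ch in enumerate(last):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (sym == ch)
    return c_table, occ

def count_matches(pattern, c_table, occ):
    """FM-index backward search: count exact occurrences of pattern in the text."""
    lo, hi = 0, len(next(iter(occ.values()))) - 1  # start with all BWT rows
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

    For example, `fm_index(bwt("ACAACG$"))` indexes a 6-base sequence in which "AC" occurs twice. Backward search finds both occurrences without scanning the text.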

  13. Large-scale machine learning for metagenomics sequence classification

    PubMed Central

    Vervier, Kévin; Mahé, Pierre; Tournoud, Maud; Veyrieras, Jean-Baptiste; Vert, Jean-Philippe

    2016-01-01

    Motivation: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. Results: We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genomes to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and are more sensitive to sequencing errors.
    We finally show that the new method outperforms the state of the art in its ability to classify reads from species whose lineage is absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2–17 times with respect to the BWA-MEM short read mapper, depending
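    The method above is tuned by sampling fragments from reference genomes up to a coverage of about 10. Below is a minimal sketch of that sampling step. It assumes coverage means mean depth N·L/G (N fragments of length L drawn from a genome of length G) and that fragments are error-free and placed uniformly at random. The function names are hypothetical, and the authors' simulator may differ, for example by adding sequencing errors.

```python
import math
import random

def fragments_for_coverage(genome_length, fragment_length, coverage):
    """Smallest N with N * fragment_length / genome_length >= coverage."""
    return math.ceil(coverage * genome_length / fragment_length)

def sample_fragments(genome, fragment_length, coverage, seed=0):
    """Draw uniformly placed, error-free fragments until the target mean depth is met."""
    rng = random.Random(seed)  # seeded for reproducible training sets
    n = fragments_for_coverage(len(genome), fragment_length, coverage)
    starts = [rng.randrange(len(genome) - fragment_length + 1) for _ in range(n)]
    return [genome[s:s + fragment_length] for s in starts]
```

    At coverage 10, a 5 Mb genome cut into 200 bp fragments yields 250,000 training examples. Summed over many genomes, such counts illustrate why training sets grow so large.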

  14. WGSQuikr: fast whole-genome shotgun metagenomic classification.

    PubMed

    Koslicki, David; Foucart, Simon; Rosen, Gail

    2014-01-01

    With the decrease in cost and increase in output of whole-genome shotgun technologies, many metagenomic studies are utilizing this approach in lieu of the more traditional 16S rRNA amplicon technique. Due to the large number of relatively short reads output from whole-genome shotgun technologies, there is a need for fast and accurate short-read OTU classifiers. While there are relatively fast and accurate algorithms available, such as MetaPhlAn, MetaPhyler, PhyloPythiaS, and PhymmBL, these algorithms still classify samples in a read-by-read fashion and so execution times can range from hours to days on large datasets. We introduce WGSQuikr, a reconstruction method which can compute a vector of taxonomic assignments and their proportions in the sample with remarkable speed and accuracy. We demonstrate on simulated data that WGSQuikr is typically more accurate and up to an order of magnitude faster than the aforementioned classification algorithms. We also verify the utility of WGSQuikr on real biological data in the form of a mock community. WGSQuikr is a Whole-Genome Shotgun QUadratic, Iterative, K-mer based Reconstruction method which extends the previously introduced 16S rRNA-based algorithm Quikr. A MATLAB implementation of WGSQuikr is available at: http://sourceforge.net/projects/wgsquikr.
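    Quikr-style reconstruction treats a sample's k-mer frequency vector b as a non-negative mixture A x of reference k-mer profiles (the columns of A) and solves for the proportions x. The sketch below solves the non-negative least-squares problem with plain projected gradient descent as a stand-in; it is not necessarily the solver WGSQuikr uses. It also simply rescales x to sum to 1 at the end rather than building that constraint into the objective. All names are hypothetical.

```python
def reconstruct(profiles, sample, steps=2000):
    """Projected gradient for min ||A x - b||^2 subject to x >= 0.

    profiles: reference k-mer frequency vectors (the columns of A).
    sample:   the sample's k-mer frequency vector b.
    Returns estimated proportions, rescaled to sum to 1.
    """
    n, m = len(profiles), len(sample)
    # Step 1 / ||A||_F^2 is at most 1 / lambda_max(A^T A), which keeps the iteration stable.
    step = 1.0 / sum(v * v for col in profiles for v in col)
    x = [1.0 / n] * n
    for _ in range(steps):
        residual = [sum(profiles[j][i] * x[j] for j in range(n)) - sample[i]
                    for i in range(m)]
        grad = [sum(profiles[j][i] * residual[i] for i in range(m)) for j in range(n)]
        x = [max(0.0, xj - step * gj) for xj, gj in zip(x, grad)]  # project onto x >= 0
    total = sum(x)
    return [xj / total for xj in x] if total > 0 else x
```

    The whole sample is solved as a single vector problem rather than read by read. The abstract credits this design for WGSQuikr's speed.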

  15. Information-theoretic approaches to SVM feature selection for metagenome read classification.

    PubMed

    Garbarine, Elaine; DePasquale, Joseph; Gadia, Vinay; Polikar, Robi; Rosen, Gail

    2011-06-01

    Analysis of DNA sequences isolated directly from the environment, known as metagenomics, produces a large quantity of genome fragments that need to be classified into specific taxa. Most composition-based classification methods use all features instead of a subset of features that may maximize classifier accuracy. We show that feature selection methods can boost the performance of taxonomic classifiers. This work proposes three different filter-based feature selection methods that stem from information theory: (1) a technique that combines Kullback-Leibler, Mutual Information, and distance information, (2) a text mining technique, TF-IDF, and (3) minimum-redundancy maximum-relevance (mRMR). The feature selection methods are compared by how well they improve support vector machine classification of genomic reads. Overall, the 6-mer mRMR method performs well, especially at the phylum level. If the total number of features is very large, feature selection becomes difficult because a small subset of features that captures a majority of the data variance is less likely to exist. Therefore, we conclude that there is a trade-off between feature set size and feature selection method to optimize classification performance. For larger feature set sizes, TF-IDF works better at finer resolutions, while mRMR performs best of all methods for N=6 at all taxonomic levels.
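    The abstract does not give the exact TF-IDF variant used. This sketch assumes each taxon's training sequence is one "document" and its k-mers are "terms". Each k-mer is scored as term frequency times log(number of taxa / number of taxa containing it), and k-mers are ranked by their highest score across taxa. K-mers present in every taxon get an IDF of zero and sink to the bottom. The function names are hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Counts of all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def tfidf_select(class_seqs, k, top):
    """Return the `top` k-mers ranked by their maximum TF-IDF over taxa."""
    docs = {name: kmer_counts(seq, k) for name, seq in class_seqs.items()}
    n_docs = len(docs)
    df = Counter()  # number of taxa in which each k-mer appears
    for counts in docs.values():
        df.update(counts.keys())
    best = {}
    for counts in docs.values():
        total = sum(counts.values())
        for kmer, c in counts.items():
            score = (c / total) * math.log(n_docs / df[kmer])
            best[kmer] = max(best.get(kmer, 0.0), score)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(best, key=lambda t: (-best[t], t))[:top]
```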

  16. Classification and quantification of bacteriophage taxa in human gut metagenomes

    PubMed Central

    Waller, Alison S; Yamada, Takuji; Kristensen, David M; Kultima, Jens Roat; Sunagawa, Shinichi; Koonin, Eugene V; Bork, Peer

    2014-01-01

    Bacteriophages have key roles in microbial communities, to a large extent shaping the taxonomic and functional composition of the microbiome, but data on the connections between phage diversity and the composition of communities are scarce. Using taxon-specific marker genes, we identified and monitored 20 viral taxa in 252 human gut metagenomic samples, mostly at the level of genera. On average, five phage taxa were identified in each sample, with up to three of these being highly abundant. The abundances of most phage taxa vary by up to four orders of magnitude between the samples, and several taxa that are highly abundant in some samples are absent in others. Significant correlations exist between the abundances of some phage taxa and human host metadata: for example, ‘Group 936 lactococcal phages' are more prevalent and abundant in Danish samples than in samples from Spain or the United States of America. Quantification of phages that exist as integrated prophages revealed that the abundance profiles of prophages are highly individual-specific and remain unique to an individual over a 1-year time period, and prediction of prophage lysis across the samples identified hundreds of prophages that are apparently active in the gut and vary across the samples, in terms of presence and lytic state. Finally, a prophage–host network of the human gut was established and includes numerous novel host–phage associations. PMID:24621522

  17. Accurate phylogenetic classification of DNA fragments based on sequence composition

    SciTech Connect

    McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis; Hugenholtz, Philip; Rigoutsos, Isidore

    2006-05-01

    Metagenome studies have retrieved vast amounts of sequence from a variety of environments, leading to novel discoveries and great insights into the uncultured microbial world. Except for very simple communities, diversity makes sequence assembly and analysis a very challenging problem. To understand the structure and function of microbial communities, a taxonomic characterization of the obtained sequence fragments is highly desirable, yet currently limited mostly to those sequences that contain phylogenetic marker genes. We show that for clades at the rank of domain down to genus, sequence composition allows the very accurate phylogenetic characterization of genomic sequence. We developed a composition-based classifier, PhyloPythia, for de novo phylogenetic sequence characterization and have trained it on a data set of 340 genomes. By extensive evaluation experiments we show that the method is accurate across all taxonomic ranks considered, even for sequences that originate from novel organisms and are as short as 1 kb. Application to two metagenome datasets obtained from samples of phosphorus-removing sludge showed that the method allows the accurate classification at genus level of most sequence fragments from the dominant populations, while at the same time correctly characterizing even larger parts of the samples at higher taxonomic levels.

  18. Methods for virus classification and the challenge of incorporating metagenomic sequence data.

    PubMed

    Simmonds, Peter

    2015-06-01

    The division of viruses into orders, families, genera and species provides a classification framework that seeks to organize and make sense of the diversity of viruses infecting animals, plants and bacteria. Classifications are based on similarities in genome structure and organization and on the presence of homologous genes and sequence motifs; at lower levels such as species, they are also based on host range, nucleotide and antigenic relatedness, and epidemiology. Classification below the level of family must also be consistent with phylogeny and virus evolutionary histories. Recently developed methods such as PASC, DEMaRC and NVR offer alternative strategies for genus and species assignments that are based purely on degrees of divergence between genome sequences. They offer the possibility of automating classification of the vast number of novel virus sequences being generated by next-generation metagenomic sequencing. However, distance-based methods struggle to deal with the complex evolutionary history of virus genomes that are shuffled by recombination and reassortment, and where taxonomic lineages evolve at different rates. In biological terms, classifications based on sequence distances alone are also arbitrary, whereas the current system of virus taxonomy is useful precisely because it is primarily based upon phenotypic characteristics. However, a separate system is clearly needed by which virus variants that lack biological information might be incorporated into the ICTV classification even if based solely on sequence relationships to existing taxa. For these, simplified taxonomic proposals and naming conventions represent a practical way to expand the existing virus classification and catalogue our rapidly increasing knowledge of virus diversity.

  19. Characterization of Uncultured Genome Fragment from Soil Metagenomic Library Exposed Rare Mismatch of Internal Tetranucleotide Frequency

    PubMed Central

    Liu, Yunpeng; Yang, Dongqing; Zhang, Nan; Chen, Lin; Cui, Zhongli; Shen, Qirong; Zhang, Ruifu

    2016-01-01

    Exploring the genomic information of a specific uncultured soil bacterium is vital to understanding its function in the ecosystem but is still a challenge due to the lack of culture techniques. To examine the genomes of uncultured bacteria, a metagenomic bacterial artificial chromosome library derived from a soil sample was screened for 16S rDNA-containing clones. Five clones (4C6, 5E7, 5G4, 5G12, and 5H7) containing genome fragments of uncultured soil bacteria (with low 16S rDNA similarity to isolated bacteria) were selected for sequencing. Clones 5E7 and 5G4 showed only 82 and 83% 16S rDNA identity to known sequences. Phylogenetic analysis of 16S rDNA indicated that 5E7 and 5G4 potentially belong to a new class of Chloroflexi. Only one-third of the 5G4 open reading frames have significant hits in HMMER searches. Internal tetranucleotide frequency analysis indicated that the unknown region of 5G4 was poorly correlated with other parts of the clone, suggesting that this section might have been obtained through lateral transfer. It was suggested that this region, rich in unknown genes, is undergoing fast evolution. PMID:28066395
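    The abstract does not detail how the internal tetranucleotide frequency analysis was run. This sketch assumes one common approach. It slides a window along the clone, builds each window's 256-dimensional tetranucleotide frequency vector, and computes that vector's Pearson correlation with the whole-clone vector. Windows with unusually low correlation stand out as compositionally atypical, as the unknown region of 5G4 did. The window and step sizes and all names are hypothetical.

```python
import math
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tnf(seq):
    """Tetranucleotide frequency vector (256 entries, ignoring non-ACGT words)."""
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in TETRAMERS]

def pearson(x, y):
    """Pearson correlation; 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def window_correlations(seq, window, step):
    """(start, correlation with whole-sequence TNF) for each full window."""
    whole = tnf(seq)
    return [(start, pearson(tnf(seq[start:start + window]), whole))
            for start in range(0, len(seq) - window + 1, step)]
```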

  20. Rapid phylogenetic and functional classification of short genomic fragments with signature peptides

    PubMed Central

    2012-01-01

    Background Classification is difficult for shotgun metagenomics data from environments such as soils, where the diversity of sequences is high and where reference sequences from close relatives may not exist. Approaches based on sequence-similarity scores must deal with the confounding effects that inheritance and functional pressures exert on the relation between scores and phylogenetic distance, while approaches based on sequence alignment and tree-building are typically limited to a small fraction of gene families. We describe an approach based on finding one or more exact matches between a read and a precomputed set of peptide 10-mers. Results Even at the largest phylogenetic distances, thousands of exact 10-mer peptide matches can be found between pairs of bacterial genomes. Genes that share one or more peptide 10-mers typically have high reciprocal BLAST scores. Among a set of 403 representative bacterial genomes, some 20 million 10-mer peptides were found to be shared. We assign each of these peptides as a signature of a particular node in a phylogenetic reference tree based on the RNA polymerase genes. We classify the phylogeny of a genomic fragment (e.g., a read) at the most specific node on the reference tree that is consistent with the phylogeny of the observed signature peptides it contains. Using both synthetic data from four newly sequenced soil-bacterium genomes and ten real soil metagenomics data sets, we demonstrate a sensitivity and specificity comparable to that of the MEGAN metagenomics analysis package using BLASTX against the NR database. Phylogenetic and functional similarity metrics applied to real metagenomics data indicate a signal-to-noise ratio of approximately 400 for distinguishing among environments. Our method assigns ~6.6 Gbp/hr on a single CPU, compared with 25 kbp/hr for methods based on BLASTX against the NR database. Conclusions Classification by exact matching against a precomputed list of signature peptides provides comparable
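    Below is a sketch of the assignment rule described above. It assumes the read has already been translated into a peptide (six-frame translation omitted) and that the signature table maps each 10-mer peptide to one node of a single-rooted reference tree, given as a child-to-parent dict. If all observed signature nodes lie on one root-to-leaf path, the read goes to the deepest of them. Otherwise, as an assumption of this sketch, it falls back to their lowest common ancestor. The table contents and all names are hypothetical.

```python
def path_to_root(node, parent):
    """Lineage of node up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def classify_peptide_read(peptide, signatures, parent, k=10):
    """Assign a translated read to the most specific node consistent with its signatures."""
    nodes = {signatures[peptide[i:i + k]] for i in range(len(peptide) - k + 1)
             if peptide[i:i + k] in signatures}
    if not nodes:
        return None  # no signature peptide found
    deepest = max(nodes, key=lambda n: len(path_to_root(n, parent)))
    lineage = path_to_root(deepest, parent)
    if all(n in lineage for n in nodes):
        return deepest  # all signatures agree along one root-to-leaf path
    # Conflicting branches: fall back to the lowest node shared by every lineage.
    common = set(lineage)
    for n in nodes:
        common &= set(path_to_root(n, parent))
    return max(common, key=lambda n: len(path_to_root(n, parent)))
```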

  1. IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

    EPA Science Inventory

    Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...

  2. Metagenomic Classification and Characterization Marine Actinobacteria from the Gulf of Maine without Representative Genomes

    NASA Astrophysics Data System (ADS)

    Sachdeva, R.; Heidelberg, J.

    2012-12-01

    Actinobacteria represent one of the largest and most diverse bacterial phyla and, unlike most marine prokaryotes, are Gram-positive. This phylum encompasses a broad range of physiologies, morphologies, and metabolic properties with a broad array of lifestyles. The marine actinobacterial assemblage is dominated by the orders Actinomycetales and Acidimicrobiales (also known as the marine Actinobacteria clade). The Acidimicrobiales typically outnumber the Actinomycetales and are mostly represented by the OCS155 group. Although bacteria of the order Acidimicrobiales make up ~7.6% of the 16S matches from the Global Ocean Survey shotgun metagenomic libraries, very little is known about their potential function and role in biogeochemical cycling. Samples were collected from surface seawater in the Gulf of Maine (GOM) in the summer and winter of 2006. Sanger sequences were generated from the 0.1-0.8 μm size fractions using paired-end, medium-insert shotgun libraries. The resulting 2.2 Gb of sequence was assembled with the Celera Assembler package into 280 Mb of non-redundant scaffolds. Putative actinobacterial assemblies were identified using (1) ribosomal RNA genes (16S and 23S), (2) phylogenetically informative non-ribosomal core genes thought to be resistant to horizontal gene transfer (e.g., RecA and RpoB), and (3) compositional binning using hierarchical clustering of oligonucleotide frequency patterns. Binning yielded 3.6 Mb (4.2X coverage) of actinobacterial scaffolds built from 15.1 Mb of unassembled reads. Putative actinobacterial assemblies included both summer and winter reads, demonstrating that the Actinobacteria are abundant year-round. Classification reveals that all of the sampled Actinobacteria belong to the orders Acidimicrobiales and Actinomycetales and are similar to those found in the global ocean. The GOM Actinobacteria show a broad range of G+C content (32-66%), indicating a high level of genomic diversity. Those assemblies
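
    Compositional binning by oligonucleotide frequency, step (3) above, can be sketched as below. This is an illustrative assumption, not the study's pipeline: the abstract does not give the word length, distance, linkage or cutoff, so tetranucleotides (k = 4), Euclidean distance, average linkage and a hypothetical merge threshold are used here. As a side check on the reported numbers, 15.1 Mb of reads over 3.6 Mb of scaffolds gives 15.1 / 3.6 ≈ 4.2X, matching the stated coverage.

```python
from itertools import product
from typing import Dict, List, Sequence


def kmer_profile(seq: str, k: int = 4) -> List[float]:
    """Normalized frequencies of all 4**k DNA k-mers; k-mers with N etc. are skipped."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0.0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def bin_contigs(contigs: Dict[str, str], threshold: float, k: int = 4) -> List[List[str]]:
    """Average-linkage agglomerative clustering of contigs by k-mer profile.

    Merging stops once the closest pair of clusters is farther apart than
    `threshold` (a hypothetical tuning parameter).
    """
    profiles = {name: kmer_profile(s, k) for name, s in contigs.items()}
    clusters = [[name] for name in contigs]

    def linkage(c1: List[str], c2: List[str]) -> float:
        d = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(d) / len(d)

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > threshold:
            break
        clusters[i] += clusters[j]  # j > i, so deleting j keeps index i valid
        del clusters[j]
    return clusters
```

    In practice the threshold would be tuned on contigs of known origin; a very small value keeps only near-identical profiles together, while a large one collapses everything into one bin.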

  3. Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

    PubMed

    Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

    2016-03-01

    Ultrafast metagenomic sequence classification using exact alignments (Kraken) is a novel approach that can be used to classify 16S rDNA sequences. The classifier is based on mapping short sequences to their lowest common ancestor and performing exact alignments to form subtrees with specific weights at each taxon node. This study aimed to evaluate the classification performance of Kraken on long 16S rDNA random environmental sequences produced by cloning and Sanger sequencing. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Kraken achieved deeper classification than the RDP classifier: 73% of the contigs were classified down to the species or variety level by Kraken, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken combines the information from both sequence strands, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for the taxonomic assignment of sequences obtained by Sanger sequencing or by third-generation sequencing, whose main goal is to generate longer sequences.
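
    A minimal sketch of the Kraken-style idea follows, based on the published Kraken design rather than on this study: each k-mer in a precomputed database maps to the lowest common ancestor (LCA) of the genomes containing it, each hit taxon scores the hits along its root-to-leaf path, and the read goes to the best-scoring taxon (the LCA of the tied taxa on a tie). Canonical k-mers make a read and its reverse complement classify identically, which is one way both sequence strands contribute. The toy taxonomy, the k-mer table, k = 3 and the function names are hypothetical; the original Kraken uses 31-mers by default and a compact database.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Hypothetical toy taxonomy (taxon -> parent).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacillus": "root",
    "B. subtilis": "Bacillus",
    "B. cereus": "Bacillus",
    "Vibrio": "root",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def path_to_root(taxon: str) -> List[str]:
    path: List[str] = []
    cur: Optional[str] = taxon
    while cur is not None:
        path.append(cur)
        cur = PARENT[cur]
    return path


def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement, so both strands share one key."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def lca(taxa: Iterable[str]) -> str:
    common: Optional[set] = None
    for t in taxa:
        anc = set(path_to_root(t))
        common = anc if common is None else common & anc
    return max(common, key=lambda n: len(path_to_root(n)))


def classify_read(read: str, kmer_lca: Dict[str, str], k: int) -> Optional[str]:
    """Kraken-style assignment of one read.

    `kmer_lca` maps canonical k-mers to the LCA taxon of the genomes containing
    them. Each hit taxon is scored by the hits on its root-to-leaf path; the
    best-scoring taxon wins, and ties are resolved by the LCA of the tied taxa.
    """
    hits: Counter = Counter()
    for i in range(len(read) - k + 1):
        taxon = kmer_lca.get(canonical(read[i:i + k]))
        if taxon is not None:
            hits[taxon] += 1
    if not hits:
        return None  # unclassified
    scores = {t: sum(hits[a] for a in path_to_root(t)) for t in hits}
    top = max(scores.values())
    return lca(t for t, s in scores.items() if s == top)
```

    Because no alignment is computed, the cost per read is one hash lookup per k-mer plus a walk over a small set of hit taxa.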

  4. Signal Processing for Metagenomics: Extracting Information from the Soup

    PubMed Central

    Rosen, Gail L.; Sokhansanj, Bahrad A.; Polikar, Robi; Bruns, Mary Ann; Russell, Jacob; Garbarine, Elaine; Essinger, Steve; Yok, Non

    2009-01-01

    Traditionally, studies in microbial genomics have focused on single genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have provided a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. This paper reviews current tools and techniques that address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology. PMID:20436876

  5. Signal processing for metagenomics: extracting information from the soup.

    PubMed

    Rosen, Gail L; Sokhansanj, Bahrad A; Polikar, Robi; Bruns, Mary Ann; Russell, Jacob; Garbarine, Elaine; Essinger, Steve; Yok, Non

    2009-11-01

    Traditionally, studies in microbial genomics have focused on single genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have provided a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. This paper reviews current tools and techniques that address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology.

  6. DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences

    PubMed Central

    2010-01-01

    Background In metagenomic sequence data, the majority of sequences (reads) originate from new or partially characterized genomes whose sequences are absent from existing reference databases. Since taxonomic assignment of reads is based on their similarity to sequences from known organisms, reads originating from new organisms pose a major challenge to taxonomic binning methods. The recently published SOrt-ITEMS algorithm uses an elaborate workflow to assign reads originating from hitherto unknown genomes with significant accuracy and specificity. Nevertheless, a significant proportion of reads are still misclassified. In addition, the use of an alignment-based orthology step (for improving the specificity of assignments) increases the total binning time of SOrt-ITEMS. Results In this paper, we introduce a rapid binning approach called DiScRIBinATE (Distance Score Ratio for Improved Binning And Taxonomic Estimation). DiScRIBinATE replaces the orthology approach of SOrt-ITEMS with a quicker 'alignment-free' approach. We demonstrate that incorporating this approach halves binning time without any loss in the specificity and accuracy of assignments. In addition, a novel reclassification strategy incorporated in DiScRIBinATE reduces the overall misclassification rate to around 3-7%. This misclassification rate is 1.5-3 times lower than that of SOrt-ITEMS and 3-30 times lower than that of MEGAN. Conclusions A significant reduction in binning time, coupled with superior assignment accuracy (compared with existing binning methods), indicates the broad applicability of the proposed algorithm for rapidly mapping the taxonomic diversity of large metagenomic samples with high accuracy and specificity. Availability The program is available on request from the authors. PMID:21106121

  7. Novel organic solvent-tolerant esterase isolated by metagenomics: insights into the lipase/esterase classification.

    PubMed

    Berlemont, Renaud; Spee, Olivier; Delsaute, Maud; Lara, Yannick; Schuldes, Jörg; Simon, Carola; Power, Pablo; Daniel, Rolf; Galleni, Moreno

    2013-01-01

    In order to isolate novel organic solvent-tolerant (OST) lipases, a metagenomic library was built using DNA derived from a temperate forest soil sample. A two-step activity-based screening allowed the isolation of a lipolytic clone active in the presence of organic solvents. Sequencing of the plasmid pRBest recovered from the positive clone revealed the presence of a putative lipase/esterase-encoding gene. The deduced amino acid sequence (RBest1) contains the conserved lipolytic enzyme signature and is related to the previously described OST lipase from Lysinibacillus sphaericus 205y, which is the sole studied prokaryotic enzyme belonging to the 4.4 α/β hydrolase subgroup (abH04.04). Both in vivo and in vitro studies of the substrate specificity of RBest1, using triacylglycerols or nitrophenyl esters, respectively, revealed that the enzyme is highly specific for butyrate (C4) compounds, behaving as an esterase rather than a lipase. The RBest1 esterase was purified and biochemically characterized. Optimal esterase activity was observed at pH 6.5 and at temperatures ranging from 38 to 45 °C. Enzymatic activity, determined by hydrolysis of p-nitrophenyl esters, was affected by the presence of different miscible and non-miscible organic solvents, and by salts. Notably, RBest1 remains significantly active at high ionic strength. These findings suggest that RBest1 combines the capacity of OST enzymes for molecular adaptation in the presence of organic compounds with the salt resistance of halophilic proteins.

  8. Genomic characterization of Defluviitoga tunisiensis L3, a key hydrolytic bacterium in a thermophilic biogas plant and its abundance as determined by metagenome fragment recruitment.

    PubMed

    Maus, Irena; Cibis, Katharina Gabriela; Bremges, Andreas; Stolze, Yvonne; Wibberg, Daniel; Tomazetto, Geizecler; Blom, Jochen; Sczyrba, Alexander; König, Helmut; Pühler, Alfred; Schlüter, Andreas

    2016-08-20

    The genome sequence of Defluviitoga tunisiensis L3 originating from a thermophilic biogas-production plant was established and recently published as a Genome Announcement by our group. The circular chromosome of D. tunisiensis L3 has a size of 2,053,097 bp and a mean GC content of 31.38%. To analyze the D. tunisiensis L3 genome sequence in more detail, a phylogenetic analysis of completely sequenced Thermotogae strains based on shared core genes was performed. It appeared that Petrotoga mobilis DSM 10674(T), originally isolated from a North Sea oil-production well, is the closest relative of D. tunisiensis L3. Comparative genome analyses of P. mobilis DSM 10674(T) and D. tunisiensis L3 showed moderate similarities with regard to the occurrence of orthologous genes. Both genomes share a common set of 1351 core genes. Reconstruction of metabolic pathways important for the biogas production process revealed that the D. tunisiensis L3 genome encodes a large set of genes predicted to facilitate utilization of a variety of complex polysaccharides including cellulose, chitin and xylan. Ethanol, acetate, hydrogen (H2) and carbon dioxide (CO2) were found as possible end-products of the fermentation process. The latter three metabolites are considered to represent substrates for methanogenic Archaea, the key organisms in the final step of the anaerobic digestion process. To determine the degree of relatedness between D. tunisiensis L3 and dominant biogas community members within the thermophilic biogas-production plant, metagenome sequences obtained from the corresponding microbial community were mapped onto the L3 genome sequence. This fragment recruitment revealed that the D. tunisiensis L3 genome is almost completely covered with metagenome sequences featuring high matching accuracy. This result indicates that strains highly related or even identical to the reference strain D. tunisiensis L3 play a dominant role within the community of the thermophilic biogas-production plant.
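
    The abstract reports that the L3 genome is "almost completely covered" by recruited metagenome reads. One common way to quantify this is breadth of coverage: the fraction of reference positions overlapped by at least one read alignment that passes an identity cutoff. A minimal sketch, assuming alignments have already been produced by a read mapper and reduced to (start, end, identity) tuples; the 95% identity cutoff and the function name are hypothetical, not values from the study.

```python
from typing import List, Tuple


def breadth_of_coverage(genome_len: int,
                        hits: List[Tuple[int, int, float]],
                        min_identity: float = 0.95) -> float:
    """Fraction of reference positions covered by at least one accepted alignment.

    `hits` holds (start, end, identity) with 0-based, half-open coordinates;
    alignments below `min_identity` (a hypothetical cutoff) are ignored.
    """
    covered = 0
    cur_start, cur_end = 0, -1  # empty merged interval before the first hit
    for start, end, identity in sorted(hits):
        if identity < min_identity:
            continue  # skip low-accuracy matches
        start, end = max(0, start), min(genome_len, end)
        if start >= end:
            continue
        if start > cur_end:
            # Disjoint from the current merged interval: bank it, start a new one.
            covered += max(0, cur_end - cur_start)
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or touching: extend
    covered += max(0, cur_end - cur_start)
    return covered / genome_len
```

    Merging overlapping intervals means each reference base is counted once, however many reads pile up on it; depth of coverage would instead sum the aligned lengths.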

  9. Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification.

    PubMed

    Yi, Chucai; Tian, Yingli

    2012-09-01

    In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and cluster edge pixels based on color pairs and spatial positions into boundary layers. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms to combine the structural analysis of text stroke with color assignment and filter out background interferences. Further, we design a robust string fragment classification based on Gabor-based text features. The features are obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.
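
    Gabor-based features are typically obtained by filtering with a bank of oriented Gabor kernels. A minimal, dependency-free sketch of the standard real-valued Gabor kernel follows, assuming the common parameterization (sigma, theta, lambda, gamma, psi); the paper's actual filter parameters, and how it pools responses over the gradient, stroke-distribution and stroke-width maps, are not given in the abstract, so everything here is illustrative.

```python
import math
from typing import List, Sequence


def gabor_kernel(size: int, sigma: float, theta: float, lambd: float,
                 gamma: float = 0.5, psi: float = 0.0) -> List[List[float]]:
    """Real part of a 2-D Gabor filter sampled on a size x size grid (size odd).

    g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2*pi*x'/lambd + psi),
    where (x', y') is (x, y) rotated by `theta`.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lambd + psi))
        kernel.append(row)
    return kernel


def filter_response(patch: Sequence[Sequence[float]],
                    kernel: Sequence[Sequence[float]]) -> float:
    """Correlation of a same-sized image patch with a kernel: one feature value."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))
```

    A small bank, for example with theta in {0, pi/4, pi/2, 3pi/4}, applied to each feature map would give a handful of responses per map for a downstream classifier; the choice of orientations and scales here is an assumption.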

  10. MetaShot: an accurate workflow for taxon classification of host-associated microbiome from shotgun metagenomic data.

    PubMed

    Fosso, B; Santamaria, M; D'Antonio, M; Lovero, D; Corrado, G; Vizza, E; Passaro, N; Garbuglia, A R; Capobianchi, M R; Crescenzi, M; Valiente, G; Pesole, G

    2017-01-27

    Shotgun metagenomics by high-throughput sequencing may allow deep and accurate characterization of host-associated total microbiomes, including bacteria, viruses, protists and fungi. However, the analysis of such sequencing data is still extremely challenging in terms of both overall accuracy and computational efficiency, and current methodologies show substantial variability in misclassification rate and resolution at lower taxonomic ranks or are limited to specific life domains (e.g., only bacteria). We present here MetaShot, a workflow for assessing the total microbiome composition from host-associated shotgun sequence data, and show that it achieves optimal overall accuracy by analyzing both simulated and real datasets.

  11. MPI-blastn and NCBI-TaxCollector: improving metagenomic analysis with high performance classification and wide taxonomic attachment.

    PubMed

    Dias, R; Xavier, M G; Rossi, F D; Neves, M V; Lange, T A P; Giongo, A; De Rose, C A F; Triplett, E W

    2014-06-01

    Metagenomic sequencing technologies are advancing rapidly, and the size of output data from high-throughput genetic sequencing has increased substantially over the years. This creates a scenario in which advanced computational optimizations are required to perform a metagenomic analysis. In this paper, we describe a new parallel implementation of nucleotide BLAST (MPI-blastn) and a new tool for taxonomic attachment of Basic Local Alignment Search Tool (BLAST) results that supports the NCBI taxonomy (NCBI-TaxCollector). MPI-blastn achieved higher performance than mpiBLAST and ScalaBLAST. In our best case, MPI-blastn ran 408 times faster on 384 cores. Our evaluations demonstrated that NCBI-TaxCollector performs taxonomic attachments 125 times faster and needs 120 times less RAM than the previous TaxCollector. Through our optimizations, a multiple sequence search that currently takes 37 hours can be performed in less than 6 min, and a post-processing step with NCBI taxonomic data attachment, which takes 48 hours, can now run in 23 min.
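
    The timings above can be checked with simple arithmetic: speedup is old wall-clock time divided by new time, and parallel efficiency is speedup divided by the number of cores. The abstract does not state the baseline for the 408x figure, so treating it as relative to a single core is an assumption here.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Ratio of old to new wall-clock time."""
    return before_minutes / after_minutes


def parallel_efficiency(speedup_value: float, cores: int) -> float:
    """Speedup per core; values above 1.0 indicate superlinear scaling."""
    return speedup_value / cores


# Figures taken from the abstract above.
search_speedup = speedup(37 * 60, 6)       # 37 h -> under 6 min, so a lower bound
taxonomy_speedup = speedup(48 * 60, 23)    # 48 h -> 23 min
best_efficiency = parallel_efficiency(408, 384)  # assumes 408x is vs. one core

print(f"BLAST search:          >= {search_speedup:.0f}x")
print(f"Taxonomic attachment:   ~ {taxonomy_speedup:.1f}x")
print(f"Efficiency (384 cores):   {best_efficiency:.4f}")
```

    Run as-is, this reports a search speedup of at least 370x, about 125.2x for taxonomic attachment and an efficiency of 1.0625. The roughly 125x attachment figure is consistent with the "125 times faster" reported for NCBI-TaxCollector; an efficiency above 1 would indicate superlinear scaling only if the 408x is indeed measured against one core.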

  12. The phylogenetic diversity of metagenomes.

    PubMed

    Kembel, Steven W; Eisen, Jonathan A; Pollard, Katherine S; Green, Jessica L

    2011-01-01

    Phylogenetic diversity, the pattern of phylogenetic relatedness among organisms in ecological communities, provides important insights into the mechanisms underlying community assembly. Studies that measure phylogenetic diversity in microbial communities have primarily been limited to a single marker gene approach, using the small subunit of the rRNA gene (SSU-rRNA) to quantify phylogenetic relationships among microbial taxa. In this study, we present an approach for inferring phylogenetic relationships among microorganisms based on the random metagenomic sequencing of DNA fragments. To overcome challenges caused by the fragmentary nature of metagenomic data, we leveraged fully sequenced bacterial genomes as a scaffold to enable inference of phylogenetic relationships among metagenomic sequences from multiple phylogenetic marker gene families. The resulting metagenomic phylogeny can be used to quantify the phylogenetic diversity of microbial communities based on metagenomic data sets. We applied this method to understand patterns of microbial phylogenetic diversity and community assembly along an oceanic depth gradient, and compared our findings to previous studies of this gradient using SSU-rRNA gene and metagenomic analyses. Bacterial phylogenetic diversity was highest at intermediate depths beneath the ocean surface, whereas taxonomic diversity (diversity measured by binning sequences into taxonomically similar groups) showed no relationship with depth. Phylogenetic diversity estimates based on the SSU-rRNA gene and the multi-gene metagenomic phylogeny were broadly concordant, suggesting that our approach will be applicable to other metagenomic data sets for which corresponding SSU-rRNA gene sequences are unavailable. Our approach opens up the possibility of using metagenomic data to study microbial diversity in a phylogenetic context.
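
    One widely used phylogenetic diversity metric is Faith's PD: the total branch length of the part of a (here, rooted) phylogeny that connects the taxa observed in a sample. The abstract does not say which metric or metrics the study computed on its metagenomic phylogeny, so this is an illustration of the concept on a hypothetical toy tree, not the authors' method.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical rooted toy tree: node -> (parent, length of the branch above it).
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 0.0),
    "A": ("root", 1.0),
    "B": ("root", 2.0),
    "a1": ("A", 0.5),
    "a2": ("A", 0.7),
    "b1": ("B", 0.3),
}


def faith_pd(observed: Iterable[str],
             tree: Dict[str, Tuple[Optional[str], float]] = TREE) -> float:
    """Sum of branch lengths on the paths from each observed tip to the root.

    Shared branches are counted once, which is what distinguishes PD from a
    plain sum of tip-to-root distances.
    """
    edges: Set[str] = set()  # each edge is identified by its lower (child) node
    for tip in observed:
        node = tip
        while tree[node][0] is not None:
            edges.add(node)
            node = tree[node][0]
    return sum(tree[n][1] for n in edges)
```

    For two sister tips a1 and a2, the shared branch A is counted once (0.5 + 0.7 + 1.0), so a sample of close relatives scores lower than one spanning distant lineages (a1 and b1 give 0.5 + 1.0 + 0.3 + 2.0).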

  13. The Metagenomic Telescope

    PubMed Central

    Szalkai, Balázs; Scheer, Ildikó; Nagy, Kinga; Vértessy, Beáta G.; Grolmusz, Vince

    2014-01-01

    Next generation sequencing technologies led to the discovery of numerous new microbe species in diverse environmental samples. Some of the new species contain genes never encountered before. Some of these genes encode proteins with novel functions, and some of these genes encode proteins that perform some well-known function in a novel way. We describe here a tool, named the Metagenomic Telescope, that applies artificial intelligence methods and seems to be capable of identifying new protein functions even in well-studied model organisms. As a proof-of-principle demonstration of the Metagenomic Telescope, we considered DNA repair enzymes in the present work. First, we identified proteins in DNA repair in well-known organisms (i.e., proteins in base excision repair, nucleotide excision repair, mismatch repair and DNA break repair); next, we applied multiple alignments and then built hidden Markov profiles for each protein separately, across well-researched organisms; next, using public repositories of metagenomes originating from extreme environments, we identified DNA repair genes in the samples. While the phylogenetic classification of the metagenomic samples is not typically available, we hypothesized that some very special DNA repair strategies need to be applied in bacteria and Archaea living in those extreme circumstances. It is a difficult task to evaluate the results obtained from mostly unknown species; therefore, we applied hidden Markov profiling again: for the identified DNA repair genes in the extreme metagenomes, we prepared new hidden Markov profiles (for each gene separately, subsequent to a cluster analysis), and we searched for similarities to those profiles in model organisms. We found well-known DNA repair proteins, numerous proteins with unknown functions, and also proteins with known, but different, functions in the model organisms. PMID:25054802

  14. The metagenomic telescope.

    PubMed

    Szalkai, Balázs; Scheer, Ildikó; Nagy, Kinga; Vértessy, Beáta G; Grolmusz, Vince

    2014-01-01

    Next generation sequencing technologies led to the discovery of numerous new microbe species in diverse environmental samples. Some of the new species contain genes never encountered before. Some of these genes encode proteins with novel functions, and some of these genes encode proteins that perform some well-known function in a novel way. We describe here a tool, named the Metagenomic Telescope, that applies artificial intelligence methods and seems to be capable of identifying new protein functions even in well-studied model organisms. As a proof-of-principle demonstration of the Metagenomic Telescope, we considered DNA repair enzymes in the present work. First, we identified proteins in DNA repair in well-known organisms (i.e., proteins in base excision repair, nucleotide excision repair, mismatch repair and DNA break repair); next, we applied multiple alignments and then built hidden Markov profiles for each protein separately, across well-researched organisms; next, using public repositories of metagenomes originating from extreme environments, we identified DNA repair genes in the samples. While the phylogenetic classification of the metagenomic samples is not typically available, we hypothesized that some very special DNA repair strategies need to be applied in bacteria and Archaea living in those extreme circumstances. It is a difficult task to evaluate the results obtained from mostly unknown species; therefore, we applied hidden Markov profiling again: for the identified DNA repair genes in the extreme metagenomes, we prepared new hidden Markov profiles (for each gene separately, subsequent to a cluster analysis), and we searched for similarities to those profiles in model organisms. We found well-known DNA repair proteins, numerous proteins with unknown functions, and also proteins with known, but different, functions in the model organisms.

  15. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification.

    PubMed

    Reddy, T B K; Thomas, Alex D; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A; Kyrpides, Nikos C

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
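
    The four-level (meta)genome project classification can be pictured as nested records. The sketch below assumes, from the four entity types whose totals are listed above, a Study → Biosample → Sequencing Project → Analysis Project hierarchy. It models that as a strict tree, which is a simplification (one analysis might in practice draw on several sequencing projects), and the class names are illustrative rather than GOLD's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnalysisProject:
    name: str


@dataclass
class SequencingProject:
    name: str
    analyses: List[AnalysisProject] = field(default_factory=list)


@dataclass
class Biosample:
    name: str
    sequencing_projects: List[SequencingProject] = field(default_factory=list)


@dataclass
class Study:
    name: str
    biosamples: List[Biosample] = field(default_factory=list)


def count_levels(studies: List[Study]) -> Dict[str, int]:
    """Count entities at each of the four (assumed) levels, like the totals above."""
    counts = {"studies": 0, "biosamples": 0,
              "sequencing_projects": 0, "analysis_projects": 0}
    for study in studies:
        counts["studies"] += 1
        for sample in study.biosamples:
            counts["biosamples"] += 1
            for seq_project in sample.sequencing_projects:
                counts["sequencing_projects"] += 1
                counts["analysis_projects"] += len(seq_project.analyses)
    return counts
```

    A sequencing project with no analysis attached is allowed, which fits the reported totals having fewer analysis projects than sequencing projects.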

  16. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  17. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    SciTech Connect

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2014-10-27

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. In this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  18. Enhanced Acylcarnitine Annotation in High-Resolution Mass Spectrometry Data: Fragmentation Analysis for the Classification and Annotation of Acylcarnitines

    PubMed Central

    van der Hooft, Justin J. J.; Ridder, Lars; Barrett, Michael P.; Burgess, Karl E. V.

    2015-01-01

    Metabolite annotation and identification are primary challenges in untargeted metabolomics experiments. Rigorous workflows for reliable annotation of mass features with chemical structures or compound classes are needed to enhance the power of untargeted mass spectrometry. High-resolution mass spectrometry considerably improves the confidence in assigning elemental formulas to mass features in comparison to nominal mass spectrometry, and embedding of fragmentation methods enables more reliable metabolite annotations and facilitates metabolite classification. However, the analysis of mass fragmentation spectra can be a time-consuming step and requires expert knowledge. This study demonstrates how characteristic fragmentations, specific to compound classes, can be used to systematically analyze their presence in complex biological extracts like urine that have undergone untargeted mass spectrometry combined with data-dependent or targeted fragmentation. Human urine extracts were analyzed using normal phase liquid chromatography (hydrophilic interaction chromatography) coupled to an Ion Trap-Orbitrap hybrid instrument. Subsequently, mass chromatograms and collision-induced dissociation and higher-energy collisional dissociation (HCD) fragments were annotated using the freely available MAGMa software. Acylcarnitines play a central role in energy metabolism by transporting fatty acids into the mitochondrial matrix. By filtering on a combination of a mass fragment and a neutral loss designed on the basis of the MAGMa fragment annotations, we were able to classify and annotate 50 acylcarnitines in human urine extracts, all of them supported by high-resolution HCD fragmentation spectra acquired at different energies. Of these annotated acylcarnitines, 31 are not yet described in HMDB, and for only 4 of them could the fragmentation spectra be matched to reference spectra. Therefore, we conclude that the use of mass fragmentation filters within the context
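
    The filtering idea (keep only spectra that contain a class-specific fragment ion and show a class-specific neutral loss from the precursor) can be sketched as follows. The diagnostic values are assumptions for illustration, not taken from the abstract: m/z 85.0284 (a fragment commonly reported for acylcarnitines) and a 59.0735 Da neutral loss (trimethylamine), with a hypothetical 5 ppm tolerance and singly charged ions.

```python
from typing import Sequence


def within_ppm(observed: float, expected: float, ppm: float) -> bool:
    """True if `observed` lies within `ppm` parts-per-million of `expected`."""
    return abs(observed - expected) <= expected * ppm * 1e-6


def passes_class_filter(precursor_mz: float, fragment_mzs: Sequence[float],
                        diagnostic_mz: float = 85.0284,   # assumed diagnostic ion
                        neutral_loss: float = 59.0735,    # assumed loss (trimethylamine)
                        ppm: float = 5.0) -> bool:
    """Keep a spectrum only if it contains the diagnostic fragment ion AND a
    fragment at (precursor - neutral loss); singly charged ions are assumed."""
    has_fragment = any(within_ppm(mz, diagnostic_mz, ppm) for mz in fragment_mzs)
    loss_target = precursor_mz - neutral_loss
    has_loss = any(within_ppm(mz, loss_target, ppm) for mz in fragment_mzs)
    return has_fragment and has_loss
```

    Requiring both conditions is what makes such a filter class-specific: a fragment at m/z 85 alone, or a 59 Da loss alone, could arise from unrelated compounds. For example, a precursor at m/z 204.1230 (an illustrative value) passes if its fragment list contains m/z 85.0284 and m/z 145.0495.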

  19. Exploration of Noncoding Sequences in Metagenomes

    PubMed Central

    Tobar-Tosse, Fabián; Rodríguez, Adrián C.; Vélez, Patricia E.; Zambrano, María M.; Moreno, Pedro A.

    2013-01-01

    Environment-dependent genomic features have been defined for different metagenomes, whose genes and associated processes are related to specific environments. Identifying ORFs and their functional categories is the most common way to associate functional and environmental features. However, ORF-based analysis misses noncoding sequences and, therefore, some metagenome regulatory or structural information could be discarded. In this work we analyzed 23 whole metagenomes, including coding and noncoding sequences, using the following sequence patterns: (G+C) content, Codon Usage (Cd), Trinucleotide Usage (Tn), and functional assignments for ORF prediction. Herein, we present evidence of a high proportion of noncoding sequences discarded by common similarity-based methods in metagenomics, and of the kind of relevant information present in those sequences. We found a high density of trinucleotide repeat sequences (TRS) in noncoding sequences, with a regulatory and adaptive function for metagenome communities. We present associations between trinucleotide values and gene function, where metagenome clustering correlates with microorganism adaptations and kinds of metagenomes. We propose here that noncoding sequences carry relevant information to describe metagenomes that could be considered in whole-metagenome analysis in order to improve their organization, classification protocols, and their relation with the environment. PMID:23536879
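
    The (G+C) content, Codon Usage and Trinucleotide Usage patterns named above can be computed directly from sequence. A minimal sketch under assumptions the abstract does not specify: trinucleotide usage is taken here as overlapping 3-mers across the whole sequence (so noncoding regions contribute), codon usage as in-frame, non-overlapping triplets of an ORF, and both are normalized to sum to 1; the function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict

TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def gc_content(seq: str) -> float:
    """(G+C) fraction over unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def _normalize(counts: Counter) -> Dict[str, float]:
    """Relative frequencies over the 64 triplets; ambiguous triplets are dropped."""
    total = sum(counts[t] for t in TRIPLETS)
    return {t: (counts[t] / total if total else 0.0) for t in TRIPLETS}


def trinucleotide_usage(seq: str) -> Dict[str, float]:
    """Overlapping 3-mers at every position, so coding and noncoding DNA both count."""
    seq = seq.upper()
    return _normalize(Counter(seq[i:i + 3] for i in range(len(seq) - 2)))


def codon_usage(orf: str) -> Dict[str, float]:
    """Non-overlapping in-frame triplets of an ORF (frame starts at position 0)."""
    orf = orf.upper()
    return _normalize(Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3)))
```

    The two triplet measures differ only in stride: for "ATGATGTAA", codon usage sees ATG, ATG, TAA, whereas trinucleotide usage also counts the out-of-frame words TGA, GAT, TGT and GTA.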

  20. METAXA2: improved identification and taxonomic classification of small and large subunit rRNA in metagenomic data.

    PubMed

    Bengtsson-Palme, Johan; Hartmann, Martin; Eriksson, Karl Martin; Pal, Chandan; Thorell, Kaisa; Larsson, Dan Göran Joakim; Nilsson, Rolf Henrik

    2015-11-01

    The ribosomal rRNA genes are widely used as genetic markers for taxonomic identification of microbes. The small subunit (SSU; 16S/18S) rRNA gene in particular is frequently used for species- or genus-level identification, but the large subunit (LSU; 23S/28S) rRNA gene is also employed in taxonomic assignment. The METAXA software tool is a popular utility for extracting partial rRNA sequences from large sequencing data sets and assigning them to an archaeal, bacterial, nuclear eukaryote, mitochondrial or chloroplast origin. This study describes METAXA2, a comprehensive update to METAXA that extends the capabilities of the tool, introducing support for the LSU rRNA gene, a greatly improved classifier allowing classification down to genus or species level, and enhanced support for short-read (100 bp) and paired-end sequences, among other changes. The performance of METAXA2 was compared to other commonly used taxonomic classifiers, showing that METAXA2 often outperforms previous methods in terms of making correct predictions while maintaining a low misclassification rate. METAXA2 is freely available from http://microbiology.se/software/metaxa2/.

  1. Non-linear classification for on-the-fly fractional mass filtering and targeted precursor fragmentation in mass spectrometry experiments

    PubMed Central

    Kirchner, Marc; Timm, Wiebke; Fong, Peying; Wangemann, Philine; Steen, Hanno

    2010-01-01

    Motivation: Mass spectrometry (MS) has become the method of choice for protein/peptide sequence and modification analysis. The technology employs a two-step approach: ionized peptide precursor masses are detected, selected for fragmentation, and the fragment mass spectra are collected for computational analysis. Current precursor selection schemes are based on data- or information-dependent acquisition (DDA/IDA), where fragmentation mass candidates are selected by intensity and are subsequently included in a dynamic exclusion list to avoid constant refragmentation of highly abundant species. DDA/IDA methods do not exploit valuable information that is contained in the fractional mass of high-accuracy precursor mass measurements delivered by current instrumentation. Results: We extend previous contributions that suggest that fractional mass information allows targeted fragmentation of analytes of interest. We introduce a non-linear Random Forest classification and a discrete mapping approach, which can be trained to discriminate among arbitrary fractional mass patterns for an arbitrary number of classes of analytes. These methods can be used to increase fragmentation efficiency for specific subsets of analytes or to select suitable fragmentation technologies on-the-fly. We show that theoretical generalization error estimates transfer into practical application, and that their quality depends on the accuracy of prior distribution estimate of the analyte classes. The methods are applied to two real-world proteomics datasets. Availability: All software used in this study is available from http://software.steenlab.org/fmf Contact: hanno.steen@childrens.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20134030
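    The core idea above is that, for a given nominal mass, the fractional part of an accurate precursor mass can separate analyte classes. The Random Forest classifier is not reproduced here. Instead, this standard-library Python sketch is a toy lookup table loosely in the spirit of a discrete mapping: for each nominal mass it stores the fractional-mass interval seen per class in training data. The class labels, the `tol` parameter and the nearest-interval rule are hypothetical; masses are assumed to be neutral monoisotopic masses in Da, and wrap-around near integer boundaries is ignored.

```python
import math
from collections import defaultdict


def split_mass(mass: float) -> tuple[int, float]:
    """Split a neutral precursor mass (Da) into nominal and fractional parts."""
    nominal = math.floor(mass)
    return nominal, mass - nominal


class FractionalMassMap:
    """Toy lookup table: for each nominal mass, store the fractional-mass
    interval (min, max) observed for each analyte class in training data."""

    def __init__(self) -> None:
        # nominal mass -> {class label: (lowest fraction, highest fraction)}
        self.ranges: dict[int, dict[str, tuple[float, float]]] = defaultdict(dict)

    def fit(self, masses, labels) -> "FractionalMassMap":
        for mass, label in zip(masses, labels):
            nominal, frac = split_mass(mass)
            lo, hi = self.ranges[nominal].get(label, (frac, frac))
            self.ranges[nominal][label] = (min(lo, frac), max(hi, frac))
        return self

    def predict(self, mass: float, tol: float = 0.01):
        """Return the class whose interval is closest to the query's fractional
        mass (distance 0 inside an interval), or None if none lies within tol."""
        nominal, frac = split_mass(mass)
        best, best_dist = None, math.inf
        for label, (lo, hi) in self.ranges.get(nominal, {}).items():
            dist = max(lo - frac, 0.0, frac - hi)
            if dist <= tol and dist < best_dist:
                best, best_dist = label, dist
        return best
```

    A query whose nominal mass never appeared in training returns `None`, which mirrors the practical point in the abstract that performance depends on how well the training data represent each class.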

  2. Distribution and Classification of Serine β-Lactamases in Brazilian Hospital Sewage and Other Environmental Metagenomes Deposited in Public Databases.

    PubMed

    Fróes, Adriana M; da Mota, Fábio F; Cuadrat, Rafael R C; Dávila, Alberto M R

    2016-01-01

    serine β-lactamases, indicating the specificity and high sensitivity of this approach in large datasets, contributing to the identification and classification of a large number of homologous genes, including possibly new ones. Phylogenetic analysis revealed the potential reservoir of β-lactam resistance genes in the environment, contributing to the understanding of the evolution and dissemination of these genes.

  3. Distribution and Classification of Serine β-Lactamases in Brazilian Hospital Sewage and Other Environmental Metagenomes Deposited in Public Databases

    PubMed Central

    Fróes, Adriana M.; da Mota, Fábio F.; Cuadrat, Rafael R. C.; Dávila, Alberto M. R.

    2016-01-01

    serine β-lactamases, indicating the specificity and high sensitivity of this approach in large datasets, contributing to the identification and classification of a large number of homologous genes, including possibly new ones. Phylogenetic analysis revealed the potential reservoir of β-lactam resistance genes in the environment, contributing to the understanding of the evolution and dissemination of these genes. PMID:27895627

  4. Rapid identification and classification of bacteria by 16S rDNA restriction fragment melting curve analyses (RFMCA).

    PubMed

    Rudi, Knut; Kleiberg, Gro H; Heiberg, Ragnhild; Rosnes, Jan T

    2007-08-01

    The aim of this work was to evaluate restriction fragment melting curve analyses (RFMCA) as a novel approach for rapid classification of bacteria during food production. RFMCA was evaluated for bacteria isolated from sous vide food products and from raw materials used for sous vide production. We identified four major bacterial groups in the material analysed (cluster I-Streptococcus, cluster II-Carnobacterium/Bacillus, cluster III-Staphylococcus and cluster IV-Actinomycetales). The accuracy of RFMCA was evaluated by comparison with 16S rDNA sequencing. Of the strains satisfying the RFMCA quality filtering criteria (73%, n=57), those with both 16S rDNA sequence information and RFMCA data (n=45) gave identical group assignments with the two methods. RFMCA enabled rapid and accurate classification of bacteria that is database compatible. Potential applications of RFMCA in the food or pharmaceutical industry will include developing classification models for the bacteria expected in a given product and then building an RFMCA database as part of product quality control.

  5. Metagenomic Analysis of Silage

    PubMed Central

    Tennant, Richard K.; Sambles, Christine M.; Diffey, Georgina E.; Moore, Karen A.; Love, John

    2017-01-01

    Metagenomics is defined as the direct analysis of deoxyribonucleic acid (DNA) purified from environmental samples and enables taxonomic identification of the microbial communities present within them. Two main metagenomic approaches exist: sequencing the 16S rRNA gene coding region, which exhibits sufficient variation between taxa for identification, and shotgun sequencing, in which the genomes of the organisms present in the sample are analyzed and ascribed to "operational taxonomic units" (species, genera or families, depending on the extent of sequencing coverage). In this study, shotgun sequencing was used to analyze the microbial community present in cattle silage. It was coupled with a range of bioinformatics tools to quality-check and filter the DNA sequence reads, perform taxonomic classification of the microbial populations present within the sampled silage, and achieve functional annotation of the sequences. These methods were employed to identify potentially harmful bacteria within the silage, an indication of silage spoilage. If spoiled silage is not remediated, it could potentially be fatal to livestock that ingest it. PMID:28117801

  6. Swine Fecal Metagenomics

    EPA Science Inventory

    Metagenomic approaches are providing rapid and more robust means to investigate the composition and functional genetic potential of complex microbial communities. In this study, we utilized a metagenomic approach to further understand the functional diversity of the swine gut. To...

  7. Evolutionary dynamics of clustered irregularly interspaced short palindromic repeat systems in the ocean metagenome.

    PubMed

    Sorokin, Valery A; Gelfand, Mikhail S; Artamonova, Irena I

    2010-04-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) form a recently characterized type of prokaryotic antiphage defense system. The phage-host interactions involving CRISPRs have been studied in experiments with selected bacterial or archaeal species and, computationally, in completely sequenced genomes. However, these studies do not allow one to take prokaryotic population diversity and phage-host interaction dynamics into account. This gap can be filled by using metagenomic data: in particular, the largest existing data set, generated from the Sorcerer II Global Ocean Sampling expedition. The application of three publicly available CRISPR recognition programs to the Global Ocean metagenome produced a large proportion of false-positive results. To address this problem, a filtering procedure was designed. It resulted in about 200 reliable CRISPR cassettes, which were then studied in detail. The repeat consensuses were clustered into several stable classes that differed from the existing classification. Short fragments of DNA similar to the cassette spacers were more frequently present in the same geographical location than in other locations (P < 0.0001). We developed a catalogue of elementary CRISPR-forming events and reconstructed the likely evolutionary history of cassettes that had common spacers. Metagenomic collections allow for relatively unbiased analysis of phage-host interactions and CRISPR evolution. The results of this study demonstrate that CRISPR cassettes retain the memory of the local virus population at a particular ocean location. CRISPR evolution may be described using a limited vocabulary of elementary events that have a natural biological interpretation.
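    The study compares cassette spacers with short DNA fragments found in the same sample. As a toy sketch only (the recognition programs, filtering procedure and similarity criteria used in the study are not reproduced), the Python below assumes the repeat consensus is already known and occurs exactly, extracts the spacers between repeat copies, and counts reads that contain each spacer or its reverse complement verbatim. Real analyses would tolerate mismatches.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Split a CRISPR array on an exact repeat consensus; the non-empty
    pieces between consecutive repeat copies are taken as spacers."""
    if not repeat:
        raise ValueError("repeat consensus must be non-empty")
    parts = array_seq.upper().split(repeat.upper())
    # parts[0] and parts[-1] are flanking sequence outside the array.
    return [p for p in parts[1:-1] if p]


def spacer_hits(spacers: list[str], reads: list[str]) -> dict[str, int]:
    """Count reads containing each spacer, or its reverse complement, exactly."""
    hits = {}
    for sp in spacers:
        rc = sp.translate(_COMPLEMENT)[::-1]
        hits[sp] = sum(1 for r in reads if sp in r.upper() or rc in r.upper())
    return hits
```

    Running `spacer_hits` separately on reads from each sampling site would give the kind of per-location comparison the abstract describes, under the exact-match simplification.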

  8. A highly optimized grid deployment: the metagenomic analysis example.

    PubMed

    Aparicio, Gabriel; Blanquer, Ignacio; Hernández, Vicente

    2008-01-01

    Computational resources and computationally expensive processes are two topics that are not growing at the same rate. The availability of large amounts of computing resources in Grid infrastructures does not mean that efficiency is unimportant. It is necessary to analyze the whole process to improve partitioning and submission schemas, especially in the most critical experiments. This is the case for metagenomic analysis, and this text shows the work done to optimize a Grid deployment, which has led to a reduction of the response time and the failure rates. Metagenomic studies aim at processing samples of multiple specimens to extract the genes and proteins that belong to the different species. In many cases, sequencing the DNA of many microorganisms is hindered by the impossibility of growing significant samples of isolated specimens. Many bacteria cannot survive alone and require interaction with other organisms. In such cases, the available DNA belongs to different kinds of organisms. One important stage in metagenomic analysis consists of extracting fragments, followed by comparing them and analyzing their function. Fragments can be classified by comparison to existing sequences whose function is well known. This process is computationally intensive and requires several iterations of alignment and phylogeny classification steps. Source samples reach several million sequences, each of which can be up to thousands of nucleotides long. These sequences are compared to a selected part of the "Non-redundant" database that only includes information from eukaryotic species. From this first analysis, a refining process is performed and the alignment analysis is restarted from the results. This process requires several CPU years. The article describes and analyzes the difficulties of fragmenting, automating and checking the above operations in current Grid production environments. This environment has been
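    One generic way to improve partitioning before submitting alignment jobs to a Grid is to balance the total sequence length per job. The Python sketch below uses the greedy longest-processing-time (LPT) heuristic with `heapq`. It is an illustrative assumption, not the partitioning scheme described in the article: it treats sequence length as a proxy for alignment cost, and the `(sequence_id, length)` input format is hypothetical.

```python
import heapq


def partition_by_length(records: list[tuple[str, int]], n_jobs: int):
    """Greedy LPT: assign (sequence_id, length) records to n_jobs batches so
    that the total residues per batch are roughly balanced."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    batches: list[list[str]] = [[] for _ in range(n_jobs)]
    heap = [(0, j) for j in range(n_jobs)]  # (current load, batch index)
    heapq.heapify(heap)
    # Longest sequences first, each to the currently lightest batch.
    for seq_id, length in sorted(records, key=lambda r: r[1], reverse=True):
        load, j = heapq.heappop(heap)
        batches[j].append(seq_id)
        heapq.heappush(heap, (load + length, j))
    loads = [0] * n_jobs
    for load, j in heap:
        loads[j] = load
    return batches, loads
```

    With lengths 5, 4, 3, 3 and 3 split across two jobs, the heuristic produces batch loads of 8 and 10 residues. LPT is not optimal in general, but it is fast and usually close enough when there are many more sequences than jobs.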

  9. Classification

    ERIC Educational Resources Information Center

    Clary, Renee; Wandersee, James

    2013-01-01

    In this article, Renee Clary and James Wandersee describe the beginnings of "Classification," which lies at the very heart of science and depends upon pattern recognition. Clary and Wandersee approach patterns by first telling the story of the "Linnaean classification system," introduced by Carl Linnaeus (1707-1778), who is…

  10. Megraft: A software package to graft ribosomal small subunit (16S/18S) fragments onto full-length sequences for accurate species richness and sequencing depth analysis in pyrosequencing-length metagenomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Metagenomic libraries represent subsamples of the total DNA found at a study site and offer unprecedented opportunities to study ecological and functional aspects of microbial communities. To examine the depth of the sequencing effort, rarefaction analysis of the ribosomal small sub-unit (SSU/16S/18...
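    Rarefaction estimates how many distinct taxa would be observed at smaller sequencing depths. This Python sketch implements the classic analytical (Hurlbert) formula, E[S_n] = Σ_i [1 - C(N - N_i, n) / C(N, n)], where N_i is the read count of taxon i and N is the total count. It is a generic illustration rather than Megraft's implementation, and it assumes per-taxon counts have already been derived from the SSU fragments.

```python
from math import comb


def expected_richness(counts: list[int], n: int) -> float:
    """Expected number of distinct taxa in a random subsample of n reads drawn
    without replacement (Hurlbert's rarefaction formula)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # comb(a, b) is 0 when b > a, so a taxon is certain to appear once n > N - N_i.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts: list[int], step: int) -> list[tuple[int, float]]:
    """(subsample size, expected richness) pairs from 0 up to the full depth."""
    total = sum(counts)
    return [(n, expected_richness(counts, n)) for n in range(0, total + 1, step)]
```

    Exact binomial coefficients grow very quickly, so for millions of reads a log-gamma formulation would be preferable; the exact version is kept here for clarity.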

  11. Consensus statement: Virus taxonomy in the age of metagenomics.

    PubMed

    Simmonds, Peter; Adams, Mike J; Benkő, Mária; Breitbart, Mya; Brister, J Rodney; Carstens, Eric B; Davison, Andrew J; Delwart, Eric; Gorbalenya, Alexander E; Harrach, Balázs; Hull, Roger; King, Andrew M Q; Koonin, Eugene V; Krupovic, Mart; Kuhn, Jens H; Lefkowitz, Elliot J; Nibert, Max L; Orton, Richard; Roossinck, Marilyn J; Sabanadzovic, Sead; Sullivan, Matthew B; Suttle, Curtis A; Tesh, Robert B; van der Vlugt, René A; Varsani, Arvind; Zerbini, F Murilo

    2017-03-01

    The number and diversity of viral sequences that are identified in metagenomic data far exceeds that of experimentally characterized virus isolates. In a recent workshop, a panel of experts discussed the proposal that, with appropriate quality control, viruses that are known only from metagenomic data can, and should, be incorporated into the official classification scheme of the International Committee on Taxonomy of Viruses (ICTV). Although a taxonomy that is based on metagenomic sequence data alone represents a substantial departure from the traditional reliance on phenotypic properties, the development of a robust framework for sequence-based virus taxonomy is indispensable for the comprehensive characterization of the global virome. In this Consensus Statement article, we consider the rationale for why metagenomic sequence data should, and how it can, be incorporated into the ICTV taxonomy, and present proposals that have been endorsed by the Executive Committee of the ICTV.

  12. QRS Fragmentation Patterns Representing Myocardial Scar Need to Be Separated from Benign Normal Variants: Hypotheses and Proposal for Morphology based Classification

    PubMed Central

    Haukilahti, M. Anette E.; Eranti, Antti; Kenttä, Tuomas; Huikuri, Heikki V.

    2016-01-01

    The presence of a fragmented QRS complex (fQRS) in two contiguous leads of a standard 12-lead electrocardiogram (ECG) has been shown to be an indicator of myocardial scar in multiple different populations of cardiac patients. QRS fragmentation is also a predictor of adverse prognosis in acute myocardial infarction, coronary artery disease, and ischemic cardiomyopathy and a prognostic tool in structural heart diseases. An increased risk of sudden cardiac death associated with fQRS has been documented in patients with ischemic cardiomyopathy and hypertrophic cardiomyopathy. However, fQRS is also frequently observed in apparently healthy subjects. Thus, a more detailed classification of different QRS fragmentations is needed to identify the pathological fragmentation patterns and refine the role of fQRS as a risk marker of adverse cardiac events and sudden cardiac death. In most studies fQRS has been defined by the presence of an additional R wave (R′), or notching in the nadir of the S wave, or the presence of >1 R′ in two contiguous leads corresponding to a major coronary territory. However, this approach does not discriminate between minor and major fragmentations, and it neglects the location of the fQRS. In addition, the method is susceptible to large interobserver variability. We suppose that some fQRS subtypes result from conduction delays in the His-Purkinje system, which is a benign finding and thus can weaken the prognostic value of fQRS. The classification of fQRSs into subtypes with unambiguous definitions is needed to overcome the interobserver variability related issues and to separate fQRSs caused by myocardial scarring from benign normal variants. In this paper, we review the anatomic correlates of fQRS and the current knowledge of prognostic significance of fQRS. We also propose a detailed fQRS classification for research purposes which can later be simplified after the truly pathological morphologies have been identified. The research

  13. QRS Fragmentation Patterns Representing Myocardial Scar Need to Be Separated from Benign Normal Variants: Hypotheses and Proposal for Morphology based Classification.

    PubMed

    Haukilahti, M Anette E; Eranti, Antti; Kenttä, Tuomas; Huikuri, Heikki V

    2016-01-01

    The presence of a fragmented QRS complex (fQRS) in two contiguous leads of a standard 12-lead electrocardiogram (ECG) has been shown to be an indicator of myocardial scar in multiple different populations of cardiac patients. QRS fragmentation is also a predictor of adverse prognosis in acute myocardial infarction, coronary artery disease, and ischemic cardiomyopathy and a prognostic tool in structural heart diseases. An increased risk of sudden cardiac death associated with fQRS has been documented in patients with ischemic cardiomyopathy and hypertrophic cardiomyopathy. However, fQRS is also frequently observed in apparently healthy subjects. Thus, a more detailed classification of different QRS fragmentations is needed to identify the pathological fragmentation patterns and refine the role of fQRS as a risk marker of adverse cardiac events and sudden cardiac death. In most studies fQRS has been defined by the presence of an additional R wave (R'), or notching in the nadir of the S wave, or the presence of >1 R' in two contiguous leads corresponding to a major coronary territory. However, this approach does not discriminate between minor and major fragmentations, and it neglects the location of the fQRS. In addition, the method is susceptible to large interobserver variability. We suppose that some fQRS subtypes result from conduction delays in the His-Purkinje system, which is a benign finding and thus can weaken the prognostic value of fQRS. The classification of fQRSs into subtypes with unambiguous definitions is needed to overcome the interobserver variability related issues and to separate fQRSs caused by myocardial scarring from benign normal variants. In this paper, we review the anatomic correlates of fQRS and the current knowledge of prognostic significance of fQRS. We also propose a detailed fQRS classification for research purposes which can later be simplified after the truly pathological morphologies have been identified. The research

  14. Structural and Functional Insights from the Metagenome of an Acidic Hot Spring Microbial Planktonic Community in the Colombian Andes

    PubMed Central

    Jiménez, Diego Javier; Andreote, Fernando Dini; Chaves, Diego; Montaña, José Salvador; Osorio-Forero, Cesar; Junca, Howard; Zambrano, María Mercedes; Baena, Sandra

    2012-01-01

    A taxonomic and annotated functional description of microbial life was deduced from 53 Mb of metagenomic sequence retrieved from a planktonic fraction of the Neotropical high Andean (3,973 meters above sea level) acidic hot spring El Coquito (EC). A classification of unassembled metagenomic reads using different databases showed a high proportion of Gammaproteobacteria and Alphaproteobacteria (in total read affiliation), and through taxonomic affiliation of 16S rRNA gene fragments we observed the presence of Proteobacteria, micro-algae chloroplast and Firmicutes. Reads mapped against the genomes Acidiphilium cryptum JF-5, Legionella pneumophila str. Corby and Acidithiobacillus caldus revealed the presence of transposase-like sequences, potentially involved in horizontal gene transfer. Functional annotation and hierarchical comparison with different datasets obtained by pyrosequencing in different ecosystems showed that the microbial community also contained extensive DNA repair systems, possibly to cope with ultraviolet radiation at such high altitudes. Analysis of genes involved in the nitrogen cycle indicated the presence of dissimilatory nitrate reduction to N2 (narGHI, nirS, norBCDQ and nosZ), associated with Proteobacteria-like sequences. Genes involved in the sulfur cycle (cysDN, cysNC and aprA) indicated adenylsulfate and sulfite production that were affiliated to several bacterial species. In summary, metagenomic sequence data provided insight regarding the structure and possible functions of this hot spring microbial community, describing some groups potentially involved in the nitrogen and sulfur cycling in this environment. PMID:23251687

  15. A Primer on Metagenomics

    PubMed Central

    Wooley, John C.; Godzik, Adam; Friedberg, Iddo

    2010-01-01

    Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics. PMID:20195499

  16. Random whole metagenomic sequencing for forensic discrimination of soils.

    PubMed

    Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.
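    The clustering, SIMPROF, NMDS and CAP analyses listed above all start from a pairwise (dis)similarity matrix between samples. The abstract does not say which measure was used, so this Python sketch assumes Bray-Curtis dissimilarity, a common choice in community ecology, applied to hypothetical per-sample abundance vectors over the same set of taxa or subsystems.

```python
def bray_curtis(x: list[float], y: list[float]) -> float:
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i), in [0, 1]
    for non-negative abundances; 0 means identical profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def dissimilarity_matrix(profiles: list[list[float]]) -> list[list[float]]:
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities between samples."""
    n = len(profiles)
    return [[bray_curtis(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
```

    The resulting matrix is the usual input to hierarchical agglomerative clustering or ordination methods such as NMDS.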

  17. Classification.

    PubMed

    Tuxhorn, Ingrid; Kotagal, Prakash

    2008-07-01

    In this article, we review the practical approach and diagnostic relevance of current seizure and epilepsy classification concepts and principles as a basic framework for good management of patients with epileptic seizures and epilepsy. Inaccurate generalizations about terminology, diagnosis, and treatment may be the single most important factor, next to an inadequately obtained history, that determines the misdiagnosis and mismanagement of patients with epilepsy. A stepwise signs and symptoms approach for diagnosis, evaluation, and management along the guidelines of the International League Against Epilepsy and definitions of epileptic seizures and epilepsy syndromes offers a state-of-the-art clinical approach to managing patients with epilepsy.

  18. Classification

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2011-01-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. In supervised learning, a set of training examples (examples with known output values) is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate's measurements. This chapter discusses methods to perform machine learning, with examples involving astronomy.
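    To make the train-then-predict cycle described above concrete, here is a minimal nearest-centroid classifier in Python. It is one of the simplest supervised classifiers and is chosen here only for illustration; it is not taken from the chapter, and the sunspot features mentioned below are hypothetical.

```python
import math


def fit_centroids(X: list[list[float]], y: list[str]) -> dict[str, list[float]]:
    """Training step: compute the mean feature vector of each class."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: dict[str, list[float]], features: list[float]) -> str:
    """Prediction step: assign the class whose centroid is nearest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

    For instance, training on `[[0, 0], [0, 1]]` labelled "A" and `[[10, 10], [10, 11]]` labelled "B" (say, hypothetical area and darkness measurements of candidate sunspots) gives centroids (0, 0.5) and (10, 10.5), so an unseen candidate at (1, 1) is assigned to "A".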

  19. The Viral MetaGenome Annotation Pipeline (VMGAP): an automated tool for the functional annotation of viral Metagenomic shotgun sequencing data

    PubMed Central

    Lorenzi, Hernan A.; Hoover, Jeff; Inman, Jason; Safford, Todd; Murphy, Sean; Kagan, Leonid; Williamson, Shannon J.

    2011-01-01

    In the past few years, the field of metagenomics has been growing at an accelerated pace, particularly in response to advancements in new sequencing technologies. The large volume of sequence data from novel organisms generated by metagenomic projects has triggered the development of specialized databases and tools focused on particular groups of organisms or data types. Here we describe a pipeline for the functional annotation of viral metagenomic sequence data. The Viral MetaGenome Annotation Pipeline (VMGAP) takes advantage of a number of specialized databases, such as collections of mobile genetic elements and environmental metagenomes, to improve the classification and functional prediction of viral gene products. The pipeline assigns a functional term to each predicted protein sequence following a suite of comprehensive analyses whose results are ranked according to a hierarchy of priority rules. Additional annotation is provided in the form of enzyme commission (EC) numbers, GO/MeGO terms and Hidden Markov Models together with supporting evidence. PMID:21886867
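    Ranking results by a priority hierarchy can be sketched as choosing the hit from the highest-priority evidence source and breaking ties by score. In the Python below, the evidence source names, their order and the "hypothetical protein" fallback are all hypothetical; they do not reflect VMGAP's actual rules.

```python
from dataclasses import dataclass

# Hypothetical evidence sources, highest priority first (not VMGAP's actual order).
PRIORITY = ["curated_hmm", "blast_uniref", "blast_env", "pfam_domain"]


@dataclass
class Evidence:
    source: str   # which analysis produced the hit
    term: str     # candidate functional term
    score: float  # analysis-specific score, higher is better


def assign_function(hits: list[Evidence]) -> str:
    """Pick the term from the highest-priority source; within a source,
    prefer the highest score. Fall back when no ranked evidence exists."""
    ranked = [h for h in hits if h.source in PRIORITY]
    if not ranked:
        return "hypothetical protein"
    best = min(ranked, key=lambda h: (PRIORITY.index(h.source), -h.score))
    return best.term
```

    Because the source rank is compared before the score, a modest hit from a trusted source outranks a high-scoring hit from a lower tier, which is the essential property of a priority hierarchy.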

  20. Bambus 2: scaffolding metagenomes

    PubMed Central

    Koren, Sergey; Treangen, Todd J.; Pop, Mihai

    2011-01-01

    Motivation: Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources. Results: We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly. Availability: Bambus 2 is open source and available from http://amos.sf.net. Contact: mpop@umiacs.umd.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21926123

  1. Beyond biodiversity: fish metagenomes.

    PubMed

    Ardura, Alba; Planes, Serge; Garcia-Vazquez, Eva

    2011-01-01

    Biodiversity and intra-specific genetic diversity are interrelated and determine the potential of a community to survive and evolve. Both are considered together in Prokaryote communities treated as metagenomes or ensembles of functional variants beyond species limits. Many factors alter biodiversity in higher Eukaryote communities, and human exploitation can be one of the most important for some groups of plants and animals. For example, fisheries can modify both biodiversity and (intra-specific) genetic diversity. Intra-specific diversity can be drastically altered by overfishing. Intense fishing pressure on one stock may cause the extinction of some genetic variants and the subsequent loss of intra-specific diversity. The objective of this study was to apply a metagenome approach to fish communities and explore its value for rapid evaluation of biodiversity and genetic diversity at the community level. Here we applied the metagenome approach, employing the barcoding target gene coi as a model sequence, to catches from four very different fish assemblages exploited by fisheries: freshwater communities from the Amazon River and northern Spanish rivers, and marine communities from the Cantabric and Mediterranean seas. Treating all sequences obtained from each regional catch as a biological unit (exploited community), we found that the metagenomic diversity indices of the Amazonian catch sample examined here were lower than expected. Reduced diversity could be explained, at least partially, by overexploitation of the fish community, which had been independently estimated by other methods. We propose using a metagenome approach for estimating diversity in Eukaryote communities and for the early evaluation of genetic variation losses at the multi-species level.
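    The abstract does not specify which metagenomic diversity indices were computed. As an assumption, this Python sketch treats identical coi sequences as one haplotype and computes two standard indices over the haplotype counts: the Shannon index H' = -Σ p_i ln p_i and the Gini-Simpson index 1 - Σ p_i².

```python
import math
from collections import Counter


def haplotype_counts(sequences: list[str]) -> list[int]:
    """Treat identical sequences (case-insensitive) as one haplotype; return counts."""
    return sorted(Counter(s.upper() for s in sequences).values(), reverse=True)


def shannon(counts: list[int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson(counts: list[int]) -> float:
    """1 - sum(p_i^2): probability that two draws (with replacement) differ."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)
```

    Comparing these values between regional catches of similar sequencing effort is one way to flag a catch whose diversity is lower than expected; unequal sample sizes would first need rarefaction.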

  2. Recovering full-length viral genomes from metagenomes

    PubMed Central

    Smits, Saskia L.; Bodewes, Rogier; Ruiz-González, Aritz; Baumgärtner, Wolfgang; Koopmans, Marion P.; Osterhaus, Albert D. M. E.; Schürch, Anita C.

    2015-01-01

    Infectious disease metagenomics is driven by the question: “what is causing the disease?” in contrast to classical metagenome studies which are guided by “what is out there?” In the case of a novel virus, a first step to eventually establishing etiology can be to recover a full-length viral genome from a metagenomic sample. However, retrieval of a full-length genome of a divergent virus is technically challenging and can be time-consuming and costly. Here we discuss different assembly and fragment linkage strategies such as iterative assembly, motif searches, k-mer frequency profiling, coverage profile binning, and other strategies used to recover genomes of potential viral pathogens in a timely and cost-effective manner. PMID:26483782
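    k-mer frequency profiling, one of the strategies listed above, groups contigs whose sequence composition is similar. This Python sketch assumes tetranucleotide (k = 4) profiles in which each k-mer is merged with its reverse complement, so the strand a contig was assembled on does not matter, and compares profiles by cosine similarity. The choice of k and of the similarity measure are illustrative assumptions, not the defaults of any particular tool.

```python
import math
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def canonical_kmers(k: int) -> list[str]:
    """Lexicographically smaller member of each k-mer / reverse-complement pair."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({min(km, revcomp(km)) for km in kmers})


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Normalised canonical k-mer frequencies; windows with non-ACGT are skipped."""
    index = {km: i for i, km in enumerate(canonical_kmers(k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        km = seq[i:i + k]
        if set(km) <= set("ACGT"):
            counts[index[min(km, revcomp(km))]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

    There are 136 canonical tetranucleotides, and a contig and its reverse complement yield identical profiles; in practice, composition is usually combined with coverage profiles because short contigs give noisy k-mer frequencies.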

  3. Metagenomic small molecule discovery methods

    PubMed Central

    Charlop-Powers, Zachary; Milshteyn, Aleksandr; Brady, Sean F.

    2014-01-01

    Metagenomic approaches to natural product discovery provide the means of harvesting bioactive small molecules synthesized by environmental bacteria without first having to culture these organisms. Advances in sequencing technologies and general metagenomic methods are beginning to provide the tools necessary to unlock the unexplored biosynthetic potential encoded by the genomes of uncultured environmental bacteria. Here, we highlight recent advances in sequence- and function-based metagenomic approaches that promise to facilitate antibiotic discovery from diverse environmental microbiomes. PMID:25000402

  4. Hot Spring Metagenomics

    PubMed Central

    López-López, Olalla; Cerdán, María Esperanza; González-Siso, María Isabel

    2013-01-01

    Hot springs have been investigated since the 19th century, but isolation and examination of their thermophilic microbial inhabitants did not start until the 1950s. Many thermophilic microorganisms and their viruses have since been discovered, although the real complexity of thermal communities became apparent only when research based on PCR amplification of 16S rRNA genes arose. Thereafter, the possibility of cloning and sequencing the total environmental DNA, defined as the metagenome, and of studying the genes rescued in metagenomic libraries and assemblies made it possible to gain a more comprehensive understanding of microbial communities: their diversity, their structure, the interactions between their components, and the factors shaping the nature of these communities. In the last decade, hot springs have been a source of thermophilic enzymes of industrial interest, encouraging further study of the poorly understood diversity of microbial life in these habitats. PMID:25369743

  5. Metagenomic Analysis of Bacterial Communities of Antarctic Surface Snow

    PubMed Central

    Lopatina, Anna; Medvedeva, Sofia; Shmakov, Sergey; Logacheva, Maria D.; Krylenkov, Vjacheslav; Severinov, Konstantin

    2016-01-01

    The diversity of bacteria present in surface snow around four Russian stations in Eastern Antarctica was studied by high-throughput sequencing of amplified 16S rRNA gene fragments and by shotgun metagenomic sequencing. Considerable class- and genus-level variation between the samples was revealed, indicating inter-site diversity of bacteria in Antarctic snow. Flavobacterium was a major genus at one sampling site and was also detected at the other sites. The diversity of flavobacterial type II-C CRISPR spacers in the samples was investigated by metagenome sequencing. Thousands of unique spacers were revealed, with less than 35% overlap between the sampling sites. This indicates an enormous natural variety of flavobacterial CRISPR spacers and, by extension, a high level of adaptive activity of the corresponding CRISPR-Cas system. None of the spacers matched known spacers of flavobacterial isolates from the Northern hemisphere. Moreover, the percentage of spacers matching the Antarctic metagenomic sequences obtained in this work was significantly higher than the percentage matching sequences from a much larger publicly available environmental metagenomic database. The results indicate that, despite the overall very high level of diversity, Antarctic Flavobacteria form a separate pool that experiences pressures from mobile genetic elements different from those present in other parts of the world. The results also establish analysis of metagenomic CRISPR spacer content as a powerful tool to study the diversity of bacterial populations. PMID:27064693
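    The abstract reports less than 35% spacer overlap between sampling sites but does not define the overlap measure. The sketch below assumes overlap is the Jaccard index of the sets of unique spacer sequences recovered at two sites, which is an illustrative choice. The site names and spacer strings are hypothetical.

```python
from itertools import combinations


def jaccard_overlap(spacers_a, spacers_b):
    """Shared unique spacers divided by all unique spacers seen at either site."""
    a, b = set(spacers_a), set(spacers_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(sites):
    """Jaccard overlap for every pair of sites in a {site_name: spacer_list} mapping."""
    return {
        (s1, s2): jaccard_overlap(sites[s1], sites[s2])
        for s1, s2 in combinations(sorted(sites), 2)
    }


sites = {
    "site_1": ["ACGTTGCA", "TTGACCAG", "GGATCCTA"],
    "site_2": ["TTGACCAG", "GGATCCTA", "CATGCATG"],
    "site_3": ["AAGCTTCC"],
}
for pair, value in pairwise_overlap(sites).items():
    print(pair, round(value, 2))
```

    Spacers are short and can carry sequencing errors, so a real comparison would probably cluster near-identical spacers before counting. That step is omitted here.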

  6. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics.

    PubMed

    Tang, Haixu; Li, Sujun; Ye, Yuzhen

    2016-12-01

    Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach. In this approach, complete or fragmentary protein-coding genes are first predicted directly from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein-coding genes predicted from metagenomes are incomplete and fragmentary. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs are used, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways, such as those involved in the degradation of chemical hazards. Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro.

  7. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

    PubMed Central

    Tang, Haixu; Li, Sujun; Ye, Yuzhen

    2016-01-01

    Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach. In this approach, complete or fragmentary protein-coding genes are first predicted directly from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein-coding genes predicted from metagenomes are incomplete and fragmentary. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs are used, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways, such as those involved in the degradation of chemical hazards. Our tools are released as open-source software on GitHub at https://github.com/COL-IU/Graph2Pro. PMID:27918579
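    An assembly graph retains branch points that a set of linear contigs may break or collapse. The sketch below builds a de Bruijn graph and spells out sequences from one start node. It assumes error-free reads, a fixed k, and a simple depth-first walk capped at a maximum length. It does not reproduce Graph2Pro's actual path selection or translation into proteins, and all names are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; every k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def enumerate_paths(graph, start, max_len):
    """Depth-first enumeration of the sequences spelled by paths from `start`.

    A walk stops at a node with no successors or once it reaches max_len bases,
    which also keeps cycles from looping forever.
    """
    results = []
    stack = [(start, start)]
    while stack:
        node, seq = stack.pop()
        successors = graph.get(node, set())
        if not successors or len(seq) >= max_len:
            results.append(seq)
            continue
        for nxt in successors:
            stack.append((nxt, seq + nxt[-1]))
    return results


reads = ["ACGTC", "CGTA"]
graph = build_de_bruijn(reads, k=3)
print(sorted(enumerate_paths(graph, "AC", max_len=10)))  # ['ACGTA', 'ACGTC']
```

    Both variants branching at node GT are recovered. Each spelled sequence could then be translated to build a protein search database.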

  8. Databases of the marine metagenomics.

    PubMed

    Mineta, Katsuhiko; Gojobori, Takashi

    2016-02-01

    Metagenomic data obtained from marine environments are highly useful for understanding marine microbial communities. Compared with the conventional amplicon-based approach, the more recent shotgun sequencing-based approach has become a powerful tool that provides an efficient way of capturing the diversity of the entire microbial community at a sampling point in the sea. However, this approach accelerates the accumulation of metagenome data and increases data complexity. Moreover, when a metagenomic approach is used to monitor temporal changes in marine environments at multiple seawater locations, metagenomic data will accumulate in tremendous volumes at enormous speed. This situation is becoming a reality at many marine research institutions and stations around the world. Data management and analysis will therefore clearly face so-called Big Data issues, such as how to construct databases efficiently and how to extract useful knowledge from vast amounts of data. In this review, we outline all the major publicly available marine metagenome databases. We note that no database is devoted exclusively to marine metagenomes, and that the number of metagenome databases that include marine metagenome data is, unexpectedly, still small: only six. We also extend our explanation to what we call reference databases, which will be useful for constructing a marine metagenome database and for complementing it with important information. Finally, we point out a number of challenges to be overcome in constructing a marine metagenome database.

  9. Microbial Metagenomics: Beyond the Genome

    NASA Astrophysics Data System (ADS)

    Gilbert, Jack A.; Dupont, Christopher L.

    2011-01-01

    Metagenomics literally means “beyond the genome.” Marine microbial metagenomic databases presently comprise ~400 billion base pairs of DNA, only ~3% of that found in 1 ml of seawater. Very soon a trillion-base-pair sequence run will be feasible, so it is time to reflect on what we have learned from metagenomics. We review the impact of metagenomics on our understanding of marine microbial communities. We consider the studies facilitated by data generated through the Global Ocean Sampling expedition, as well as the revolution wrought at the individual laboratory level through next generation sequencing technologies. We review recent studies and discoveries since 2008, provide a discussion of bioinformatic analyses, including conceptual pipelines and sequence annotation, and predict the future of metagenomics, with suggestions of collaborative community studies tailored toward answering some of the fundamental questions in marine microbial ecology.

  10. Precision Metagenomics: Rapid Metagenomic Analyses for Infectious Disease Diagnostics and Public Health Surveillance.

    PubMed

    Afshinnekoo, Ebrahim; Chou, Chou; Alexander, Noah; Ahsanuddin, Sofia; Schuetz, Audrey N; Mason, Christopher E

    2017-04-01

    Next-generation sequencing (NGS) technologies have ushered in the era of precision medicine, transforming the way we treat cancer patients and diagnose disease. Concomitantly, the advent of these technologies has created a surge of microbiome and metagenomic studies over the last decade, many of which are focused on investigating the host-gene-microbial interactions responsible for the development and spread of infectious diseases, as well as delineating their key role in maintaining health. As we continue to discover more information about the etiology of infectious diseases, the translational potential of metagenomic NGS methods for treatment and rapid diagnosis is becoming abundantly clear. Here, we present a robust protocol for the implementation and application of "precision metagenomics" across various sequencing platforms for clinical samples. Such a pipeline integrates DNA/RNA extraction, library preparation, sequencing, and bioinformatics analyses for taxonomic classification, antimicrobial resistance (AMR) marker screening, and functional analysis (biochemical and metabolic pathway abundance). Moreover, the pipeline has 3 tracks: STAT for results within 24 h; Comprehensive that affords a more in-depth analysis and takes between 5 and 7 d, but offers antimicrobial resistance information; and Targeted, which also requires 5-7 d, but with more sensitive analysis for specific pathogens. Finally, we discuss the challenges that need to be addressed before full integration in the clinical setting.

  11. Use of Substrate-Induced Gene Expression in Metagenomic Analysis of an Aromatic Hydrocarbon-Contaminated Soil

    PubMed Central

    Meier, Matthew J.; Paterson, E. Suzanne

    2015-01-01

    Metagenomics allows the study of genes related to xenobiotic degradation in a culture-independent manner, but many of these studies are limited by the lack of genomic context for metagenomic sequences. This study combined a phenotypic screen known as substrate-induced gene expression (SIGEX) with whole-metagenome shotgun sequencing. SIGEX is a high-throughput promoter-trap method that relies on transcriptional activation of a green fluorescent protein (GFP) reporter gene in response to an inducing compound and subsequent fluorescence-activated cell sorting to isolate individual inducible clones from a metagenomic DNA library. We describe a SIGEX procedure with improved library construction from fragmented metagenomic DNA and improved flow cytometry sorting procedures. We used SIGEX to interrogate an aromatic hydrocarbon (AH)-contaminated soil metagenome. The recovered clones contained sequences with various degrees of similarity to genes (or partial genes) involved in aromatic metabolism, for example, nahG (salicylate oxygenase) family genes and their respective upstream nahR regulators. To obtain a broader context for the recovered fragments, clones were mapped to contigs derived from de novo assembly of shotgun-sequenced metagenomic DNA which, in most cases, contained complete operons involved in aromatic metabolism, providing greater insight into the origin of the metagenomic fragments. A comparable set of contigs was generated using a significantly less computationally intensive procedure in which assembly of shotgun-sequenced metagenomic DNA was directed by the SIGEX-recovered sequences. This methodology may have broad applicability in identifying biologically relevant subsets of metagenomes (including both novel and known sequences) that can be targeted computationally by in silico assembly and prediction tools. PMID:26590287

  12. Use of Substrate-Induced Gene Expression in Metagenomic Analysis of an Aromatic Hydrocarbon-Contaminated Soil.

    PubMed

    Meier, Matthew J; Paterson, E Suzanne; Lambert, Iain B

    2015-11-20

    Metagenomics allows the study of genes related to xenobiotic degradation in a culture-independent manner, but many of these studies are limited by the lack of genomic context for metagenomic sequences. This study combined a phenotypic screen known as substrate-induced gene expression (SIGEX) with whole-metagenome shotgun sequencing. SIGEX is a high-throughput promoter-trap method that relies on transcriptional activation of a green fluorescent protein (GFP) reporter gene in response to an inducing compound and subsequent fluorescence-activated cell sorting to isolate individual inducible clones from a metagenomic DNA library. We describe a SIGEX procedure with improved library construction from fragmented metagenomic DNA and improved flow cytometry sorting procedures. We used SIGEX to interrogate an aromatic hydrocarbon (AH)-contaminated soil metagenome. The recovered clones contained sequences with various degrees of similarity to genes (or partial genes) involved in aromatic metabolism, for example, nahG (salicylate oxygenase) family genes and their respective upstream nahR regulators. To obtain a broader context for the recovered fragments, clones were mapped to contigs derived from de novo assembly of shotgun-sequenced metagenomic DNA which, in most cases, contained complete operons involved in aromatic metabolism, providing greater insight into the origin of the metagenomic fragments. A comparable set of contigs was generated using a significantly less computationally intensive procedure in which assembly of shotgun-sequenced metagenomic DNA was directed by the SIGEX-recovered sequences. This methodology may have broad applicability in identifying biologically relevant subsets of metagenomes (including both novel and known sequences) that can be targeted computationally by in silico assembly and prediction tools.
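    The less computationally intensive procedure described above directs assembly using the SIGEX-recovered sequences. One plausible way to do this is to iteratively recruit shotgun reads that share k-mers with the seed sequences, or with reads already recruited, and then assemble only that subset. This is only a sketch: the recruitment rule, parameter values, and function names are assumptions, not the authors' protocol.

```python
def kmer_set(seq, k):
    """All k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_reads(seeds, reads, k=21, min_shared=1, max_rounds=5):
    """Iteratively collect reads sharing >= min_shared k-mers with the seeds or with reads already collected."""
    target = set()
    for seed in seeds:
        target |= kmer_set(seed, k)
    recruited, remaining = [], list(reads)
    for _ in range(max_rounds):
        newly = [r for r in remaining if len(kmer_set(r, k) & target) >= min_shared]
        if not newly:
            break
        for r in newly:
            target |= kmer_set(r, k)  # recruited reads extend the target outward
        recruited.extend(newly)
        remaining = [r for r in remaining if r not in newly]
    return recruited


seeds = ["ACGTACGTTT"]  # hypothetical SIGEX-recovered fragment
reads = ["GTACGTTTGG", "TTTGGCCAAT", "CCCCCCCCCC"]
print(recruit_reads(seeds, reads, k=5))  # ['GTACGTTTGG', 'TTTGGCCAAT']
```

    The second read shares no k-mer with the seed and is picked up only in the second round, through the first read. This mirrors how a seed can be extended toward a flanking operon. The tiny k = 5 is used only because the toy sequences are short.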

  13. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    PubMed Central

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient because of the large amount of sequencing required. In this study, we applied a captured metagenomics technique for functional genes in soil microorganisms as an alternative to WMG. Large-scale targeting of functional genes coding for enzymes related to organic matter degradation was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries, where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation, at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach at large scale, and this novel method allows deep investigation of central ecosystem processes by studying functional gene abundances. PMID:26490729

  14. Reconstructing the genomic content of microbiome taxa through shotgun metagenomic deconvolution.

    PubMed

    Carr, Rogan; Shen-Orr, Shai S; Borenstein, Elhanan

    2013-01-01

    Metagenomics has transformed our understanding of the microbial world, allowing researchers to bypass the need to isolate and culture individual taxa and to directly characterize both the taxonomic and gene compositions of environmental samples. However, associating the genes found in a metagenomic sample with the specific taxa of origin remains a critical challenge. Existing binning methods, based on nucleotide composition or alignment to reference genomes allow only a coarse-grained classification and rely heavily on the availability of sequenced genomes from closely related taxa. Here, we introduce a novel computational framework, integrating variation in gene abundances across multiple samples with taxonomic abundance data to deconvolve metagenomic samples into taxa-specific gene profiles and to reconstruct the genomic content of community members. This assembly-free method is not bounded by various factors limiting previously described methods of metagenomic binning or metagenomic assembly and represents a fundamentally different approach to metagenomic-based genome reconstruction. An implementation of this framework is available at http://elbo.gs.washington.edu/software.html. We first describe the mathematical foundations of our framework and discuss considerations for implementing its various components. We demonstrate the ability of this framework to accurately deconvolve a set of metagenomic samples and to recover the gene content of individual taxa using synthetic metagenomic samples. We specifically characterize determinants of prediction accuracy and examine the impact of annotation errors on the reconstructed genomes. We finally apply metagenomic deconvolution to samples from the Human Microbiome Project, successfully reconstructing genus-level genomic content of various microbial genera, based solely on variation in gene count. These reconstructed genera are shown to correctly capture genus-specific properties. With the accumulation of metagenomic
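    The deconvolution idea can be written as a linear model. Across samples, the observed abundance of a gene is approximately the taxa-abundance matrix times the unknown per-taxon gene copy numbers, g ≈ A c. The sketch below assumes ordinary least squares solved through the normal equations (AᵀA) c = Aᵀg, with no non-negativity constraint; the published framework may differ. The toy numbers are hypothetical.

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small square system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def deconvolve_gene(taxa_abundance, gene_abundance):
    """Least-squares per-taxon copy numbers c from g ~ A c via (A^T A) c = A^T g.

    taxa_abundance: one row per sample, one column per taxon (relative abundances).
    gene_abundance: the gene's abundance in each sample.
    """
    n_taxa = len(taxa_abundance[0])
    ata = [[sum(row[i] * row[j] for row in taxa_abundance) for j in range(n_taxa)]
           for i in range(n_taxa)]
    atg = [sum(row[i] * g for row, g in zip(taxa_abundance, gene_abundance))
           for i in range(n_taxa)]
    return solve_linear(ata, atg)


# Hypothetical: 3 samples x 2 taxa; the gene has 2 copies in taxon 1 and 1 copy in taxon 2.
A = [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]]
g = [1.5, 1.2, 1.9]
print([round(v, 6) for v in deconvolve_gene(A, g)])  # [2.0, 1.0]
```

    Abundances must vary across samples for AᵀA to be well conditioned, which is why the framework depends on multiple samples.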

  15. Binning of shallowly sampled metagenomic sequence fragments reveals that low abundance bacteria play important roles in sulfur cycling and degradation of complex organic polymers in an acid mine drainage community

    NASA Astrophysics Data System (ADS)

    Dick, G. J.; Andersson, A.; Banfield, J. F.

    2007-12-01

    Our understanding of environmental microbiology has been greatly enhanced by community genome sequencing of DNA recovered directly from the environment. Community genomics provides insights into the diversity, community structure, metabolic function, and evolution of natural populations of uncultivated microbes, thereby revealing the dynamics of how microorganisms interact with each other and with their environment. Recent studies have demonstrated the potential for reconstructing near-complete genomes from natural environments, while highlighting the challenges of analyzing community genomic sequence, especially from diverse environments. A major challenge of shotgun community genome sequencing is the identification of DNA fragments from minor community members for which only low coverage of genomic sequence is present. We analyzed community genome sequence retrieved from biofilms in an acid mine drainage (AMD) system in the Richmond Mine at Iron Mountain, CA, with an emphasis on identification and assembly of DNA fragments from low-abundance community members. The Richmond Mine hosts an extensive, relatively low-diversity subterranean chemolithoautotrophic community that is sustained entirely by oxidative dissolution of pyrite. The activity of these microorganisms greatly accelerates the generation of AMD. Previous and ongoing work in our laboratory has focused on reconstructing the genomes of dominant community members, including several bacteria and archaea. We binned contigs from several samples (including one new sample and two that had been previously analyzed) by tetranucleotide frequency, with clustering by Self-Organizing Maps (SOM). The binning, evaluated by comparison with information from the manually curated assembly of the dominant organisms, was found to be very effective: fragments were correctly assigned with 95% accuracy. Improperly assigned fragments often contained sequences that are either evolutionarily constrained (e.g. 16S rRNA genes) or mobile elements that are
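    The binning step above combines tetranucleotide frequencies with a Self-Organizing Map. The sketch below assigns each contig to the map node (bin) that best matches its composition. It assumes reverse-complement-merged (canonical) tetranucleotide frequencies, a small one-dimensional map, and a linearly decaying learning rate and neighborhood radius. These are illustrative choices, not the authors' settings, and all function names, parameters, and sequences are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


# 136 canonical tetranucleotides (256 4-mers, 16 of which are their own reverse complement).
TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetra_profile(seq):
    """Relative canonical tetranucleotide frequencies; windows with non-ACGT symbols are skipped."""
    seq = seq.upper()
    windows = (seq[i:i + 4] for i in range(len(seq) - 3))
    counts = Counter(canonical(w) for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values()) or 1
    return [counts[t] / total for t in TETRAMERS]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(vectors, n_nodes=4, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM; node weights start as randomly chosen input vectors."""
    rng = random.Random(seed)
    nodes = [list(rng.choice(vectors)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)
        radius = max(radius0 * (1 - frac), 1e-3)
        for v in rng.sample(vectors, len(vectors)):
            bmu = min(range(n_nodes), key=lambda i: sq_dist(nodes[i], v))
            for i in range(n_nodes):
                # Gaussian neighborhood on the 1-D lattice around the best-matching unit.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                nodes[i] = [w + lr * h * (x - w) for w, x in zip(nodes[i], v)]
    return nodes


def assign_bins(contigs, nodes):
    """Bin index = best-matching map node for each contig's tetranucleotide profile."""
    bins = []
    for contig in contigs:
        profile = tetra_profile(contig)
        bins.append(min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], profile)))
    return bins


contigs = ["ATATTATAATATTAT" * 4, "ATATTATAATATTAT" * 4, "GCGGCCGCGCGGCGC" * 4]
nodes = train_som([tetra_profile(c) for c in contigs])
print(assign_bins(contigs, nodes))
```

    Canonical counting treats a contig and its reverse complement identically, which matters because assembled contigs have arbitrary orientation.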

  16. New hydrocarbon degradation pathways in the microbial metagenome from Brazilian petroleum reservoirs.

    PubMed

    Sierra-García, Isabel Natalia; Correa Alvarez, Javier; de Vasconcellos, Suzan Pantaroto; Pereira de Souza, Anete; dos Santos Neto, Eugenio Vaz; de Oliveira, Valéria Maia

    2014-01-01

    Current knowledge of the microbial diversity and metabolic pathways involved in hydrocarbon degradation in petroleum reservoirs is still limited, mostly due to the difficulty in recovering the complex community from such an extreme environment. Metagenomics is a valuable tool to investigate the genetic and functional diversity of previously uncultured microorganisms in natural environments. Using a function-driven metagenomic approach, we investigated the metabolic abilities of microbial communities in oil reservoirs. Here, we describe novel functional metabolic pathways involved in the biodegradation of aromatic compounds in a metagenomic library obtained from an oil reservoir. Although many of the deduced proteins shared homology with known enzymes of different well-described aerobic and anaerobic catabolic pathways, the metagenomic fragments did not contain the complete clusters known to be involved in hydrocarbon degradation. Instead, the metagenomic fragments comprised genes belonging to different pathways, showing novel gene arrangements. These results reinforce the potential of the metagenomic approach for the identification and elucidation of new genes and pathways in poorly studied environments and contribute to a broader perspective on the hydrocarbon degradation processes in petroleum reservoirs.

  17. New Hydrocarbon Degradation Pathways in the Microbial Metagenome from Brazilian Petroleum Reservoirs

    PubMed Central

    Sierra-García, Isabel Natalia; Correa Alvarez, Javier; Pantaroto de Vasconcellos, Suzan; Pereira de Souza, Anete; dos Santos Neto, Eugenio Vaz; de Oliveira, Valéria Maia

    2014-01-01

    Current knowledge of the microbial diversity and metabolic pathways involved in hydrocarbon degradation in petroleum reservoirs is still limited, mostly due to the difficulty in recovering the complex community from such an extreme environment. Metagenomics is a valuable tool to investigate the genetic and functional diversity of previously uncultured microorganisms in natural environments. Using a function-driven metagenomic approach, we investigated the metabolic abilities of microbial communities in oil reservoirs. Here, we describe novel functional metabolic pathways involved in the biodegradation of aromatic compounds in a metagenomic library obtained from an oil reservoir. Although many of the deduced proteins shared homology with known enzymes of different well-described aerobic and anaerobic catabolic pathways, the metagenomic fragments did not contain the complete clusters known to be involved in hydrocarbon degradation. Instead, the metagenomic fragments comprised genes belonging to different pathways, showing novel gene arrangements. These results reinforce the potential of the metagenomic approach for the identification and elucidation of new genes and pathways in poorly studied environments and contribute to a broader perspective on the hydrocarbon degradation processes in petroleum reservoirs. PMID:24587220

  18. A study of the effectiveness of machine learning methods for classification of clinical interview fragments into a large number of categories.

    PubMed

    Hasan, Mehedi; Kotov, Alexander; Idalski Carcone, April; Dong, Ming; Naar, Sylvie; Brogan Hartlieb, Kathryn

    2016-08-01

    This study examines the effectiveness of state-of-the-art supervised machine learning methods, in conjunction with different feature types, for the automatic annotation of fragments of clinical text based on codebooks with a large number of categories. We used a collection of motivational interview transcripts consisting of 11,353 utterances, which were manually annotated by two human coders as the gold standard. We experimented with state-of-the-art classifiers, including Naïve Bayes, J48 Decision Tree, Support Vector Machine (SVM), Random Forest (RF), AdaBoost, DiscLDA, Conditional Random Fields (CRF) and Convolutional Neural Network (CNN). These were combined with lexical, contextual (label of the previous utterance) and semantic (distribution of words in the utterance across the Linguistic Inquiry and Word Count dictionaries) features. We found that, when the number of classes is large, the performance of CNN and CRF is inferior to that of SVM. When only lexical features were used, SVM achieved the highest classification accuracy among all classifiers: 70.8%, 61% and 53.7% on codebooks consisting of 17, 20 and 41 codes, respectively. For a codebook of 17 classes, adding contextual features, semantic features, or both to the lexical ones improved the accuracy of SVM to 71.5%, 74.2%, and 75.1%, respectively. Our results demonstrate the potential of using machine learning methods in conjunction with lexical, semantic and contextual features for automatic annotation of clinical interview transcripts with near-human accuracy.
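    Naïve Bayes is the simplest of the classifiers compared above; SVM performed best in the study. The sketch below is a multinomial Naïve Bayes utterance classifier over lexical (bag-of-words) features with Laplace smoothing. The code labels and training utterances are hypothetical and are not drawn from the study's codebooks or transcripts.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Bag-of-words Naive Bayes with additive (Laplace) smoothing."""

    def fit(self, utterances, labels, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(utterances, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        words = [w for w in text.lower().split() if w in self.vocab]  # ignore unseen words
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            score = math.log(n / n_docs)  # log prior
            for w in words:
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = ["i want to quit smoking", "i will try to quit", "i do not want to change", "smoking is fine for me"]
codes = ["CHANGE_TALK", "CHANGE_TALK", "SUSTAIN_TALK", "SUSTAIN_TALK"]
model = MultinomialNaiveBayes().fit(train, codes)
print(model.predict("i want to quit"))  # CHANGE_TALK
```

    Contextual features, such as the previous utterance's label, could be added as extra pseudo-words, for example a hypothetical token like `PREV=CHANGE_TALK`.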

  19. IMG/M 4 version of the integrated metagenome comparative analysis system.

    PubMed

    Markowitz, Victor M; Chen, I-Min A; Chu, Ken; Szeto, Ernest; Palaniappan, Krishna; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Pagani, Ioanna; Tringe, Susannah; Huntemann, Marcel; Billis, Konstantinos; Varghese, Neha; Tennessen, Kristin; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N; Kyrpides, Nikos C

    2014-01-01

    IMG/M (http://img.jgi.doe.gov/m) provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in the context of a comprehensive set of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG/M's data content and analytical tools have expanded continuously since its first version was released in 2007. Since the last report published in the 2012 NAR Database Issue, IMG/M's database architecture, annotation and data integration pipelines and analysis tools have been extended to cope with the rapid growth in the number and size of metagenome data sets handled by the system. IMG/M data marts provide support for the analysis of publicly available genomes, expert review of metagenome annotations (IMG/M ER: http://img.jgi.doe.gov/mer) and Human Microbiome Project (HMP)-specific metagenome samples (IMG/M HMP: http://img.jgi.doe.gov/imgm_hmp).

  20. Bacillus subtilis as a tool for screening soil metagenomic libraries for antimicrobial activities.

    PubMed

    Biver, Sophie; Steels, Sébastien; Portetelle, Daniel; Vandenbol, Micheline

    2013-06-28

    Finding new antimicrobial activities by functional metagenomics has been shown to depend on the heterologous host used to express the foreign DNA. Efforts are therefore devoted to developing new tools for constructing metagenomic libraries in shuttle vectors able to replicate in phylogenetically distinct hosts. Here we evaluated the use of the Escherichia coli-Bacillus subtilis shuttle vector pHT01 to construct a forest-soil metagenomic library. This library was screened in both hosts for antimicrobial activities against four opportunistic bacteria: Proteus vulgaris, Bacillus cereus, Staphylococcus epidermidis, and Micrococcus luteus. A new antibacterial activity against B. cereus was found upon screening in B. subtilis. The new antimicrobial agent, sensitive to proteinase K, was not active when the corresponding DNA fragment was expressed in E. coli. Our results validate the use of pHT01 as a shuttle vector and B. subtilis as a host to isolate new activities by functional metagenomics.

  1. Crass: identification and reconstruction of CRISPR from unassembled metagenomic data.

    PubMed

    Skennerton, Connor T; Imelfort, Michael; Tyson, Gene W

    2013-05-01

    Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.
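    Crass works directly on unassembled reads. A simplified illustration of the underlying signal, not Crass's actual algorithm, is to look within a single read for a k-mer that recurs at a spacing consistent with one repeat plus one spacer. The k-mer size and spacing window below are assumptions chosen for the toy example, and the read is synthetic.

```python
from collections import defaultdict


def find_direct_repeats(read, k=8, min_period=20, max_period=60):
    """Map each k-mer that recurs in `read` at a start-to-start spacing within
    [min_period, max_period] (repeat + spacer length) to all of its start positions."""
    positions = defaultdict(list)
    for i in range(len(read) - k + 1):
        positions[read[i:i + k]].append(i)
    hits = {}
    for kmer, starts in positions.items():
        spacings = [b - a for a, b in zip(starts, starts[1:])]
        if any(min_period <= s <= max_period for s in spacings):
            hits[kmer] = starts
    return hits


repeat = "GTTTTAGAGCTATGCTG"  # hypothetical 17-bp direct repeat
read = repeat + "A" * 30 + repeat + "C" * 30 + repeat
hits = find_direct_repeats(read)
print(hits["GTTTTAGA"])  # [0, 47, 94]
```

    The homopolymer spacers produce k-mers that recur only one base apart, so the spacing window filters them out. The sequence between consecutive repeat hits would be reported as a spacer.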

  2. A Bioinformatician's Guide to Metagenomics

    SciTech Connect

    Kunin, Victor; Copeland, Alex; Lapidus, Alla; Mavromatis, Konstantinos; Hugenholtz, Philip

    2008-08-01

    As random shotgun metagenomic projects proliferate and become the dominant source of publicly available sequence data, procedures for best practices in their execution and analysis become increasingly important. Based on our experience at the Joint Genome Institute, we describe step-by-step the chain of decisions accompanying a metagenomic project from the viewpoint of a bioinformatician. We guide the reader through a standard workflow for a metagenomic project beginning with pre-sequencing considerations such as community composition and sequence data type that will greatly influence downstream analyses. We proceed with recommendations for sampling and data generation including sample and metadata collection, community profiling, construction of shotgun libraries and sequencing strategies. We then discuss the application of generic sequence processing steps (read preprocessing, assembly, and gene prediction and annotation) to metagenomic datasets in contrast to genome projects. Different types of data analyses particular to metagenomes are then presented including binning, dominant population analysis and gene-centric analysis. Finally data management systems and issues are presented and discussed. We hope that this review will assist bioinformaticians and biologists in making better-informed decisions on their journey during a metagenomic project.
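    Read preprocessing commonly includes quality trimming. As a minimal sketch, assuming Phred+33 encoded FASTQ quality strings and an illustrative cutoff of Q20 (neither is prescribed by the guide), trimming low-quality bases from the 3' end could look like this:

```python
def phred33_scores(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]

def trim_3prime(seq, qual, min_q=20):
    """Drop trailing bases whose quality is below min_q (hypothetical default)."""
    scores = phred33_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(trim_3prime("ACGTACGT", "IIIII###"))  # ('ACGTA', 'IIIII')
```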

  3. Open resource metagenomics: a model for sharing metagenomic libraries.

    PubMed

    Neufeld, J D; Engel, K; Cheng, J; Moreno-Hagelsieb, G; Rose, D R; Charles, T C

    2011-11-30

    Both sequence-based and activity-based exploitation of environmental DNA have provided unprecedented access to the genomic content of cultivated and uncultivated microorganisms. Although researchers deposit microbial strains in culture collections and DNA sequences in databases, activity-based metagenomic studies typically only publish sequences from the hits retrieved from specific screens. Physical metagenomic libraries, conceptually similar to entire sequence datasets, are usually not straightforward to obtain by interested parties subsequent to publication. In order to facilitate unrestricted distribution of metagenomic libraries, we propose the adoption of open resource metagenomics, in line with the trend towards open access publishing, and similar to culture- and mutant-strain collections that have been the backbone of traditional microbiology and microbial genetics. The concept of open resource metagenomics includes preparation of physical DNA libraries, preferably in versatile vectors that facilitate screening in a diversity of host organisms, and pooling of clones so that single aliquots containing complete libraries can be easily distributed upon request. Database deposition of associated metadata and sequence data for each library provides researchers with information to select the most appropriate libraries for further research projects. As a starting point, we have established the Canadian MetaMicroBiome Library (CM²BL [1]). The CM²BL is a publicly accessible collection of cosmid libraries containing environmental DNA from soils collected from across Canada, spanning multiple biomes. The libraries were constructed such that the cloned DNA can be easily transferred to Gateway® compliant vectors, facilitating functional screening in virtually any surrogate microbial host for which there are available plasmid vectors. The libraries, which we are placing in the public domain, will be distributed upon request without restriction to members of both the

  4. Open resource metagenomics: a model for sharing metagenomic libraries

    PubMed Central

    Neufeld, J.D.; Engel, K.; Cheng, J.; Moreno-Hagelsieb, G.; Rose, D.R.; Charles, T.C.

    2011-01-01

    Both sequence-based and activity-based exploitation of environmental DNA have provided unprecedented access to the genomic content of cultivated and uncultivated microorganisms. Although researchers deposit microbial strains in culture collections and DNA sequences in databases, activity-based metagenomic studies typically only publish sequences from the hits retrieved from specific screens. Physical metagenomic libraries, conceptually similar to entire sequence datasets, are usually not straightforward to obtain by interested parties subsequent to publication. In order to facilitate unrestricted distribution of metagenomic libraries, we propose the adoption of open resource metagenomics, in line with the trend towards open access publishing, and similar to culture- and mutant-strain collections that have been the backbone of traditional microbiology and microbial genetics. The concept of open resource metagenomics includes preparation of physical DNA libraries, preferably in versatile vectors that facilitate screening in a diversity of host organisms, and pooling of clones so that single aliquots containing complete libraries can be easily distributed upon request. Database deposition of associated metadata and sequence data for each library provides researchers with information to select the most appropriate libraries for further research projects. As a starting point, we have established the Canadian MetaMicroBiome Library (CM²BL [1]). The CM²BL is a publicly accessible collection of cosmid libraries containing environmental DNA from soils collected from across Canada, spanning multiple biomes. The libraries were constructed such that the cloned DNA can be easily transferred to Gateway® compliant vectors, facilitating functional screening in virtually any surrogate microbial host for which there are available plasmid vectors. The libraries, which we are placing in the public domain, will be distributed upon request without restriction to members of both the

  5. CLaMS: Classifier for Metagenomic Sequences

    SciTech Connect

    Pati, Amrita

    2010-12-01

    CLaMS ("Classifier for Metagenomic Sequences") is a Java application for binning assembled metagenomes using user-specified training sequence sets and other user-specified initial parameters. Since CLaMS analyzes and matches sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; CLaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. CLaMS is meant to be a desktop application for biologists and can be run on any machine under any operating system on which the Java Runtime Environment is enabled. CLaMS is freely available in both GUI-based and command-line based forms.
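    Composition-based binning compares a sequence's oligonucleotide signature with signatures derived from training sequences. The sketch below assumes tetranucleotide frequencies and a Euclidean nearest-signature rule; CLaMS's actual signature and distance measure may differ, and the function names and toy training data are hypothetical.

```python
import math
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetra_signature(seq):
    """Relative frequency of each of the 256 tetranucleotides in seq."""
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skip words containing N or other symbols
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in TETRAMERS]

def assign_bin(seq, training):
    """Assign seq to the training label whose signature is nearest."""
    sig = tetra_signature(seq)
    return min(training, key=lambda label: math.dist(sig, tetra_signature(training[label])))

training = {"AT-rich": "ATTATAAT" * 50, "GC-rich": "GCCGCGGC" * 50}
print(assign_bin("ATATTAAT" * 10, training))  # AT-rich
```

    In practice the training signatures would be computed once and cached rather than recomputed for every query.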

  6. MetaBAT: Metagenome Binning based on Abundance and Tetranucleotide Frequency

    SciTech Connect

    Kang, Dongwan; Froula, Jeff; Egan, Rob; Wang, Zhong

    2014-03-21

    Grouping large fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Here we developed automated metagenome binning software, called MetaBAT, which integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency. On synthetic datasets MetaBAT on average achieves 98% precision and 90% recall at the strain level with 281 near complete unique genomes. Applying MetaBAT to a human gut microbiome data set we recovered 176 genome bins with 92% precision and 80% recall. Further analyses suggest MetaBAT is able to recover genome fragments missed in reference genomes up to 19%, while 53 genome bins are novel. In summary, we believe MetaBAT is a powerful tool to facilitate comprehensive understanding of complex microbial communities.
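    The two signals MetaBAT integrates can be illustrated with simpler stand-ins. The sketch below computes tetranucleotide frequencies over the 136 reverse-complement-collapsed words, an abundance distance over per-sample coverages, and a weighted sum of the two. MetaBAT's empirical probabilistic distances are not reproduced here; the log-ratio abundance distance and the weight `w` are hypothetical choices.

```python
import math
from itertools import product

COMP = str.maketrans("ACGT", "TGCA")

def revcomp(s):
    return s.translate(COMP)[::-1]

# Collapse each tetramer with its reverse complement: 136 canonical words.
CANONICAL = sorted({min(w, revcomp(w)) for w in ("".join(p) for p in product("ACGT", repeat=4))})
INDEX = {w: i for i, w in enumerate(CANONICAL)}

def tnf(seq):
    """Canonical tetranucleotide frequency vector of seq."""
    counts = [0] * len(CANONICAL)
    for i in range(len(seq) - 3):
        w = seq[i:i + 4]
        if set(w) <= set("ACGT"):
            counts[INDEX[min(w, revcomp(w))]] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def abundance_distance(cov_a, cov_b):
    """Mean absolute log-ratio of per-sample coverages (illustrative choice)."""
    return sum(abs(math.log((a + 1) / (b + 1))) for a, b in zip(cov_a, cov_b)) / len(cov_a)

def combined_distance(seq_a, cov_a, seq_b, cov_b, w=0.5):
    """Weighted sum of composition and abundance distances (hypothetical weighting)."""
    return w * math.dist(tnf(seq_a), tnf(seq_b)) + (1 - w) * abundance_distance(cov_a, cov_b)
```

    Collapsing reverse complements makes a contig and its reverse complement indistinguishable by composition, which is desirable because assemblies do not fix strand.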

  7. Functional metagenomics to decipher food-microbe-host crosstalk.

    PubMed

    Larraufie, Pierre; de Wouters, Tomas; Potocki-Veronese, Gabrielle; Blottière, Hervé M; Doré, Joël

    2015-02-01

    Recent developments in metagenomics permit an extremely high-resolution molecular scan of the intestinal microbiota, giving new insights and opening perspectives for clinical applications. Beyond the unprecedented vision of the intestinal microbiota given by large-scale quantitative metagenomics studies, such as the EU MetaHIT project, functional metagenomics tools allow the exploration of fine interactions between food constituents, microbiota and host, leading to the identification of signals and intimate mechanisms of crosstalk, especially between bacteria and human cells. Cloning of large genome fragments, either from complex intestinal communities or from selected bacteria, allows the screening of these biological resources for bioactivity towards complex plant polymers or functional food such as prebiotics. This has permitted the identification of novel carbohydrate-active enzyme families involved in dietary fibre and host glycan breakdown, and highlighted unsuspected bacterial players at the top of the intestinal microbial food chain. Similarly, exposure of fractions from genomic and metagenomic clones onto human cells engineered with reporter systems to track modulation of immune response, cell proliferation or cell metabolism has allowed the identification of bioactive clones modulating key cell signalling pathways or the induction of specific genes. This opens the possibility to decipher mechanisms by which commensal bacteria or candidate probiotics can modulate the activity of cells in the intestinal epithelium or even in distal organs such as the liver, adipose tissue or the brain. Hence, in spite of our inability to culture many of the dominant microbes of the human intestine, functional metagenomics opens a new window for the exploration of food-microbe-host crosstalk.

  8. Challenges of the Unknown: Clinical Application of Microbial Metagenomics

    PubMed Central

    Rose, Graham; Wooldridge, David J.; Anscombe, Catherine; Mee, Edward T.; Misra, Raju V.; Gharbia, Saheer

    2015-01-01

    Availability of fast, high throughput and low cost whole genome sequencing holds great promise within public health microbiology, with applications ranging from outbreak detection and tracking transmission events to understanding the role played by microbial communities in health and disease. Within clinical metagenomics, identifying microorganisms from a complex and host-enriched background remains a central computational challenge. As proof of principle, we sequenced two metagenomic samples, a known viral mixture of 25 human pathogens and an unknown complex biological model using benchtop technology. The datasets were then analysed using a bioinformatic pipeline developed around recent fast classification methods. A targeted approach was able to detect 20 of the viruses against a background of host contamination from multiple sources and bacterial contamination. An alternative untargeted identification method was highly correlated with these classifications, and over 1,600 species were identified when applied to the complex biological model, including several species captured at over 50% genome coverage. In summary, this study demonstrates the great potential of applying metagenomics within the clinical laboratory setting and that this can be achieved using infrastructure available to nondedicated sequencing centres. PMID:26451363
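    Assuming "genome coverage" here means breadth of coverage (the fraction of reference positions covered by at least one read), it can be computed from read alignments by merging overlapping intervals. The sketch assumes alignments are already available as half-open (start, end) reference coordinates; it is an illustration, not part of the authors' pipeline.

```python
def breadth_of_coverage(intervals, genome_length):
    """Fraction of positions in [0, genome_length) covered by >= 1 interval.
    Intervals are half-open (start, end) alignment coordinates."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        start, end = max(start, 0), min(end, genome_length)  # clip to genome
        if start >= end:
            continue
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged run: close it and start anew.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or touching: extend
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length

print(breadth_of_coverage([(0, 100), (50, 150), (300, 400)], 1000))  # 0.25
```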

  9. Challenges of the Unknown: Clinical Application of Microbial Metagenomics.

    PubMed

    Rose, Graham; Wooldridge, David J; Anscombe, Catherine; Mee, Edward T; Misra, Raju V; Gharbia, Saheer

    2015-01-01

    Availability of fast, high throughput and low cost whole genome sequencing holds great promise within public health microbiology, with applications ranging from outbreak detection and tracking transmission events to understanding the role played by microbial communities in health and disease. Within clinical metagenomics, identifying microorganisms from a complex and host-enriched background remains a central computational challenge. As proof of principle, we sequenced two metagenomic samples, a known viral mixture of 25 human pathogens and an unknown complex biological model using benchtop technology. The datasets were then analysed using a bioinformatic pipeline developed around recent fast classification methods. A targeted approach was able to detect 20 of the viruses against a background of host contamination from multiple sources and bacterial contamination. An alternative untargeted identification method was highly correlated with these classifications, and over 1,600 species were identified when applied to the complex biological model, including several species captured at over 50% genome coverage. In summary, this study demonstrates the great potential of applying metagenomics within the clinical laboratory setting and that this can be achieved using infrastructure available to nondedicated sequencing centres.

  10. Metagenomics and the niche concept.

    PubMed

    Marco, Diana

    2008-08-01

    The metagenomics approach has revolutionised the fields of bacterial diversity, ecology and evolution, as well as derived applications like bioremediation and obtaining bioproducts. A further associated conceptual change has also occurred, since in the metagenomics methodology the species is no longer the unit of study, but rather partial genome arrangements or even isolated genes. In spite of this, concepts coming from ecological and evolutionary fields traditionally centred on the species, like the concept of niche, are still being applied without further revision. A reformulation of the niche concept is necessary to deal with the new operative and epistemological challenges posed by the metagenomics approach. To contribute to this end, I review past and present uses of the niche concept in ecology and in microbiological studies, showing that a new, updated definition needs to be used in the context of metagenomics. Finally, I give some insights into a more adequate conceptual background for the utilisation of the niche concept in metagenomic studies. In particular, I raise the necessity of including the microbial genetic background as another variable in the niche space.

  11. Web Resources for Metagenomics Studies

    PubMed Central

    Dudhagara, Pravin; Bhavsar, Sunil; Bhagat, Chintan; Ghelani, Anjana; Bhatt, Shreyas; Patel, Rajesh

    2015-01-01

    The development of next-generation sequencing (NGS) platforms spawned an enormous volume of data. This explosion in data has unearthed new scalability challenges for existing bioinformatics tools. The analysis of metagenomic sequences using bioinformatics pipelines is complicated by the substantial complexity of these data. In this article, we review several commonly-used online tools for metagenomics data analysis with respect to their quality and detail of analysis using simulated metagenomics data. There are at least a dozen such software tools presently available in the public domain. Among them, MG-RAST, IMG/M, and METAVIR are the most well-known tools according to the number of citations in peer-reviewed scientific literature up to mid-2015. Here, we describe 12 online tools with respect to their web link, annotation pipelines, clustering methods, online user support, and availability of data storage. We have also rated each tool to identify the most promising ones and evaluated the five best tools using a synthetic metagenome. The article comprehensively deals with the contemporary problems and the prospects of metagenomics from a bioinformatics viewpoint. PMID:26602607

  12. IDENTIFICATION OF CHICKEN-SPECIFIC FECAL MICROBIAL SEQUENCES USING A METAGENOMIC APPROACH

    EPA Science Inventory

    In this study, we applied a genome fragment enrichment (GFE) method to select for genomic regions that differ between different fecal metagenomes. Competitive DNA hybridizations were performed between chicken fecal DNA and pig fecal DNA (C-P) and between chicken fecal DNA and an ...

  13. Metagenomics and novel gene discovery

    PubMed Central

    Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin

    2014-01-01

    Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics. PMID:24317337

  14. riboFrame: An Improved Method for Microbial Taxonomy Profiling from Non-Targeted Metagenomics

    PubMed Central

    Ramazzotti, Matteo; Donati, Claudio; Cavalieri, Duccio

    2015-01-01

    Non-targeted metagenomics offers the unprecedented possibility of simultaneously investigating the microbial profile and the genetic capabilities of a sample by a direct analysis of its entire DNA content. The assessment of the microbial taxonomic composition is frequently obtained by mapping reads to genomic databases that, although growing, are still limited and biased. Here we present riboFrame, a novel procedure for microbial profiling based on the identification and classification of 16S rDNA sequences in non-targeted metagenomics datasets. Reads overlapping the 16S rDNA genes are identified using Hidden Markov Models and a taxonomic assignment is obtained by naïve Bayesian classification. All reads identified as ribosomal are coherently positioned in the 16S rDNA gene, allowing the use of the topology of the gene (i.e., the secondary structure and the location of variable regions) to guide the abundance analysis. We tested and verified the effectiveness of our method on simulated ribosomal data, on simulated metagenomes and on a real dataset. riboFrame exploits the taxonomic potentialities of the 16S rDNA gene in the context of non-targeted metagenomics, giving an accurate perspective on the microbial profile in metagenomic samples. PMID:26635865
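    Naïve Bayesian taxonomic assignment of 16S reads is typically based on short overlapping words, in the spirit of the RDP classifier. The sketch below is a generic word-presence model with Laplace smoothing and a hypothetical word size of 8; it is not riboFrame's own code, and the taxa and training sequences are toy examples.

```python
import math
from collections import Counter

def words(seq, k=8):
    """Set of distinct overlapping k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def train(reference, k=8):
    """reference maps taxon -> list of training sequences. For each taxon,
    count in how many sequences each word occurs, plus the sequence total."""
    model = {}
    for taxon, seqs in reference.items():
        counts = Counter()
        for s in seqs:
            counts.update(words(s, k))
        model[taxon] = (counts, len(seqs))
    return model

def classify_read(read, model, k=8):
    """Pick the taxon maximising the sum of log P(word | taxon) over read words."""
    def score(taxon):
        counts, n = model[taxon]
        # Laplace-smoothed probability that a sequence of this taxon has the word.
        return sum(math.log((counts[w] + 1) / (n + 2)) for w in words(read, k))
    return max(model, key=score)

model = train({"TaxonA": ["ACGTTGCAAGCTTACGGATC" * 2],
               "TaxonB": ["TTGGCCAATTGGCCAACCGG" * 2]})
print(classify_read("GCAAGCTTACGGATCACG", model))  # TaxonA
```

    Working in log space avoids numerical underflow when many word probabilities are multiplied.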

  15. riboFrame: An Improved Method for Microbial Taxonomy Profiling from Non-Targeted Metagenomics.

    PubMed

    Ramazzotti, Matteo; Berná, Luisa; Donati, Claudio; Cavalieri, Duccio

    2015-01-01

    Non-targeted metagenomics offers the unprecedented possibility of simultaneously investigating the microbial profile and the genetic capabilities of a sample by a direct analysis of its entire DNA content. The assessment of the microbial taxonomic composition is frequently obtained by mapping reads to genomic databases that, although growing, are still limited and biased. Here we present riboFrame, a novel procedure for microbial profiling based on the identification and classification of 16S rDNA sequences in non-targeted metagenomics datasets. Reads overlapping the 16S rDNA genes are identified using Hidden Markov Models and a taxonomic assignment is obtained by naïve Bayesian classification. All reads identified as ribosomal are coherently positioned in the 16S rDNA gene, allowing the use of the topology of the gene (i.e., the secondary structure and the location of variable regions) to guide the abundance analysis. We tested and verified the effectiveness of our method on simulated ribosomal data, on simulated metagenomes and on a real dataset. riboFrame exploits the taxonomic potentialities of the 16S rDNA gene in the context of non-targeted metagenomics, giving an accurate perspective on the microbial profile in metagenomic samples.

  16. Precision Metagenomics: Rapid Metagenomic Analyses for Infectious Disease Diagnostics and Public Health Surveillance

    PubMed Central

    Afshinnekoo, Ebrahim; Chou, Chou; Alexander, Noah; Ahsanuddin, Sofia; Schuetz, Audrey N.; Mason, Christopher E.

    2017-01-01

    Next-generation sequencing (NGS) technologies have ushered in the era of precision medicine, transforming the way we treat cancer patients and diagnose disease. Concomitantly, the advent of these technologies has created a surge of microbiome and metagenomic studies over the last decade, many of which are focused on investigating the host-gene-microbial interactions responsible for the development and spread of infectious diseases, as well as delineating their key role in maintaining health. As we continue to discover more information about the etiology of infectious diseases, the translational potential of metagenomic NGS methods for treatment and rapid diagnosis is becoming abundantly clear. Here, we present a robust protocol for the implementation and application of “precision metagenomics” across various sequencing platforms for clinical samples. Such a pipeline integrates DNA/RNA extraction, library preparation, sequencing, and bioinformatics analyses for taxonomic classification, antimicrobial resistance (AMR) marker screening, and functional analysis (biochemical and metabolic pathway abundance). Moreover, the pipeline has 3 tracks: STAT for results within 24 h; Comprehensive that affords a more in-depth analysis and takes between 5 and 7 d, but offers antimicrobial resistance information; and Targeted, which also requires 5–7 d, but with more sensitive analysis for specific pathogens. Finally, we discuss the challenges that need to be addressed before full integration in the clinical setting. PMID:28337072

  17. Estimating richness from phage metagenomes

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Bacteriophages are important drivers of ecosystem functions, yet little is known about the vast majority of phages. Phage metagenomics, or the study of the collective genome of an assemblage of phages, enables the investigation of broad ecological questions in phage communities. One ecological cha...

  18. Metagenomic Analysis of the Pygmy Loris Fecal Microbiome Reveals Unique Functional Capacity Related to Metabolism of Aromatic Compounds

    PubMed Central

    Xu, Bo; Xu, Weijiang; Yang, Fuya; Li, Junjun; Yang, Yunjuan; Tang, Xianghua; Mu, Yuelin; Zhou, Junpei; Huang, Zunxi

    2013-01-01

    The animal gastrointestinal tract contains a complex community of microbes, whose composition ultimately reflects the co-evolution of microorganisms with their animal host. An analysis of 78,619 pyrosequencing reads generated from pygmy loris fecal DNA extracts was performed to help better understand the microbial diversity and functional capacity of the pygmy loris gut microbiome. The taxonomic analysis of the metagenomic reads indicated that pygmy loris fecal microbiomes were dominated by Bacteroidetes and Proteobacteria phyla. The hierarchical clustering of several gastrointestinal metagenomes demonstrated the similarities of the microbial community structures of pygmy loris and mouse gut systems despite their differences in functional capacity. The comparative analysis of function classification revealed that the metagenome of the pygmy loris was characterized by an overrepresentation of those sequences involved in aromatic compound metabolism compared with humans and other animals. The key enzymes related to the benzoate degradation pathway were identified based on the Kyoto Encyclopedia of Genes and Genomes pathway assignment. These results would contribute to the limited body of primate metagenome studies and provide a framework for comparative metagenomic analysis between human and non-human primates, as well as a comparative understanding of the evolution of humans and their microbiome. However, future studies on the metagenome sequencing of pygmy loris and other prosimians regarding the effects of age, genetics, and environment on the composition and activity of the metagenomes are required. PMID:23457582

  19. Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics.

    PubMed

    Weber, Marc; Teeling, Hanno; Huang, Sixing; Waldmann, Jost; Kassabgy, Mariette; Fuchs, Bernhard M; Klindworth, Anna; Klockow, Christine; Wichels, Antje; Gerdts, Gunnar; Amann, Rudolf; Glöckner, Frank Oliver

    2011-05-01

    Next-generation sequencing (NGS) technologies have enabled the application of broad-scale sequencing in microbial biodiversity and metagenome studies. Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions, but have not yet been applied to broad-scale NGS-based metagenomics. Here, we provide a novel implementation, demonstrate its potential and practicability, and provide a web-based service for public usage. Evaluation with published data sets mimicking varyingly complex habitats resulted in classification specificities and sensitivities ranging from close to 100% to above 90% from phylum to genus level for assemblies exceeding 8 kb for low and medium complexity data. When applied to five real-world metagenomes of medium complexity from direct pyrosequencing of marine subsurface waters, classifications of assemblies above 2.5 kb were in good agreement with fluorescence in situ hybridizations, indicating that biodiversity was mostly retained within the metagenomes, and confirming high classification specificities. This was validated by two protein-based classification methods (PBCs). SOMs were able to retrieve the relevant taxa down to the genus level, while surpassing PBCs in resolution. In order to make the approach accessible to a broad audience, we implemented a feature-rich web-based SOM application named TaxSOM, which is freely available at http://www.megx.net/toolbox/taxsom. TaxSOM can classify reads or assemblies exceeding 2.5 kb with high accuracy and thus assists in linking biodiversity and functions in metagenome studies, which is a precondition to study microbial ecology in a holistic fashion.
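    A self-organizing map places vectors (for example, per-sequence composition profiles) on a grid so that similar inputs map to nearby nodes. The sketch below is a minimal generic SOM in pure Python; the grid size, learning-rate schedule and Gaussian neighbourhood are hypothetical choices and do not describe TaxSOM's implementation. The usage example uses 2-D toy points in place of real composition vectors.

```python
import math
import random

def best_matching_unit(nodes, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(nodes, key=lambda pos: math.dist(nodes[pos], x))

def train_som(data, rows=4, cols=4, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a rows x cols self-organizing map to equal-length vectors in data."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = {(r, c): [rng.random() for _ in range(dim)]
             for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1 - frac)              # linearly decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5  # shrinking neighbourhood radius
        for x in rng.sample(data, len(data)):  # visit inputs in random order
            bmu = best_matching_unit(nodes, x)
            for pos, w in nodes.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return nodes

cluster_a = [[0.02 * i, 0.01 * i] for i in range(5)]
cluster_b = [[1.0 - 0.02 * i, 1.0 - 0.01 * i] for i in range(5)]
som = train_som(cluster_a + cluster_b)
print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [1.0, 1.0]))
```

    After training, inputs from the two toy clusters map to different grid nodes, which is the property binning relies on.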

  20. Use of object-oriented classification and fragmentation analysis (1985-2008) to identify important areas for conservation in Cockpit Country, Jamaica.

    PubMed

    Newman, Minke E; McLaren, Kurt P; Wilson, Byron S

    2011-01-01

    Forest fragmentation is one of the most important threats to global biodiversity, particularly in tropical developing countries. Identifying priority areas for conservation within these forests is essential to their effective management. However, this requires current, accurate environmental information that is often lacking in developing countries. The Cockpit Country, Jamaica, contains forests of international importance in terms of levels of endemism and overall diversity. These forests are under severe threat from the prospect of bauxite mining and other anthropogenic disturbances. In the absence of adequate, up-to-date ecological information, we used satellite remote sensing data and fragmentation analysis to identify interior forested areas that have experienced little or no change as priority conservation sites. We classified Landsat images from 1985, 1989, 1995, 2002, and 2008, using an object-oriented method, which allowed for the inclusion of roads. We conducted our fragmentation analysis using metrics to quantify changes in forest patch number, area, shape, and aggregation. Deforestation and fragmentation fluctuated within the 23-year period but were mostly confined to the periphery of the forest, close to roads and access trails. An area of core forest that remained intact over the period of study was identified within the largest forest patch, most of which was located within the boundaries of a forest reserve and included the last remaining patches of closed-broadleaf forest. These areas should be given highest priority for conservation, as they constitute important refuges for endemic or threatened biodiversity. Minimizing and controlling access will be important in maintaining this core.
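    Patch number and patch area, two of the fragmentation metrics named above, can be derived from a classified forest/non-forest raster by labelling connected components. The sketch assumes a binary grid (1 = forest) and 4-cell connectivity; the study's own software and connectivity rule are not stated in the abstract.

```python
from collections import deque

def forest_patches(grid):
    """Return the areas (in cells) of 4-connected forest patches in a 0/1 grid."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                # Breadth-first flood fill of one patch.
                queue = deque([(r, c)])
                seen[r][c] = True
                area = 0
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                areas.append(area)
    return areas

grid = [[1, 1, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 1, 1]]
print(len(forest_patches(grid)), forest_patches(grid))  # 2 [3, 3]
```

    Multiplying cell counts by the raster's pixel area converts patch sizes to ground units.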

  1. C16S - a Hidden Markov Model based algorithm for taxonomic classification of 16S rRNA gene sequences.

    PubMed

    Ghosh, Tarini Shankar; Gajjalla, Purnachander; Mohammed, Monzoorul Haque; Mande, Sharmila S

    2012-04-01

    Recent advances in high throughput sequencing technologies and concurrent refinements in 16S rDNA isolation techniques have facilitated the rapid extraction and sequencing of 16S rDNA content of microbial communities. The taxonomic affiliation of these 16S rDNA fragments is subsequently obtained using either BLAST-based or word frequency based approaches. However, the classification accuracy of such methods is observed to be limited in typical metagenomic scenarios, wherein a majority of organisms are hitherto unknown. In this study, we present a 16S rDNA classification algorithm, called C16S, that uses genus-specific Hidden Markov Models for taxonomic classification of 16S rDNA sequences. Results obtained using C16S have been compared with the widely used RDP classifier. The performance of the C16S algorithm was observed to be consistently higher than that of the RDP classifier. In some scenarios, this increase in accuracy is as high as 34%. A web-server for the C16S algorithm is available at http://metagenomics.atc.tcs.com/C16S/.
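    Scoring a sequence against several HMMs and keeping the best is commonly done with the forward algorithm. The sketch below computes forward log-likelihoods and picks the highest-scoring model. The two-symbol-biased toy models are hypothetical, and real 16S profile HMMs (with match, insert and delete states) are far larger; this is not the C16S implementation.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_forward(seq, start, trans, emit):
    """Log-likelihood of seq under an HMM via the forward algorithm.
    start[s], trans[s][t] and emit[s][symbol] are probabilities (all > 0)."""
    states = list(start)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for sym in seq[1:]:
        alpha = {t: logsumexp([alpha[s] + math.log(trans[s][t]) for s in states])
                    + math.log(emit[t][sym])
                 for t in states}
    return logsumexp(list(alpha.values()))

def classify_by_hmm(seq, models):
    """models maps a label to (start, trans, emit); return the best label."""
    return max(models, key=lambda label: log_forward(seq, *models[label]))

# Hypothetical one-state models with different base compositions.
gc = ({"s": 1.0}, {"s": {"s": 1.0}}, {"s": {"G": 0.4, "C": 0.4, "A": 0.1, "T": 0.1}})
at = ({"s": 1.0}, {"s": {"s": 1.0}}, {"s": {"G": 0.1, "C": 0.1, "A": 0.4, "T": 0.4}})
print(classify_by_hmm("GGCCGCGC", {"gc_genus": gc, "at_genus": at}))  # gc_genus
```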

  2. Metagenomes from Argonne's MG-RAST Metagenomics Analysis Server

    DOE Data Explorer

    MG-RAST has a large number of datasets that researchers have deposited for public use. As of July, 2014, the number of metagenomes represented by MG-RAST numbered more than 18,500, and the number of available sequences was more than 75 million! The public can browse the collection in several different ways, and researchers can log in to deposit new data. Researchers have the choice of keeping a dataset private so that it is viewable only by them when logged in, or they can choose to make a dataset public at any time with a simple click of a link. MG-RAST was launched in 2007 by the Mathematics and Computer Science Division at Argonne National Laboratory (ANL). It is part of the toolkit available to the Terragenomics project, which seeks to do a comprehensive metagenomics study of U.S. soil. The Terragenomics project page is located at http://www.mcs.anl.gov/research/projects/terragenomics/.

  3. Metagenome Assembly at the DOE JGI (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Chain, Patrick [DOE JGI at LANL]

    2016-07-12

    Patrick Chain of DOE JGI at LANL, Co-Chair of the Metagenome-specific Assembly session, on "Metagenome Assembly at the DOE JGI" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  4. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes.

    SciTech Connect

    Meyer, F.; Paarmann, D.; D'Souza, M.; Olson, R.; Glass, E. M.; Kubal, M.; Paczian, T.; Stevens, R.; Wilke, A.; Wilkening, J.; Edwards, R. A.; Rodriguez, A.; Mathematics and Computer Science; Univ. of Chicago; San Diego State Univ.

    2008-09-19

    Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has opened metagenomics to a wide range of researchers. A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis: the availability of high-performance computing for annotating the data.

  5. The Largest Fragment of a Homogeneous Fragmentation Process

    NASA Astrophysics Data System (ADS)

    Kyprianou, Andreas; Lane, Francis; Mörters, Peter

    2017-03-01

    We show that in homogeneous fragmentation processes the largest fragment at time $t$ has size $e^{-t\Phi'(\overline{p})}\, t^{-\frac{3}{2}(\log\Phi)'(\overline{p})+o(1)}$, where $\Phi$ is the Lévy exponent of the fragmentation process, and $\overline{p}$ is the unique solution of the equation $(\log\Phi)'(\overline{p}) = \frac{1}{1+\overline{p}}$. We argue that this result is in line with predictions arising from the classification of homogeneous fragmentation processes as logarithmically correlated random fields.
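As a worked illustration of the formula, the sketch below computes $\overline{p}$ and the leading decay rate $\Phi'(\overline{p})$ for one concrete process chosen here as an assumption (it does not come from the abstract): every fragment splits into two equal halves at rate 1. Under the common convention $\Phi(p) = \int (1 - \sum_i s_i^{1+p})\,\nu(ds)$ this process has $\Phi(p) = 1 - 2^{-p}$. The function names are hypothetical.

```python
import math

# Assumed example (not from the abstract): dyadic fragmentation, where every
# fragment splits into two halves at rate 1, so Phi(p) = 1 - 2**(-p) for p > 0.


def phi(p: float) -> float:
    """Levy exponent Phi(p) of the assumed dyadic fragmentation."""
    return 1.0 - 2.0 ** (-p)


def phi_prime(p: float) -> float:
    """Derivative Phi'(p) = ln(2) * 2**(-p)."""
    return math.log(2.0) * 2.0 ** (-p)


def log_phi_prime(p: float) -> float:
    """(log Phi)'(p) = Phi'(p) / Phi(p)."""
    return phi_prime(p) / phi(p)


def solve_pbar(lo: float = 1e-9, hi: float = 50.0, tol: float = 1e-12) -> float:
    """Bisection for (log Phi)'(p) = 1 / (1 + p).

    For this Phi the difference g(p) is positive near 0 and negative for
    large p, with exactly one sign change in between.
    """
    def g(p: float) -> float:
        return log_phi_prime(p) - 1.0 / (1.0 + p)

    if not (g(lo) > 0 > g(hi)):
        raise ValueError("root is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


pbar = solve_pbar()
rate = phi_prime(pbar)                     # exponential rate in e^{-t Phi'(pbar)}
poly_exponent = 1.5 * log_phi_prime(pbar)  # (3/2)(log Phi)'(pbar) = 1.5 / (1 + pbar)
print(f"pbar = {pbar:.4f}, Phi'(pbar) = {rate:.4f}, polynomial exponent = {poly_exponent:.4f}")
```

For this toy process the result can be cross-checked independently: the largest fragment is $2^{-m(t)}$, where $m(t)$ is the smallest generation alive at time $t$, and a first-moment heuristic gives $m(t) \approx ct$ with $c\ln(2e/c) = 1$. Substituting the defining equation for $\overline{p}$ shows $c = 2^{-\overline{p}} = \Phi'(\overline{p})/\ln 2$, so both routes give the same exponential rate.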

  6. Marine metagenomics as a source for bioprospecting.

    PubMed

    Kodzius, Rimantas; Gojobori, Takashi

    2015-12-01

    This review summarizes the use of genome-editing technologies in metagenomic studies; these studies are used to retrieve and modify valuable microorganisms for production, particularly in marine metagenomics. Organisms may be cultivable or uncultivable. Metagenomics provides especially valuable information for uncultivable samples, from which novel genes, pathways and genomes can be deduced. Therefore, metagenomics, particularly in combination with genome engineering and systems biology, allows for the enhancement of biological and chemical producers and the creation of novel bioresources. With natural resources rapidly depleting, genomics may be an effective way to efficiently produce quantities of known and novel foods, livestock feed, fuels, pharmaceuticals and fine or bulk chemicals.

  7. Metagenomic Assembly: Overview, Challenges and Applications

    PubMed Central

    Ghurye, Jay S.; Cepeda-Espinoza, Victoria; Pop, Mihai

    2016-01-01

    Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems. PMID:27698619

  8. Binning sequences using very sparse labels within a metagenome

    PubMed Central

    Chan, Chon-Kit Kenneth; Hsu, Arthur L; Halgamuge, Saman K; Tang, Sen-Lin

    2008-01-01

    Background In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. Results The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the ≥ 10 reads datasets and comparable in the ≥ 8 kb benchmark tests. Conclusion In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of the methods tested. Most
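Composition-based binning of the kind described above rests on comparing k-mer frequency profiles. The sketch below is a deliberately simplified stand-in, not S-GSOM: each read is assigned to the seed whose dinucleotide (k = 2) frequency profile is closest in Euclidean distance. The seed labels, sequences and function names are hypothetical; real binning tools commonly use longer k-mers (tetranucleotides are a frequent choice) and much longer sequences.

```python
import math
from itertools import product


def kmer_profile(seq: str, k: int = 2) -> list[float]:
    """Relative frequencies of all 4**k k-mers over A, C, G, T (other k-mers are skipped)."""
    index = {"".join(km): i for i, km in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k].upper())
        if j is not None:
            counts[j] += 1
    total = sum(counts) or 1  # avoid division by zero for very short reads
    return [c / total for c in counts]


def assign_to_seeds(reads: dict[str, str], seeds: dict[str, str], k: int = 2) -> dict[str, str]:
    """Assign each read to the seed with the nearest k-mer profile (Euclidean distance)."""
    seed_profiles = {label: kmer_profile(s, k) for label, s in seeds.items()}
    assignments = {}
    for name, read in reads.items():
        profile = kmer_profile(read, k)
        assignments[name] = min(
            seed_profiles, key=lambda label: math.dist(profile, seed_profiles[label])
        )
    return assignments


# Hypothetical toy seeds and reads.
seeds = {"GC-rich": "GCGCGGCCGCGGCGCC", "AT-rich": "ATATTAATATTTAAAT"}
reads = {"r1": "GGCGCCGCGG", "r2": "TTAATATAAT"}
print(assign_to_seeds(reads, seeds))
```

Unlike a self-organising map, this nearest-seed rule cannot reveal groups that have no seed; it only shows the compositional-similarity step in isolation.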

  9. MIPE: A metagenome-based community structure explorer and SSU primer evaluation tool

    PubMed Central

    Zhou, Quan

    2017-01-01

    An understanding of microbial community structure is an important issue in the field of molecular ecology. The traditional molecular method involves amplification of small subunit ribosomal RNA (SSU rRNA) genes by polymerase chain reaction (PCR). However, PCR-based amplicon approaches are affected by primer bias and chimeras. With the development of high-throughput sequencing technology, unbiased SSU rRNA gene sequences can be mined from shotgun sequencing-based metagenomic or metatranscriptomic datasets to obtain a reflection of the microbial community structure in specific types of environment and to evaluate SSU primers. However, the use of short reads obtained through next-generation sequencing for primer evaluation has not been well resolved. The software MIPE (MIcrobiota metagenome Primer Explorer) was developed to handle large numbers of short reads from metagenomes and metatranscriptomes. Using metagenomic or metatranscriptomic datasets as input, MIPE extracts and aligns rRNA to reveal detailed information on microbial composition and evaluate SSU rRNA primers. A mock dataset, a real Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) test dataset, two PrimerProspector test datasets and a real metatranscriptomic dataset were used to validate MIPE. The software calls Mothur (v1.33.3) and the SILVA database (v119) for the alignment and classification of rRNA genes from a metagenome or metatranscriptome. MIPE can effectively extract shotgun rRNA reads from a metagenome or metatranscriptome and is capable of classifying these sequences and exhibiting sensitivity to different SSU rRNA PCR primers. Therefore, MIPE can be used to guide primer design for specific environmental samples. PMID:28350876

  10. Metagenomics of the Svalbard reindeer rumen microbiome reveals abundance of polysaccharide utilization loci.

    PubMed

    Pope, Phillip B; Mackenzie, Alasdair K; Gregor, Ivan; Smith, Wendy; Sundset, Monica A; McHardy, Alice C; Morrison, Mark; Eijsink, Vincent G H

    2012-01-01

    Lignocellulosic biomass remains a largely untapped source of renewable energy predominantly due to its recalcitrance and an incomplete understanding of how this is overcome in nature. We present here a compositional and comparative analysis of metagenomic data pertaining to a natural biomass-converting ecosystem adapted to austere arctic nutritional conditions, namely the rumen microbiome of Svalbard reindeer (Rangifer tarandus platyrhynchus). Community analysis showed that deeply-branched cellulolytic lineages affiliated to the Bacteroidetes and Firmicutes are dominant, whilst sequence binning methods facilitated the assemblage of metagenomic sequence for a dominant and novel Bacteroidales clade (SRM-1). Analysis of unassembled metagenomic sequence as well as metabolic reconstruction of SRM-1 revealed the presence of multiple polysaccharide utilization loci-like systems (PULs) as well as members of more than 20 glycoside hydrolase and other carbohydrate-active enzyme families targeting various polysaccharides including cellulose, xylan and pectin. Functional screening of cloned metagenome fragments revealed high cellulolytic activity and an abundance of PULs that are rich in endoglucanases (GH5) but devoid of other common enzymes thought to be involved in cellulose degradation. Combining these results with known and partly re-evaluated metagenomic data strongly indicates that much like the human distal gut, the digestive system of herbivores harbours high numbers of deeply branched and as-yet uncultured members of the Bacteroidetes that depend on PUL-like systems for plant biomass degradation.

  11. Integrative workflows for metagenomic analysis

    PubMed Central

    Ladoukakis, Efthymios; Kolisis, Fragiskos N.; Chatziioannou, Aristotelis A.

    2014-01-01

    The rapid evolution of sequencing technologies, collectively described by the term Next Generation Sequencing (NGS), has revolutionized metagenomic analysis. These technologies combine high-throughput analytical protocols with delicate measuring techniques in order to discover, properly assemble and map allelic sequences to the correct genomes, achieving particularly high yields for only a fraction of the cost of traditional processes (i.e., Sanger). From a bioinformatic perspective, this means that many GB of data are generated by each sequencing experiment, making data management, and even storage, critical bottlenecks for the overall analytical endeavor. The complexity is further aggravated by the variety of processing steps available, represented by the numerous bioinformatic tools that are essential for each analytical task in order to fully unveil the genetic content of a metagenomic dataset. These disparate tasks range from simple, though nontrivial, quality control of raw data to exceptionally complex protein annotation procedures, requiring a high level of expertise for their proper application or for the neat implementation of the whole workflow. Furthermore, a bioinformatic analysis of this scale requires large computational resources, making cloud computing infrastructures the only realistic solution. In this review article we discuss different integrative bioinformatic solutions that address the aforementioned issues, by performing a critical assessment of the available automated pipelines for data management, quality control, and annotation of metagenomic data, embracing various major sequencing technologies and applications. PMID:25478562

  12. Exploring neighborhoods in the metagenome universe.

    PubMed

    Aßhauer, Kathrin P; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-07-14

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well suited for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis.
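The abstract does not spell out which profiles or distance measures CoMet-Universe uses, so the sketch below only illustrates the general idea of profile-based k-nearest-neighbor habitat prediction: raw feature counts are normalized to relative abundances, compared with Euclidean distance (an assumption; any other vector distance could be swapped in), and the query takes the majority habitat among its k nearest neighbors. The data and function names are hypothetical.

```python
import math
from collections import Counter


def normalize(counts: list[float]) -> list[float]:
    """Turn raw feature counts into relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def knn_habitat(query: list[float], database: list[tuple[str, list[float]]], k: int = 3):
    """Return the majority habitat among the k nearest profiles and the neighbor labels."""
    q = normalize(query)
    ranked = sorted(database, key=lambda item: math.dist(q, normalize(item[1])))
    neighbors = [label for label, _ in ranked[:k]]
    majority = Counter(neighbors).most_common(1)[0][0]
    return majority, neighbors


# Hypothetical profiles: counts of three features per metagenome.
database = [
    ("soil", [80, 10, 10]),
    ("soil", [70, 20, 10]),
    ("gut", [10, 80, 10]),
    ("gut", [5, 85, 10]),
    ("marine", [10, 10, 80]),
]
print(knn_habitat([75, 15, 10], database, k=3))
```

The neighbor list is returned alongside the vote because, as the abstract notes, overlapping habitat categories make the individual neighbors as informative as the majority label.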

  13. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    PubMed Central

    Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas

    2016-01-01

    Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018

  14. Metagenomes from the Saline Desert of Kutch

    PubMed Central

    Pandit, A. S.; Joshi, M. N.; Bhargava, P.; Ayachit, G. N.; Shaikh, I. M.; Saiyed, Z. M.; Saxena, A. K.

    2014-01-01

    We provide the first report on the metagenomic approach for unveiling the microbial diversity in the saline desert of Kutch. High-throughput metagenomic sequencing of environmental DNA isolated from soil collected from seven locations in Kutch was performed on an Ion Torrent platform. PMID:24831151

  15. Metagenomic applications in environmental monitoring and bioremediation

    SciTech Connect

    Techtmann, Stephen M.; Hazen, Terry C.

    2016-01-01

    With the rapid advances in sequencing technology, the cost of sequencing has dramatically dropped and the scale of sequencing projects has increased accordingly. This has provided the opportunity for the routine use of sequencing techniques in the monitoring of environmental microbes. While metagenomic applications have been routinely applied to better understand the ecology and diversity of microbes, their use in environmental monitoring and bioremediation is increasingly common. In this review we seek to provide an overview of some of the metagenomic techniques used in environmental systems biology, addressing their application and limitation. We will also provide several recent examples of the application of metagenomics to bioremediation. We discuss examples where microbial communities have been used to predict the presence and extent of contamination, examples of how metagenomics can be used to characterize the process of natural attenuation by unculturable microbes, as well as examples detailing the use of metagenomics to understand the impact of biostimulation on microbial communities.

  16. Metagenomics: Facts and Artifacts, and Computational Challenges

    PubMed

    Wooley, John C; Ye, Yuzhen

    2009-01-01

    Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. By enabling an analysis of populations including many (so-far) unculturable and often unknown microbes, metagenomics is revolutionizing the field of microbiology, and has excited researchers in many disciplines that could benefit from the study of environmental microbes, including those in ecology, environmental sciences, and biomedicine. Specific computational and statistical tools have been developed for metagenomic data analysis and comparison. New studies, however, have revealed various kinds of artifacts present in metagenomics data caused by limitations in the experimental protocols and/or inadequate data analysis procedures, which often lead to incorrect conclusions about a microbial community. Here, we review some of the artifacts, such as overestimation of species diversity and incorrect estimation of gene family frequencies, and discuss emerging computational approaches to address them. We also review potential challenges that metagenomics may encounter with the extensive application of next-generation sequencing (NGS) techniques.

  17. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes

    PubMed Central

    Lin, Hsin-Hung; Liao, Yu-Chieh

    2016-01-01

    Metagenomics, the application of shotgun sequencing, facilitates the reconstruction of the genomes of individual species from natural environments. A major challenge in the genome recovery domain is to agglomerate or ‘bin’ sequences assembled from metagenomic reads into individual groups. Metagenomic binning without consideration of reference sequences enables the comprehensive discovery of new microbial organisms and aids in the microbial genome reconstruction process. Here we present MyCC, an automated binning tool that combines genomic signatures, marker genes and optional contig coverages within one or multiple samples, in order to visualize the metagenomes and to identify the reconstructed genomic fragments. We demonstrate the superior performance of MyCC compared to other binning tools including CONCOCT, GroopM, MaxBin and MetaBAT on both synthetic and real human gut communities with a small sample size (one to 11 samples), as well as on a large metagenome dataset (over 250 samples). Moreover, we demonstrate the visualization of metagenomes in MyCC to aid in the reconstruction of genomes from distinct bins. MyCC is freely available at http://sourceforge.net/projects/sb2nhri/files/MyCC/. PMID:27067514

  18. Multisubstrate isotope labeling and metagenomic analysis of active soil bacterial communities.

    PubMed

    Verastegui, Y; Cheng, J; Engel, K; Kolczynski, D; Mortimer, S; Lavigne, J; Montalibet, J; Romantsov, T; Hall, M; McConkey, B J; Rose, D R; Tomashek, J J; Scott, B R; Charles, T C; Neufeld, J D

    2014-07-15

    Soil microbial diversity represents the largest global reservoir of novel microorganisms and enzymes. In this study, we coupled functional metagenomics and DNA stable-isotope probing (DNA-SIP) using multiple plant-derived carbon substrates and diverse soils to characterize active soil bacterial communities and their glycoside hydrolase genes, which have value for industrial applications. We incubated samples from three disparate Canadian soils (tundra, temperate rainforest, and agricultural) with five native carbon ((12)C) or stable-isotope-labeled ((13)C) carbohydrates (glucose, cellobiose, xylose, arabinose, and cellulose). Indicator species analysis revealed high specificity and fidelity for many uncultured and unclassified bacterial taxa in the heavy DNA for all soils and substrates. Among characterized taxa, Actinomycetales (Salinibacterium), Rhizobiales (Devosia), Rhodospirillales (Telmatospirillum), and Caulobacterales (Phenylobacterium and Asticcacaulis) were bacterial indicator species for the heavy substrates and soils tested. Both Actinomycetales and Caulobacterales (Phenylobacterium) were associated with metabolism of cellulose, and Alphaproteobacteria were associated with the metabolism of arabinose; members of the order Rhizobiales were strongly associated with the metabolism of xylose. Annotated metagenomic data suggested diverse glycoside hydrolase gene representation within the pooled heavy DNA. By screening 2,876 cloned fragments derived from the (13)C-labeled DNA isolated from soils incubated with cellulose, we demonstrate the power of combining DNA-SIP, multiple-displacement amplification (MDA), and functional metagenomics by efficiently isolating multiple clones with activity on carboxymethyl cellulose and fluorogenic proxy substrates for carbohydrate-active enzymes. Importance: The ability to identify genes based on function, instead of sequence homology, allows the discovery of genes that would not be identified through sequence alone. 
This

  19. GB Virus C/Hepatitis G Virus Groups and Subgroups: Classification by a Restriction Fragment Length Polymorphism Method Based on Phylogenetic Analysis of the 5′ Untranslated Region

    PubMed Central

    Quarleri, J. F.; Mathet, V. L.; Feld, M.; Ferrario, D.; della Latta, M. P.; Verdun, R.; Sánchez, D. O.; Oubiña, J. R.

    1999-01-01

    A phylogenetic tree based on 150 5′ untranslated region sequences deposited in the GenBank database allowed segregation of the sequences into three major groups, including two subgroups, i.e., 1, 2a, 2b, and 3, supported by bootstrap analysis. Restriction site analysis of these sequences predicted that HinfI and either AatII or AciI could be used for genomic typing with 99.4% accuracy. cDNA sequencing and subsequent alignment of 21 Argentine GB virus C/hepatitis G virus strains confirmed the theoretically predicted restriction fragment length polymorphism patterns. This method may be useful for rapid screening of samples when epidemiological or transmission studies of this agent are carried out. PMID:10203483
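The restriction site analysis described above amounts to an in-silico digest: locate each enzyme's recognition site in a sequence and compute the resulting fragment lengths. A minimal sketch for HinfI (recognition site G^ANTC) on a single linear top strand follows; the toy sequence and the `digest` function are hypothetical. Other enzymes could be added as further (pattern, cut offset) pairs, although non-palindromic recognition sites would also need a reverse-complement search.

```python
import re


def digest(seq: str, site_regex: str, cut_offset: int) -> list[int]:
    """Predict fragment lengths of a linear sequence cut at every match of a recognition site.

    site_regex -- recognition sequence as a regular expression (N written as [ACGT])
    cut_offset -- cut position measured from the start of the site on the top strand
    """
    seq = seq.upper()
    # The zero-width lookahead lets overlapping sites be found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site_regex})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


# HinfI recognizes G^ANTC and cuts between G and A on the top strand.
HINFI = ("GA[ACGT]TC", 1)

example = "AAGAATCTTTGACTCAA"  # hypothetical toy sequence
print(digest(example, *HINFI))
```

Comparing such fragment-length lists across sequences from different phylogenetic groups is what allows an enzyme, or a pair of enzymes, to be chosen for typing.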

  20. Assembling The Marine Metagenome, One Cell At A Time

    SciTech Connect

    Xie, Gang; Han, Shunsheng; Kiss, Hajnalka; Saw, Jimmy; Senin, Pavel; Woyke, Tanja; Copeland, Alex; Gonzalez, Jose; Chatterji, Sourav; Cheng, Jan - Fang; Eisen, Jonathan A; Sieracki, Michael E; Stepanauskas, Ramunas

    2008-01-01

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembled sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation.
We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured taxa from a complex

  1. Assembling the Marine Metagenome, One Cell at a Time

    SciTech Connect

    Woyke, Tanja; Xie, Gary; Copeland, Alex; Gonzalez, Jose M.; Han, Cliff; Kiss, Hajnalka; Saw, Jimmy H.; Senin, Pavel; Yang, Chi; Chatterji, Sourav; Cheng, Jan-Fang; Eisen, Jonathan A.; Sieracki, Michael E.; Stepanauskas, Ramunas

    2010-06-24

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembled sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured

  2. Metagenomic reconstructions of bacterial CRISPR loci constrain population histories.

    PubMed

    Sun, Christine L; Thomas, Brian C; Barrangou, Rodolphe; Banfield, Jillian F

    2016-04-01

    Bacterial CRISPR-Cas systems provide insight into recent population history because they rapidly incorporate, in a unidirectional manner, short fragments (spacers) from coexisting infective virus populations into host chromosomes. Immunity is achieved by sequence identity between transcripts of spacers and their targets. Here, we used metagenomics to study the stability and dynamics of the type I-E CRISPR-Cas locus of Leptospirillum group II bacteria in biofilms sampled over 5 years from an acid mine drainage (AMD) system. Despite recovery of 452,686 spacers from CRISPR amplicons and metagenomic data, rarefaction curves of spacers show no saturation. The vast repertoire of spacers is attributed to phage/plasmid population diversity and retention of old spacers, despite rapid evolution of the targeted phage/plasmid genome regions (proto-spacers). The oldest spacers (spacers found at the trailer end) are conserved for at least 5 years, and 12% of these retain perfect or near-perfect matches to proto-spacer targets. The majority of proto-spacer regions contain an AAG proto-spacer adjacent motif (PAM). Spacers throughout the locus target the same phage population (AMDV1), but there are blocks of consecutive spacers without AMDV1 target sequences. Results suggest long-term coexistence of Leptospirillum with AMDV1 and periods when AMDV1 was less dominant. Metagenomics can be applied to millions of cells in a single sample to provide an extremely large spacer inventory, allow identification of phage/plasmids and enable analysis of previous phage/plasmid exposure. Thus, this approach can provide insights into prior bacterial environment and genetic interplay between hosts and their viruses.
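Saturation in a rarefaction curve is judged by how the expected number of distinct spacers grows as the sample is subsampled. A sketch of the standard individual-based rarefaction expectation follows; the abstract does not state which estimator the authors used, and the spacer counts and function name are hypothetical. A curve that is still rising at the full sample size, as reported here, indicates that many distinct spacers remain unsampled.

```python
from math import comb


def rarefaction(counts: list[int], n: int) -> float:
    """Expected number of distinct items in a random subsample of size n (without replacement).

    Uses E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)], where N_i is the count of item i
    and N is the total count.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total count")
    denom = comb(total, n)
    # math.comb returns 0 when n > total - c, i.e. item i is certain to be drawn.
    return sum(1 - comb(total - c, n) / denom for c in counts)


# Hypothetical spacer counts: how often each distinct spacer was observed.
spacer_counts = [5, 3, 2, 1, 1, 1, 1]
curve = [rarefaction(spacer_counts, n) for n in range(0, sum(spacer_counts) + 1, 2)]
print([round(x, 2) for x in curve])
```

Singletons such as the four spacers seen once here contribute the most to the slope near the end of the curve, which is why a large share of rare spacers keeps a curve from flattening.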

  3. Human milk metagenome: a functional capacity analysis

    PubMed Central

    2013-01-01

    Background Human milk contains a diverse population of bacteria that likely influences colonization of the infant gastrointestinal tract. Recent studies, however, have been limited to characterization of this microbial community by 16S rRNA analysis. In the present study, a metagenomic approach using Illumina sequencing of a pooled milk sample (ten donors) was employed to determine the genera of bacteria and the types of bacterial open reading frames in human milk that may influence bacterial establishment and stability in this primal food matrix. The human milk metagenome was also compared to that of breast-fed and formula-fed infants’ feces (n = 5, each) and mothers’ feces (n = 3) at the phylum level and at a functional level using open reading frame abundance. Additionally, immune-modulatory bacterial-DNA motifs were also searched for within human milk. Results The bacterial community in human milk contained over 360 prokaryotic genera, with sequences aligning predominantly to the phyla of Proteobacteria (65%) and Firmicutes (34%), and the genera of Pseudomonas (61.1%), Staphylococcus (33.4%) and Streptococcus (0.5%). From assembled human milk-derived contigs, 30,128 open reading frames were annotated and assigned to functional categories. When compared to the metagenome of infants’ and mothers’ feces, the human milk metagenome was less diverse at the phylum level, and contained more open reading frames associated with nitrogen metabolism, membrane transport and stress response (P < 0.05). The human milk metagenome also contained a similar occurrence of immune-modulatory DNA motifs to that of infants’ and mothers’ fecal metagenomes. Conclusions Our results further expand the complexity of the human milk metagenome and reinforce the benefits of human milk ingestion on the microbial colonization of the infant gut and immunity. Discovery of immune-modulatory motifs in the metagenome of human milk indicates more exhaustive analyses of the

  4. Chitinase genes revealed and compared in bacterial isolates, DNA extracts and a metagenomic library from a phytopathogen suppressive soil

    SciTech Connect

    Hjort, K.; Bergstrom, M.; Adesina, M.F.; Jansson, J.K.; Smalla, K.; Sjoling, S.

    2009-09-01

    Soil that is suppressive to disease caused by fungal pathogens is an interesting source to target for novel chitinases that might be contributing towards disease suppression. In this study we screened for chitinase genes in a phytopathogen-suppressive soil in three ways: (1) from a metagenomic library constructed from microbial cells extracted from soil, (2) from directly extracted DNA and (3) from bacterial isolates with antifungal and chitinase activities. Terminal-restriction fragment length polymorphism (T-RFLP) of chitinase genes revealed differences in amplified chitinase genes from the metagenomic library and the directly extracted DNA, but approximately 40% of the identified chitinase terminal-restriction fragments (TRFs) were found in both sources. All of the chitinase TRFs from the isolates were matched to TRFs in the directly extracted DNA and the metagenomic library. The most abundant chitinase TRF in the soil DNA and the metagenomic library corresponded to the TRF¹⁰³ of the isolate, Streptomyces mutomycini and/or Streptomyces clavifer. There were good matches between T-RFLP profiles of chitinase gene fragments obtained from different sources of DNA. However, there were also differences in both the chitinase and the 16S rRNA gene T-RFLP patterns depending on the source of DNA, emphasizing the lack of complete coverage of the gene diversity by any of the approaches used.

  5. IMG/M 4 version of the integrated metagenome comparative analysis system

    PubMed Central

    Markowitz, Victor M.; Chen, I-Min A.; Chu, Ken; Szeto, Ernest; Palaniappan, Krishna; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Pagani, Ioanna; Tringe, Susannah; Huntemann, Marcel; Billis, Konstantinos; Varghese, Neha; Tennessen, Kristin; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2014-01-01

    IMG/M (http://img.jgi.doe.gov/m) provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in the context of a comprehensive set of reference genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG/M’s data content and analytical tools have expanded continuously since its first version was released in 2007. Since the last report published in the 2012 NAR Database Issue, IMG/M’s database architecture, annotation and data integration pipelines and analysis tools have been extended to cope with the rapid growth in the number and size of metagenome data sets handled by the system. IMG/M data marts provide support for the analysis of publicly available genomes, expert review of metagenome annotations (IMG/M ER: http://img.jgi.doe.gov/mer) and Human Microbiome Project (HMP)-specific metagenome samples (IMG/M HMP: http://img.jgi.doe.gov/imgm_hmp). PMID:24136997

  6. Estimating DNA coverage and abundance in metagenomes using a gamma approximation

    SciTech Connect

    Hooper, Sean D; Dalevi, Daniel; Pati, Amrita; Mavromatis, Konstantinos; Ivanova, Natalia N; Kyrpides, Nikos C

    2010-01-01

    Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets.
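    The idea of estimating unsequenced bins from a fitted gamma distribution can be sketched as follows. This is a hypothetical gamma-Poisson (negative binomial) formulation, not necessarily the authors' procedure: each bin's expected coverage is assumed to be Gamma(shape k, scale theta), its read count Poisson given that coverage, and because only bins with at least one read are observed, the model is fitted to the zero-truncated distribution by a coarse grid search. The function names and the example counts are illustrative assumptions.

```python
"""Sketch: estimate unsequenced bins with a gamma-Poisson (negative binomial) model."""
from __future__ import annotations

import math
from collections import Counter


def p_zero(k: float, theta: float) -> float:
    """Probability that a bin receives no reads: (1 + theta) ** -k."""
    return (1.0 + theta) ** -k


def truncated_log_likelihood(counts: Counter, k: float, theta: float) -> float:
    """Log-likelihood of observed per-bin read counts (all >= 1) under a zero-truncated NB."""
    log_p = math.log(theta / (1.0 + theta))
    log_q = math.log(1.0 + theta)
    log_trunc = math.log(1.0 - p_zero(k, theta))
    total = 0.0
    for n, multiplicity in counts.items():
        log_pmf = (math.lgamma(n + k) - math.lgamma(k) - math.lgamma(n + 1)
                   + n * log_p - k * log_q)
        total += multiplicity * (log_pmf - log_trunc)
    return total


def fit_gamma(read_counts: list[int]) -> tuple[float, float]:
    """Coarse grid-search maximum likelihood for (k, theta); a sketch, not an optimizer."""
    counts = Counter(read_counts)
    grid = [0.1 * 1.15 ** i for i in range(50)]  # log-spaced values from 0.1 to ~94
    return max(((k, t) for k in grid for t in grid),
               key=lambda kt: truncated_log_likelihood(counts, *kt))


def unseen_bins(observed_bins: int, k: float, theta: float) -> float:
    """Expected number of zero-read bins, given the number of bins with >= 1 read."""
    p0 = p_zero(k, theta)
    return observed_bins * p0 / (1.0 - p0)


if __name__ == "__main__":
    # Hypothetical read counts for bins that received at least one read.
    observed = [1] * 50 + [2] * 25 + [3] * 12 + [4] * 6 + [5] * 3
    k, theta = fit_gamma(observed)
    print(f"k={k:.2f}, theta={theta:.2f}, "
          f"unsequenced bins ~ {unseen_bins(len(observed), k, theta):.0f}")
```

    Working on the zero-truncated likelihood matters because the unsequenced bins are exactly the zeros that the data cannot show; fitting an untruncated distribution to the observed counts would ignore them.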

  7. Toward Accurate and Quantitative Comparative Metagenomics

    PubMed Central

    Nayfach, Stephen; Pollard, Katherine S.

    2016-01-01

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. PMID:27565341

  8. Chameleon fragmentation

    SciTech Connect

    Brax, Philippe

    2014-02-01

    A scalar field dark energy candidate could couple to ordinary matter and photons, enabling its detection in laboratory experiments. Here we study the quantum properties of the chameleon field, one such dark energy candidate, in an "afterglow" experiment designed to produce, trap, and detect chameleon particles. In particular, we investigate the possible fragmentation of a beam of chameleon particles into multiple particle states due to the highly non-linear interaction terms in the chameleon Lagrangian. Fragmentation could weaken the constraints of an afterglow experiment by reducing the energy of the regenerated photons, but this energy reduction also provides a unique signature which could be detected by a properly-designed experiment. We show that constraints from the CHASE experiment are essentially unaffected by fragmentation for φ⁴ and 1/φ potentials, but are weakened for steeper potentials, and we discuss possible future afterglow experiments.

  9. Challenges and opportunities of airborne metagenomics.

    PubMed

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-05-06

    Recent metagenomic studies of marine and soil environments have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The growing number of publications on soil and marine metagenomics is in sharp contrast to the few on air, yet airborne microbes are thought to have significant impacts on many aspects of our lives, from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we discuss the current progress in airborne metagenomics, with a special focus on the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles.

  10. Challenges and Opportunities of Airborne Metagenomics

    PubMed Central

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-01-01

    Recent metagenomic studies of marine and soil environments have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The growing number of publications on soil and marine metagenomics is in sharp contrast to the few on air, yet airborne microbes are thought to have significant impacts on many aspects of our lives, from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we discuss the current progress in airborne metagenomics, with a special focus on the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles. PMID:25953766

  11. The Source and Evolutionary History of a Microbial Contaminant Identified Through Soil Metagenomic Analysis

    PubMed Central

    Olm, Matthew R.; Butterfield, Cristina N.; Copeland, Alex; Boles, T. Christian; Thomas, Brian C.

    2017-01-01

    ABSTRACT In this study, strain-resolved metagenomics was used to solve a mystery. A 6.4-Mbp complete closed genome was recovered from a soil metagenome and found to be astonishingly similar to that of Delftia acidovorans SPH-1, which was isolated in Germany a decade ago. It was suspected that this organism was not native to the soil sample because it lacked the diversity that is characteristic of other soil organisms; this suspicion was confirmed when PCR testing failed to detect the bacterium in the original soil samples. D. acidovorans was also identified in 16 previously published metagenomes from multiple environments, but detailed-scale single nucleotide polymorphism analysis grouped these into five distinct clades. All of the strains indicated as contaminants fell into one clade. Fragment length anomalies were identified in paired reads mapping to the contaminant clade genotypes only. This finding was used to establish that the DNA was present in specific size selection reagents used during sequencing. Ultimately, the source of the contaminant was identified as bacterial biofilms growing in tubing. On the basis of direct measurement of the rate of fixation of mutations across the period of time in which contamination was occurring, we estimated the time of separation of the contaminant strain from the genomically sequenced ancestral population within a factor of 2. This research serves as a case study of high-resolution microbial forensics and strain tracking accomplished through metagenomics-based comparative genomics. The specific case reported here is unusual in that the study was conducted in the background of a soil metagenome and the conclusions were confirmed by independent methods. PMID:28223457
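    The dating step rests on a molecular-clock argument, which can be written out as a worked formula. A minimal sketch, assuming a constant rate of mutation fixation in the contaminant lineage and that all differences from the sequenced ancestral population accumulated along that one lineage; the symbols m, Δt and D are introduced here for illustration and are not taken from the study. If m mutations were fixed during an observed contamination period Δt, and the contaminant differs from the ancestral genome by D SNPs:

```latex
r = \frac{m}{\Delta t}, \qquad
\hat{t} = \frac{D}{r} = \frac{D\,\Delta t}{m}
```

    Because m is typically a small count, \hat{t} is sensitive to its sampling error.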

  12. Fragmentation Processes

    NASA Astrophysics Data System (ADS)

    Whelan, Colm T.

    2012-12-01

    Preface; 1. Direct and resonant double-photoionization: from atoms to solids L. Avaldi and G. Stefani; 2. The application of propagation exterior complex scaling to atomic collisions P. L. Bartlett and A. T. Stelbovics; 3. Fragmentation of molecular-ion beams in intense ultra-short laser pulses I. Ben-Itzhak; 4. Atoms with one and two active electrons in strong laser fields I. A. Ivanov and A. S. Kheifets; 5. Experimental aspects of ionization studies by positron and positronium impact G. Laricchia, D. A. Cooke, Á. Kövér and S. J. Brawley; 6. (e,2e) spectroscopy using fragmentation processes J. Lower, M. Yamazaki and M. Takahashi; 7. A coupled pseudostate approach to the calculation of ion-atom fragmentation processes M. McGovern, H. R. J. Walters and C. T. Whelan; 8. Electron Impact Ionization using (e,2e) coincidence techniques from threshold to intermediate energies A. J. Murray; 9. (e,2e) processes on atomic inner shells C. T. Whelan; 10. Spin resolved atomic (e,2e) processes J. Lower and C. T. Whelan; Index.

  13. Under-detection of endospore-forming Firmicutes in metagenomic data

    DOE PAGES

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; ...

    2015-04-25

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers, and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor, with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases, which can help improve the universality of metagenomic approaches.

  14. Under-detection of endospore-forming Firmicutes in metagenomic data

    SciTech Connect

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S.; Junier, Pilar

    2015-04-25

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers, and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor, with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases, which can help improve the universality of metagenomic approaches.

  15. Under-detection of endospore-forming Firmicutes in metagenomic data

    PubMed Central

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; Lo, Chien-Chi; Li, Po-E; Chain, Patrick S.; Junier, Pilar

    2015-01-01

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers, and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor, with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases, which can help improve the universality of metagenomic approaches. PMID:25973144

  16. Soil Metagenomes from Different Pristine Environments of Northwest Argentina

    PubMed Central

    Colman, Déborah I.

    2015-01-01

    This is the first study to use a high-throughput metagenomic shotgun approach to explore the biosynthetic potential of soil metagenomes from different pristine environments of northwest Argentina. Our data sets characterize these metagenomes and provide information on the possible effect these ecosystems have on their diversity and biosynthetic potential. PMID:26272581

  17. Soil Metagenomes from Different Pristine Environments of Northwest Argentina.

    PubMed

    McCarthy, Christina B; Colman, Déborah I

    2015-08-13

    This is the first study to use a high-throughput metagenomic shotgun approach to explore the biosynthetic potential of soil metagenomes from different pristine environments of northwest Argentina. Our data sets characterize these metagenomes and provide information on the possible effect these ecosystems have on their diversity and biosynthetic potential.

  18. Metagenomic applications in environmental monitoring and bioremediation

    DOE PAGES

    Techtmann, Stephen M.; Hazen, Terry C.

    2016-01-01

    With the rapid advances in sequencing technology, the cost of sequencing has dramatically dropped and the scale of sequencing projects has increased accordingly. This has provided the opportunity for the routine use of sequencing techniques in the monitoring of environmental microbes. While metagenomic approaches have routinely been applied to better understand the ecology and diversity of microbes, their use in environmental monitoring and bioremediation is increasingly common. In this review we seek to provide an overview of some of the metagenomic techniques used in environmental systems biology, addressing their applications and limitations. We will also provide several recent examples of the application of metagenomics to bioremediation. We discuss examples where microbial communities have been used to predict the presence and extent of contamination, examples of how metagenomics can be used to characterize the process of natural attenuation by unculturable microbes, as well as examples detailing the use of metagenomics to understand the impact of biostimulation on microbial communities.

  19. Metagenomics: Application of Genomics to Uncultured Microorganisms

    PubMed Central

    Handelsman, Jo

    2004-01-01

    Metagenomics (also referred to as environmental and community genomics) is the genomic analysis of microorganisms by direct extraction and cloning of DNA from an assemblage of microorganisms. The development of metagenomics stemmed from the ineluctable evidence that as-yet-uncultured microorganisms represent the vast majority of organisms in most environments on earth. This evidence was derived from analyses of 16S rRNA gene sequences amplified directly from the environment, an approach that avoided the bias imposed by culturing and led to the discovery of vast new lineages of microbial life. Although the portrait of the microbial world was revolutionized by analysis of 16S rRNA genes, such studies yielded only a phylogenetic description of community membership, providing little insight into the genetics, physiology, and biochemistry of the members. Metagenomics provides a second tier of technical innovation that facilitates study of the physiology and ecology of environmental microorganisms. Novel genes and gene products discovered through metagenomics include the first bacteriorhodopsin of bacterial origin; novel small molecules with antimicrobial activity; and new members of families of known proteins, such as an Na+(Li+)/H+ antiporter, RecA, DNA polymerase, and antibiotic resistance determinants. Reassembly of multiple genomes has provided insight into energy and nutrient cycling within the community, genome structure, gene function, population genetics and microheterogeneity, and lateral gene transfer among members of an uncultured community. The application of metagenomic sequence information will facilitate the design of better culturing strategies to link genomic analysis with pure culture studies. PMID:15590779

  20. Preliminary High-Throughput Metagenome Assembly

    SciTech Connect

    Dusheyko, Serge; Furman, Craig; Pangilinan, Jasmyn; Shapiro, Harris; Tu, Hank

    2007-03-26

    Metagenome data sets present a qualitatively different assembly problem than traditional single-organism whole-genome shotgun (WGS) assembly. The unique aspects of such projects include the presence of a potentially large number of distinct organisms and their representation in the data set at widely different fractions. In addition, multiple closely related strains could be present, which would be difficult to assemble separately. Failure to take these issues into account can result in poor assemblies that either jumble together different strains or which fail to yield useful results. The DOE Joint Genome Institute has sequenced a number of metagenomic projects and plans to considerably increase this number in the coming year. As a result, the JGI has a need for high-throughput tools and techniques for handling metagenome projects. We present the techniques developed to handle metagenome assemblies in a high-throughput environment. This includes a streamlined assembly wrapper, based on the JGI's in-house WGS assembler, Jazz. It also includes the selection of sensible defaults targeted for metagenome data sets, as well as quality control automation for cleaning up the raw results. While analysis is ongoing, we will discuss preliminary assessments of the quality of the assembly results (http://fames.jgi-psf.org).

  1. Shotgun metagenomic data streams: surfing without fear

    SciTech Connect

    Berendzen, Joel R

    2010-12-06

    Timely information about bio-threat prevalence, consequence, propagation, attribution, and mitigation is needed to support decision-making, both routinely and in a crisis. One DNA sequencer can stream 25 Gbp of information per day, but sampling strategies and analysis techniques are needed to turn raw sequencing power into actionable knowledge. Shotgun metagenomics can enable biosurveillance at the level of a single city, hospital, or airplane. Metagenomics characterizes viruses and bacteria from complex environments such as soil, air filters, or sewage. Unlike targeted-primer-based sequencing, shotgun methods are not blind to sequences that are truly novel, and they can measure absolute prevalence. Shotgun metagenomic sampling can be non-invasive, efficient, and inexpensive while being informative. We have developed analysis techniques for shotgun metagenomic sequencing that rely upon phylogenetic signature patterns. They work by indexing local sequence patterns in a manner similar to web search engines. Our methods are laptop-fast, and their favorable scaling properties ensure they will remain sustainable as sequencing methods grow. We show examples of application to soil metagenomic samples.
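    Indexing "local sequence patterns in a manner similar to web search engines" can be illustrated with an inverted index that maps each k-mer (the "term") to the references containing it. A minimal sketch, not the authors' phylogenetic-signature method: the reference sequences, labels and k are hypothetical, and only k-mers unique to a single reference are treated as signatures that vote for a read's origin.

```python
"""Sketch: an inverted k-mer index for read classification (hypothetical data)."""
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterator, Optional


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq, upper-cased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of reference labels that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(label)
    return dict(index)


def classify(read: str, index: dict[str, set[str]], k: int) -> Optional[str]:
    """Vote with k-mers unique to one reference; return the top label or None."""
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer, set())
        if len(labels) == 1:  # shared k-mers carry no discriminating signal here
            votes[next(iter(labels))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    refs = {"taxon_A": "ACGTACGTTTGA", "taxon_B": "GGGCCCAAATTT"}  # hypothetical
    idx = build_index(refs, k=4)
    print(classify("CGTACGTT", idx, k=4))  # prints "taxon_A"
```

    Lookups are a single dictionary access per k-mer, which is what makes this style of index fast; the trade-off is that reads made only of shared or unseen k-mers are left unclassified rather than guessed.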

  2. Viral Metagenomics: MetaView Software

    SciTech Connect

    Zhou, C; Smith, J

    2007-10-22

    The purpose of this report is to design and develop a tool for analysis of raw sequence read data from viral metagenomics experiments. The tool should compare read sequences against known viral nucleic acid sequence data and enable a user to determine, with some degree of confidence, what virus groups may be present in the sample. This project was conducted in two phases. In phase 1 we surveyed the literature and examined existing metagenomics tools to educate ourselves and to more precisely define the problem of analyzing raw read data from viral metagenomic experiments. In phase 2 we devised an approach and built a prototype code and database. This code takes viral metagenomic read data in FASTA format as input and accesses all complete viral genomes from Kpath for sequence comparison. The system executes at the UNIX command line, producing output that is stored in an Oracle relational database. We provide here a description of the approach we developed for handling un-assembled, short read data sets from viral metagenomics experiments. We include a discussion of the current MetaView code capabilities and additional functionality that we believe should be added, should additional funding be acquired to continue the work.
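    The first step of such a pipeline is reading un-assembled reads from FASTA input. A minimal sketch of a FASTA reader, offered as an illustration rather than MetaView code: it joins multi-line sequences, skips blank lines, and ignores any sequence text before the first header, but does not handle every FASTA edge case.

```python
"""Sketch: a minimal FASTA reader for un-assembled reads (not MetaView code)."""
from __future__ import annotations

from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header = None
    chunks: list[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence text before the first header is ignored
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    example = [">read_1", "ACGTAC", "GTT", ">read_2", "TTGACA"]
    for name, seq in read_fasta(example):
        print(name, len(seq))  # prints "read_1 9", then "read_2 6"
```

    Because the reader accepts any iterable of lines and yields records lazily, an open file handle can be passed in directly without loading a large read set into memory.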

  3. Metazen – metadata capture for metagenomes

    DOE PAGES

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; ...

    2014-12-08

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  4. Metazen – metadata capture for metagenomes

    SciTech Connect

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker

    2014-12-08

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.
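    The combination of "a defined set of mandatory fields" with the "option to add fields" can be sketched as a simple validation step. The field names below are hypothetical and are not Metazen's actual template; the sketch only reports which mandatory fields are missing or empty and which extra, user-defined fields were supplied.

```python
"""Sketch: validating sample metadata against mandatory fields (hypothetical template)."""
from __future__ import annotations

# Hypothetical mandatory fields for one sample record; not Metazen's real template.
MANDATORY_FIELDS = ("project_name", "sample_name", "collection_date", "latitude", "longitude")


def validate(record: dict[str, str],
             mandatory: tuple[str, ...] = MANDATORY_FIELDS) -> tuple[list[str], list[str]]:
    """Return (missing or empty mandatory fields, extra user-defined fields)."""
    missing = [name for name in mandatory if not str(record.get(name, "")).strip()]
    extra = sorted(set(record) - set(mandatory))
    return missing, extra


if __name__ == "__main__":
    sample = {
        "project_name": "soil_survey",
        "sample_name": "plot_7",
        "collection_date": "2014-06-01",
        "latitude": "",    # present but empty, so still reported as missing
        "depth_cm": "10",  # optional, user-added field
    }
    missing, extra = validate(sample)
    print("missing:", missing)  # prints missing: ['latitude', 'longitude']
    print("extra:", extra)      # prints extra: ['depth_cm']
```

    Extra fields are reported rather than rejected, mirroring the flexibility described above while still enforcing the required core.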

  5. RiboFR-Seq: a novel approach to linking 16S rRNA amplicon profiles to metagenomes

    PubMed Central

    Zhang, Yanming; Ji, Peifeng; Wang, Jinfeng; Zhao, Fangqing

    2016-01-01

    16S rRNA amplicon analysis and shotgun metagenome sequencing are the two main culture-independent strategies for exploring the genetic landscape of microbial communities. Recently, numerous studies have employed these two approaches together, but the downstream data analyses were performed separately, which always generated incongruent or conflicting signals in both taxonomic and functional classifications. Here we propose a novel approach, RiboFR-Seq (Ribosomal RNA gene flanking region sequencing), for capturing both ribosomal RNA variable regions and their flanking protein-coding genes simultaneously. Through extensive testing on a clonal bacterial strain, the salivary microbiome and bacterial epibionts of marine kelp, we demonstrated that RiboFR-Seq could detect the vast majority of bacteria not only in well-studied microbiomes but also in novel communities with limited reference genomes. Combined with classical amplicon sequencing and shotgun metagenome sequencing, RiboFR-Seq can link the annotations of 16S rRNA and metagenomic contigs to make a consensus classification. By recognizing almost all 16S rRNA copies, the RiboFR-Seq approach can effectively reduce the taxonomic abundance bias resulting from 16S rRNA copy number variation. We believe that RiboFR-Seq, which provides an integrated view of 16S rRNA profiles and metagenomes, will help us better understand diverse microbial communities. PMID:26984526
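    The copy-number bias mentioned above arises because a taxon carrying several 16S rRNA copies per genome contributes proportionally more amplicon reads per cell. The sketch below shows the bias with a conventional division-based correction, which is a swapped-in standard technique and not the RiboFR-Seq method; the taxa, read counts and copy numbers are hypothetical, and taxa without a known copy number are assumed to carry one copy.

```python
"""Sketch: correcting 16S read-based abundances for rRNA copy number (hypothetical data)."""
from __future__ import annotations


def copy_number_corrected(read_counts: dict[str, float],
                          copy_numbers: dict[str, float]) -> dict[str, float]:
    """Divide each taxon's 16S read count by its copy number, then renormalize to 1."""
    # Taxa without a known copy number are assumed to carry a single copy.
    cell_proxies = {taxon: count / copy_numbers.get(taxon, 1.0)
                    for taxon, count in read_counts.items()}
    total = sum(cell_proxies.values())
    if total == 0:
        return {taxon: 0.0 for taxon in cell_proxies}
    return {taxon: value / total for taxon, value in cell_proxies.items()}


if __name__ == "__main__":
    reads = {"taxon_A": 700, "taxon_B": 300}  # raw reads suggest A is 70% of the community
    copies = {"taxon_A": 7, "taxon_B": 1}     # hypothetical 16S copies per genome
    print(copy_number_corrected(reads, copies))  # prints {'taxon_A': 0.25, 'taxon_B': 0.75}
```

    In this example the apparent majority taxon becomes the minority once copy number is accounted for, which is the kind of distortion that detecting nearly all 16S copies is meant to reduce.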

  6. Amplification methods bias metagenomic libraries of uncultured single-stranded and double-stranded DNA viruses.

    PubMed

    Kim, Kyoung-Ho; Bae, Jin-Woo

    2011-11-01

    Investigation of viruses in the environment often requires the amplification of viral DNA before sequencing of viral metagenomes. In this study, two of the most widely used amplification methods, the linker amplified shotgun library (LASL) and multiple displacement amplification (MDA) methods, were applied to a sample from the seawater surface. Viral DNA was extracted from viruses concentrated by tangential flow filtration and amplified by these two methods. 454 pyrosequencing was used to read the metagenomic sequences from different libraries. The resulting taxonomic classifications of the viruses, their functional assignments, and assembly patterns differed substantially depending on the amplification method. Only double-stranded DNA viruses were retrieved from the LASL, whereas most sequences in the MDA library were from single-stranded DNA viruses, and double-stranded DNA viral sequences were minorities. Thus, the two amplification methods reveal different aspects of viral diversity.

  7. Viral metagenomics and blood safety.

    PubMed

    Sauvage, V; Eloit, M

    2016-02-01

    The characterization of the human blood-associated viral community (also called the blood virome) is essential for epidemiological surveillance and for anticipating new potential threats to blood transfusion safety. Currently, the risk of transmission of well-known blood-borne viruses (HBV, HCV, HIV and HTLV) can be considered under control in high-resource countries. However, other unknown or unsuspected viruses may be transmitted to recipients by blood-derived products. This is particularly relevant considering that a significant proportion of transfused patients are immunocompromised and more frequently subject to fatal outcomes. Several measures to prevent transfusion transmission of unknown viruses have been implemented, including the exclusion of at-risk donors, leukocyte reduction of donor blood, and physicochemical treatment of the different blood components. However, to date there is no universal method of pathogen inactivation that would be applicable to all types of blood components and equally effective for all viral families. In addition, some of the available viral inactivation procedures are recognized to be less effective on non-enveloped viruses and inadequate for inactivating high viral titers in plasma pools or derivatives. Given this, there is a need to implement new methodologies for the discovery of unknown viruses that may affect blood transfusion. Viral metagenomics combined with high-throughput sequencing appears to be a promising approach for the identification and global surveillance of new and/or unexpected viruses that could impair blood transfusion safety.

  8. Viral metagenomics: are we missing the giants?

    PubMed

    Halary, S; Temmam, S; Raoult, D; Desnues, C

    2016-06-01

    Amoeba-infecting giant viruses are recently discovered viruses that have been isolated from diverse environments around the world. In parallel to isolation efforts, metagenomics confirmed their worldwide distribution in a broad range of environmental and host-associated samples, including humans, depicting them as a major component of eukaryotic viruses in nature and a possible resident of the human/animal virome whose role is still unclear. Nevertheless, metagenomic data about amoeba-infecting giant viruses remain scarce, mainly because of methodological limitations. Efforts should be pursued both at the level of metagenomic sample preparation and in in silico analyses to better understand their roles in the environment and in human/animal health and disease.

  9. Metagenomics as a Tool for Enzyme Discovery: Hydrolytic Enzymes from Marine-Related Metagenomes.

    PubMed

    Popovic, Ana; Tchigvintsev, Anatoly; Tran, Hai; Chernikova, Tatyana N; Golyshina, Olga V; Yakimov, Michail M; Golyshin, Peter N; Yakunin, Alexander F

    2015-01-01

    This chapter discusses metagenomics and its application to enzyme discovery, with a focus on hydrolytic enzymes from marine metagenomic libraries. With less than one percent of environmental microorganisms culturable, metagenomics, the collective study of community genetics, has opened up a rich pool of uncharacterized metabolic pathways, enzymes, and adaptations. This great untapped pool of genes offers a particularly exciting opportunity to mine for new biochemical activities or novel enzymes with activities tailored to specific sets of environmental conditions. Metagenomes also represent a huge reservoir of novel enzymes for applications in biocatalysis, biofuels, and bioremediation. Here we present the results of enzyme discovery for four enzyme activities of particular industrial or environmental interest: esterase/lipase, glycosyl hydrolase, protease and dehalogenase.

  10. Metagenomic exploration of viruses throughout the Indian Ocean.

    PubMed

    Williamson, Shannon J; Allen, Lisa Zeigler; Lorenzi, Hernan A; Fadrosh, Douglas W; Brami, Daniel; Thiagarajan, Mathangi; McCrow, John P; Tovchigrechko, Andrey; Yooseph, Shibu; Venter, J Craig

    2012-01-01

    The characterization of global marine microbial taxonomic and functional diversity is a primary goal of the Global Ocean Sampling Expedition. As part of this study, 19 water samples were collected aboard the Sorcerer II sailing vessel from the southern Indian Ocean in an effort to more thoroughly understand the lifestyle strategies of the microbial inhabitants of this ultra-oligotrophic region. To date, no investigations of whole virioplankton assemblages have been conducted on waters collected from the Indian Ocean or across multiple size fractions. Therefore, the goals of this study were to examine the effect of size fractionation on viral consortia structure and function and to understand the diversity and functional potential of the Indian Ocean virome. Five samples were selected for comprehensive metagenomic exploration, and sequencing was performed on the microbes captured on 3.0-, 0.8- and 0.1-µm membrane filters as well as on the viral fraction (<0.1 µm). Phylogenetic approaches were also used to identify predicted proteins of viral origin in the larger fractions of data from all Indian Ocean samples, which were included in subsequent metagenomic analyses. Taxonomic profiling of viral sequences suggested that size fractionation of marine microbial communities enriches for specific groups of viruses within the different size classes, and functional characterization further substantiated this observation. Functional analyses also revealed a relative enrichment for metabolic proteins of viral origin that potentially reflect the physiological condition of host cells in the Indian Ocean, including those involved in nitrogen metabolism and oxidative phosphorylation. A novel classification method, MGTAXA, was used to assess virus-host relationships in the Indian Ocean by predicting the taxonomy of putative host genera, with Prochlorococcus, Acanthochlois and members of the SAR86 cluster comprising the most abundant predictions. This is the first study to holistically

  11. Metagenomic Exploration of Viruses throughout the Indian Ocean

    PubMed Central

    Lorenzi, Hernan A.; Fadrosh, Douglas W.; Brami, Daniel; Thiagarajan, Mathangi; McCrow, John P.; Tovchigrechko, Andrey; Yooseph, Shibu; Venter, J. Craig

    2012-01-01

    The characterization of global marine microbial taxonomic and functional diversity is a primary goal of the Global Ocean Sampling Expedition. As part of this study, 19 water samples were collected aboard the Sorcerer II sailing vessel from the southern Indian Ocean in an effort to more thoroughly understand the lifestyle strategies of the microbial inhabitants of this ultra-oligotrophic region. To date, no investigations of whole virioplankton assemblages have been conducted on waters collected from the Indian Ocean or across multiple size fractions. Therefore, the goals of this study were to examine the effect of size fractionation on viral consortia structure and function and to understand the diversity and functional potential of the Indian Ocean virome. Five samples were selected for comprehensive metagenomic exploration, and sequencing was performed on the microbes captured on 3.0-, 0.8- and 0.1-µm membrane filters as well as on the viral fraction (<0.1 µm). Phylogenetic approaches were also used to identify predicted proteins of viral origin in the larger fractions of data from all Indian Ocean samples, which were included in subsequent metagenomic analyses. Taxonomic profiling of viral sequences suggested that size fractionation of marine microbial communities enriches for specific groups of viruses within the different size classes, and functional characterization further substantiated this observation. Functional analyses also revealed a relative enrichment for metabolic proteins of viral origin that potentially reflect the physiological condition of host cells in the Indian Ocean, including those involved in nitrogen metabolism and oxidative phosphorylation. A novel classification method, MGTAXA, was used to assess virus-host relationships in the Indian Ocean by predicting the taxonomy of putative host genera, with Prochlorococcus, Acanthochlois and members of the SAR86 cluster comprising the most abundant predictions. This is the first study to holistically

  12. Towards a more complete metagenomics toolkit

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The emerging scientific discipline of metagenomics has not only created a myriad of opportunities for biologists to reveal new insights into the microbial underpinnings of our environment, but has also presented a number of interesting challenges for bioinformatics algorithms and software developers...

  13. The metagenomic approach and causality in virology

    PubMed Central

    Castrignano, Silvana Beres; Nagasse-Sugahara, Teresa Keico

    2015-01-01

    Nowadays, the metagenomic approach has become a very important tool for the discovery of new viruses in environmental and biological samples. Here we discuss how these discoveries may help to elucidate the etiology of diseases and the criteria necessary to establish a causal association between a virus and a disease. PMID:25902566

  14. Building on basic metagenomics with complementary technologies

    PubMed Central

    Warnecke, Falk; Hugenholtz, Philip

    2007-01-01

    Metagenomics, the application of random shotgun sequencing to environmental samples, is a powerful approach for characterizing microbial communities. However, this method only represents the cornerstone of what can be achieved using a range of complementary technologies such as transcriptomics, proteomics, cell sorting and microfluidics. Together, these approaches hold great promise for the study of microbial ecology and evolution. PMID:18177506

  15. Clustering metagenomic sequences with interpolated Markov models

    PubMed Central

    2010-01-01

    Background: Sequencing of environmental DNA (often called metagenomics) has shown tremendous potential to uncover the vast number of unknown microbes that cannot be cultured and sequenced by traditional methods. Because the output from metagenomic sequencing is a large set of reads of unknown origin, clustering reads together that were sequenced from the same species is a crucial analysis step. Many effective approaches to this task rely on sequenced genomes in public databases, but these genomes are a highly biased sample that is not necessarily representative of environments interesting to many metagenomics projects. Results: We present SCIMM (Sequence Clustering with Interpolated Markov Models), an unsupervised sequence clustering method. SCIMM achieves greater clustering accuracy than previous unsupervised approaches. We examine the limitations of unsupervised learning on complex datasets, and suggest a hybrid of SCIMM and the supervised learning method Phymm, called PHYSCIMM, that performs better when evolutionarily close training genomes are available. Conclusions: SCIMM and PHYSCIMM are highly accurate methods to cluster metagenomic sequences. SCIMM operates entirely unsupervised, making it ideal for environments containing mostly novel microbes. PHYSCIMM uses supervised learning to improve clustering in environments containing microbial strains from well-characterized genera. SCIMM and PHYSCIMM are available open source from http://www.cbcb.umd.edu/software/scimm. PMID:21044341
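    The iterative idea behind unsupervised Markov-model clustering can be illustrated with a minimal sketch. This is not SCIMM itself and rests on stated assumptions: it swaps SCIMM's interpolated Markov models for a simpler fixed-order Markov chain with add-one smoothing, starts from a caller-supplied initial partition, and reassigns each read to the cluster whose model gives it the highest log-likelihood until assignments stop changing. All function names are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Train a fixed-order Markov chain: P(base | preceding `order` bases).

    Add-one (Laplace) smoothing keeps unseen contexts at a uniform 1/4.
    NOTE: a simplification; SCIMM uses interpolated Markov models instead.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1.0

    def logprob(context, base):
        ctx = counts.get(context, {})
        total = sum(ctx.values()) + pseudocount * len(ALPHABET)
        return math.log((ctx.get(base, 0.0) + pseudocount) / total)

    return logprob


def score(seq, logprob, order=2):
    """Log-likelihood of a read under one cluster's model."""
    return sum(logprob(seq[i - order:i], seq[i]) for i in range(order, len(seq)))


def cluster_reads(reads, init_labels, n_clusters, order=2, max_iter=10):
    """Hypothetical helper: retrain per-cluster models, reassign reads, repeat."""
    labels = list(init_labels)
    for _ in range(max_iter):
        # Retrain one model per cluster on its current members (empty -> uniform).
        models = [
            train_markov([r for r, lab in zip(reads, labels) if lab == c], order)
            for c in range(n_clusters)
        ]
        # Move each read to the cluster whose model explains it best.
        new_labels = [
            max(range(n_clusters), key=lambda c: score(r, models[c], order))
            for r in reads
        ]
        if new_labels == labels:  # converged: no read changed cluster
            break
        labels = new_labels
    return labels
```

    For example, with the toy reads `["ACACACACACAC", "CACACACACACA", "GTGTGTGTGTGT", "TGTGTGTGTGTG", "GTGTGTGTGT"]` and a starting partition that misplaces the last read, `cluster_reads(reads, [0, 0, 1, 1, 0], n_clusters=2)` moves that read and returns `[0, 0, 1, 1, 1]`. As with other iterative reassignment schemes, the result depends on the starting partition.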

  16. Unravelling core microbial metabolisms in the hypersaline microbial mats of Shark Bay using high-throughput metagenomics

    SciTech Connect

    Ruvindy, Rendy; White III, Richard Allen; Neilan, Brett Anthony; Burns, Brendan Paul

    2015-05-29

    Modern microbial mats are potential analogues of some of Earth’s earliest ecosystems. Excellent examples can be found in Shark Bay, Australia, with mats of various morphologies. To further our understanding of the functional genetic potential of these complex microbial ecosystems, we conducted, for the first time, shotgun metagenomic analyses. We assembled metagenomic next-generation sequencing data to classify the taxonomic and metabolic potential across diverse morphologies of marine mats in Shark Bay. Taxonomic classification using protein-coding and small subunit rRNA genes directly extracted from the metagenomes suggests that three phyla, Proteobacteria, Cyanobacteria and Bacteroidetes, dominate all marine mats. However, the microbial community structures of the Shark Bay and Highbourne Cay (Bahamas) marine systems appear to be distinct from each other. The metabolic potential (based on SEED subsystem classifications) of the Shark Bay and Highbourne Cay microbial communities was also distinct. Shark Bay metagenomes have a metabolic pathway profile consisting of both heterotrophic and photosynthetic pathways, whereas Highbourne Cay appears to be dominated almost exclusively by photosynthetic pathways. Alternative non-RuBisCO-based carbon metabolism, including the reductive TCA cycle and 3-hydroxypropionate/4-hydroxybutyrate pathways, is highly represented in Shark Bay metagenomes but is not represented in Highbourne Cay microbial mats or any other mat-forming ecosystems investigated to date. Potentially novel aspects of nitrogen cycling were also observed, as well as putative heavy metal cycling (arsenic, mercury, copper and cadmium). Finally, archaea are highly represented in Shark Bay and may have critical roles in overall ecosystem function in these modern microbial mats.

  17. Unravelling core microbial metabolisms in the hypersaline microbial mats of Shark Bay using high-throughput metagenomics

    PubMed Central

    Ruvindy, Rendy; White III, Richard Allen; Neilan, Brett Anthony; Burns, Brendan Paul

    2016-01-01

    Modern microbial mats are potential analogues of some of Earth's earliest ecosystems. Excellent examples can be found in Shark Bay, Australia, with mats of various morphologies. To further our understanding of the functional genetic potential of these complex microbial ecosystems, we conducted, for the first time, shotgun metagenomic analyses. We assembled metagenomic next-generation sequencing data to classify the taxonomic and metabolic potential across diverse morphologies of marine mats in Shark Bay. Taxonomic classification using protein-coding and small subunit rRNA genes directly extracted from the metagenomes suggests that three phyla, Proteobacteria, Cyanobacteria and Bacteroidetes, dominate all marine mats. However, the microbial community structures of the Shark Bay and Highbourne Cay (Bahamas) marine systems appear to be distinct from each other. The metabolic potential (based on SEED subsystem classifications) of the Shark Bay and Highbourne Cay microbial communities was also distinct. Shark Bay metagenomes have a metabolic pathway profile consisting of both heterotrophic and photosynthetic pathways, whereas Highbourne Cay appears to be dominated almost exclusively by photosynthetic pathways. Alternative non-RuBisCO-based carbon metabolism, including the reductive TCA cycle and 3-hydroxypropionate/4-hydroxybutyrate pathways, is highly represented in Shark Bay metagenomes but is not represented in Highbourne Cay microbial mats or any other mat-forming ecosystems investigated to date. Potentially novel aspects of nitrogen cycling were also observed, as well as putative heavy metal cycling (arsenic, mercury, copper and cadmium). Finally, archaea are highly represented in Shark Bay and may have critical roles in overall ecosystem function in these modern microbial mats. PMID:26023869

  18. The Metagenome of Utricularia gibba's Traps: Into the Microbial Input to a Carnivorous Plant.

    PubMed

    Alcaraz, Luis David; Martínez-Sánchez, Shamayim; Torres, Ignacio; Ibarra-Laclette, Enrique; Herrera-Estrella, Luis

    2016-01-01

    The genome and transcriptome sequences of the aquatic, rootless, and carnivorous plant Utricularia gibba L. (Lentibulariaceae) were recently determined. Traps are necessary for U. gibba because they help the plant survive in nutrient-deprived environments. The U. gibba traps (Ugt) are specialized structures that have been proposed to selectively filter microbial inhabitants. To determine whether the traps indeed have a microbiome that differs, in composition or abundance, from the microbiome of the surrounding environment, we used whole-genome shotgun (WGS) metagenomics to describe both the taxonomic and functional diversity of the Ugt microbiome. We collected U. gibba plants from their natural habitat and directly sequenced the metagenomes of the Ugt microbiome and the surrounding water. The total predicted number of species in the Ugt was more than 1,100. Using pan-genome fragment recruitment analysis, we were able to identify some key Ugt players, such as Pseudomonas monteilii, to the species level. Functional analysis of the Ugt metagenome suggests that the trap microbiome plays an important role in nutrient scavenging and assimilation while complementing the hydrolytic functions of the plant.

  19. Construction and validation of metagenomic DNA libraries from landfarm soil microorganisms.

    PubMed

    Pessoa, T B A; de Souza, S S; Cerqueira, A F; Rezende, R P; Pirovani, C P; Dias, J C T

    2013-06-28

    Landfarming biodegradation is a strategy used by the petrochemical industry to reduce pollutants in petroleum-contaminated soil. We constructed two metagenomic libraries from landfarming soil in order to determine the pathway used for mineralization of benzene and to examine protein expression of the bacteria in these soils. The DNA of landfarm soil, collected from Ilhéus, BA, Brazil, was extracted, and a metagenomic library was constructed with the CopyControl™ Fosmid Library Production Kit, which clones 25- to 45-kb DNA fragments. The clones were selected for their ability to express enzymes capable of cleaving aromatic compounds. These clones were grown in Luria-Bertani broth plus L-arabinose, benzene, and chloramphenicol as induction substances, and they were tested for activity in the catechol cleavage pathway, an intermediate step in benzene degradation. Nine clones were positive for ortho-cleavage and one was positive for meta-cleavage. Protein band patterns determined by SDS-polyacrylamide gel electrophoresis differed in bacteria grown on induced versus non-induced media (Luria-Bertani broth). We concluded that the DNA of landfarm soil is an important source of genes involved in the mineralization of xenobiotic compounds, which are common in gasoline and oil spills. Metagenomic libraries allow the identification of non-culturable microorganisms that have potential in the bioremediation of contaminated sites.

  20. Preparation of high-molecular weight DNA and metagenomic libraries from soils and hot springs.

    PubMed

    Reigstad, Laila J; Bartossek, Rita; Schleper, Christa

    2011-01-01

    Metagenomics has become an important tool for the characterization of microorganisms, as it is independent of their enrichment or cultivation in the laboratory. Its application has led to the discovery of metabolisms from widespread, yet uncharacterized organisms such as the ammonia-oxidizing archaea. Different approaches ranging from the generation of short sequence reads by direct use of high-throughput sequencing technologies to the construction and sequencing of large-insert DNA libraries are being employed. For these purposes, DNA of high quality needs to be prepared from an environmental sample, which is a particular challenge for soils and sediments. Here we describe the methods used for the isolation of high-molecular weight (hmw) DNA from soil and hot spring samples, the subsequent production of large-insert metagenomic libraries, and the analysis of the resulting genomic fragments. Detailed step-by-step procedures include (1) how to isolate good-quality hmw DNA from soils and mud; (2) how to prepare the DNA for cloning; (3) how to efficiently establish, grow, pick, replicate, and store the large-insert metagenomic fosmid library; and finally, (4) how to screen the library for genes of interest.

  1. Physiological and evolutionary potential of microorganisms from the Canterbury Basin subseafloor, a metagenomic approach.

    PubMed

    Gaboyer, Frédéric; Burgaud, Gaëtan; Alain, Karine

    2015-05-01

    Subseafloor sediments represent a large reservoir of organic matter and are inhabited by microbial groups from all three domains of life. Although it impacts planetary geochemical cycles, the subsurface biosphere remains poorly understood, notably with respect to the possible metabolic pathways and selective advantages (sporulation, stress response, dormancy) that buried microorganisms may deploy. In order to better understand the physiological potential and possible lifestyles of subseafloor microbial communities, we analyzed two metagenomes from subseafloor sediments collected at 31 mbsf (meters below the seafloor) and 136 mbsf in the Canterbury Basin. Metagenomic phylogenetic and functional diversities were very similar. Phylogenetic diversity was mostly represented by Chloroflexi, Firmicutes and Proteobacteria for Bacteria and by Thaumarchaeota and Euryarchaeota for Archaea. Predicted anaerobic metabolisms encompassed fermentation, methanogenesis and the utilization of fatty acids and aromatic and halogenated substrates. Potential processes that may confer selective advantages on subsurface microorganisms included sporulation, detoxification machinery and osmolyte accumulation. Annotation of genomic fragments described the metabolic versatility of Chloroflexi, the Miscellaneous Crenarchaeotic Group and Euryarchaeota and showed frequent recombination events within subsurface taxa. This study confirmed that the subseafloor habitat is unique compared to other habitats at the (meta)genomic level and described the physiological potential of still-uncultured groups.

  2. Multisubstrate Isotope Labeling and Metagenomic Analysis of Active Soil Bacterial Communities

    PubMed Central

    Verastegui, Y.; Cheng, J.; Engel, K.; Kolczynski, D.; Mortimer, S.; Lavigne, J.; Montalibet, J.; Romantsov, T.; Hall, M.; McConkey, B. J.; Rose, D. R.; Tomashek, J. J.; Scott, B. R.

    2014-01-01

    Soil microbial diversity represents the largest global reservoir of novel microorganisms and enzymes. In this study, we coupled functional metagenomics and DNA stable-isotope probing (DNA-SIP) using multiple plant-derived carbon substrates and diverse soils to characterize active soil bacterial communities and their glycoside hydrolase genes, which have value for industrial applications. We incubated samples from three disparate Canadian soils (tundra, temperate rainforest, and agricultural) with five native-carbon (¹²C) or stable-isotope-labeled (¹³C) carbohydrates (glucose, cellobiose, xylose, arabinose, and cellulose). Indicator species analysis revealed high specificity and fidelity for many uncultured and unclassified bacterial taxa in the heavy DNA for all soils and substrates. Among characterized taxa, Actinomycetales (Salinibacterium), Rhizobiales (Devosia), Rhodospirillales (Telmatospirillum), and Caulobacterales (Phenylobacterium and Asticcacaulis) were bacterial indicator species for the heavy substrates and soils tested. Both Actinomycetales and Caulobacterales (Phenylobacterium) were associated with the metabolism of cellulose, and Alphaproteobacteria were associated with the metabolism of arabinose; members of the order Rhizobiales were strongly associated with the metabolism of xylose. Annotated metagenomic data suggested diverse glycoside hydrolase gene representation within the pooled heavy DNA. By screening 2,876 cloned fragments derived from the ¹³C-labeled DNA isolated from soils incubated with cellulose, we demonstrate the power of combining DNA-SIP, multiple-displacement amplification (MDA), and functional metagenomics by efficiently isolating multiple clones with activity on carboxymethyl cellulose and fluorogenic proxy substrates for carbohydrate-active enzymes. PMID:25028422

  3. The Metagenome of Utricularia gibba's Traps: Into the Microbial Input to a Carnivorous Plant

    PubMed Central

    Alcaraz, Luis David; Martínez-Sánchez, Shamayim; Torres, Ignacio; Ibarra-Laclette, Enrique; Herrera-Estrella, Luis

    2016-01-01

    The genome and transcriptome sequences of the aquatic, rootless, and carnivorous plant Utricularia gibba L. (Lentibulariaceae) were recently determined. Traps are necessary for U. gibba because they help the plant survive in nutrient-deprived environments. The U. gibba traps (Ugt) are specialized structures that have been proposed to selectively filter microbial inhabitants. To determine whether the traps indeed have a microbiome that differs, in composition or abundance, from the microbiome of the surrounding environment, we used whole-genome shotgun (WGS) metagenomics to describe both the taxonomic and functional diversity of the Ugt microbiome. We collected U. gibba plants from their natural habitat and directly sequenced the metagenomes of the Ugt microbiome and the surrounding water. The total predicted number of species in the Ugt was more than 1,100. Using pan-genome fragment recruitment analysis, we were able to identify some key Ugt players, such as Pseudomonas monteilii, to the species level. Functional analysis of the Ugt metagenome suggests that the trap microbiome plays an important role in nutrient scavenging and assimilation while complementing the hydrolytic functions of the plant. PMID:26859489

  4. Accurate read-based metagenome characterization using a hierarchical suite of unique signatures

    PubMed Central

    Freitas, Tracey Allen K.; Li, Po-E; Scholz, Matthew B.; Chain, Patrick S. G.

    2015-01-01

    A major challenge in the field of shotgun metagenomics is the accurate identification of organisms present within a microbial community, based on the classification of short sequence reads. Though existing microbial community profiling methods have attempted to rapidly classify the millions of reads output from modern sequencers, the combination of incomplete databases, similarity among otherwise divergent genomes, errors and biases in sequencing technologies, and the large volumes of sequencing data required for metagenome sequencing has led to unacceptably high false discovery rates (FDR). Here, we present the application of a novel, gene-independent and signature-based metagenomic taxonomic profiling method with significantly and consistently smaller FDR than any other available method. Our algorithm, Genomic Origins Through Taxonomic CHAllenge (GOTTCHA), circumvents false positives by using a series of non-redundant signature databases. GOTTCHA was tested and validated on 20 synthetic and mock datasets ranging in community composition and complexity, was applied successfully to data generated from spiked environmental and clinical samples, and robustly demonstrates superior performance compared with other available tools. PMID:25765641
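    The core idea of signature-based profiling, keeping only sequence fragments that occur in a single reference so that regions shared between genomes cannot generate false positives, can be illustrated with a minimal sketch. It is an assumption-laden simplification, not GOTTCHA's implementation: it uses exact k-mers at a single taxonomic level, ignores reverse complements, and applies a hypothetical `min_hits` threshold in place of GOTTCHA's hierarchical databases and filtering. Function names are illustrative only.

```python
from collections import Counter


def kmers(seq, k):
    """All distinct length-k substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_signatures(genomes, k):
    """Map each k-mer found in exactly one reference genome to that genome."""
    owners = {}
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            owners.setdefault(km, set()).add(name)
    # Shared k-mers are discarded: they cannot discriminate between genomes.
    return {km: next(iter(names)) for km, names in owners.items() if len(names) == 1}


def profile(reads, signatures, k, min_hits=1):
    """Count signature hits per genome; keep genomes with >= min_hits (hypothetical cutoff)."""
    hits = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in signatures:
                hits[signatures[km]] += 1
    return {name: n for name, n in hits.items() if n >= min_hits}
```

    For example, with `genomes = {"g1": "AAAACCCCGGGG", "g2": "AAAATTTTGGGG"}` and `k = 4`, the shared k-mers `AAAA` and `GGGG` are dropped, so `profile(["AACCCC", "AAAA"], unique_signatures(genomes, 4), 4)` returns `{"g1": 3}`: the read made only of shared sequence contributes nothing. Raising `min_hits` trades sensitivity for fewer spurious detections.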

  5. Metagenomic Analysis Suggests Modern Freshwater Microbialites Harbor a Distinct Core Microbial Community

    PubMed Central

    White, Richard Allen; Chan, Amy M.; Gavelis, Gregory S.; Leander, Brian S.; Brady, Allyson L.; Slater, Gregory F.; Lim, Darlene S. S.; Suttle, Curtis A.

    2016-01-01

    Modern microbialites are complex microbial communities that interface with abiotic factors to form carbonate-rich organosedimentary structures whose ancestors provide the earliest evidence of life. Past studies, primarily on marine microbialites, have inventoried diverse taxa and metabolic pathways, but it is unclear which of these are members of the microbialite community and which are introduced from adjacent environments. Here we control for these factors by sampling the surrounding water and nearby sediment in addition to the microbialites, and we use a metagenomics approach to interrogate the microbial community. Our findings suggest that the Pavilion Lake microbialite community profile, metabolic potential and pathway distributions are distinct from those in the neighboring sediments and water. Based on RefSeq classification, members of the Proteobacteria (e.g., alpha and delta classes) were the dominant taxa in the microbialites, and possessed novel functional guilds associated with the metabolism of heavy metals, antibiotic resistance, primary alcohol biosynthesis and urea metabolism; the latter may help drive biomineralization. Urea metabolism within Pavilion Lake microbialites is a feature not previously associated with other microbialites. The microbialite communities were also significantly enriched for cyanobacteria and acidobacteria, which likely play an important role in biomineralization. Additional findings suggest that Pavilion Lake microbialites are under viral selection, as genes associated with viral infection (e.g., CRISPR-Cas, phage shock and phage excision) are abundant within the microbialite metagenomes. The morphology of Pavilion Lake microbialites changes dramatically with depth, yet metagenomic data did not vary significantly by morphology or depth, indicating that microbialite morphology is altered by other factors, perhaps transcriptional differences or abiotic conditions. This work provides a comprehensive metagenomic perspective of the

  6. Metagenomic studies of the Red Sea.

    PubMed

    Behzad, Hayedeh; Ibarra, Martin Augusto; Mineta, Katsuhiko; Gojobori, Takashi

    2016-02-01

    Metagenomics has significantly advanced the field of marine microbial ecology, revealing the vast diversity of previously unknown microbial life forms in different marine niches. The tremendous amount of data generated has enabled the identification of a large number of microbial genes (metagenomes), their community interactions, adaptation mechanisms, and their potential applications in pharmaceutical and biotechnology-based industries. Comparative metagenomics reveals that microbial diversity is a function of the local environment, meaning that unique or unusual environments typically harbor novel microbial species with unique genes and metabolic pathways. The Red Sea has an abundance of unique characteristics; however, its microbiota is one of the least studied among marine environments. The Red Sea harbors approximately 25 hot anoxic brine pools, plus a vibrant coral reef ecosystem. Physicochemical studies describe the Red Sea as an oligotrophic environment that contains some of the warmest and saltiest waters in the world, with year-round high UV radiation. These characteristics are believed to have shaped the evolution of microbial communities in the Red Sea. Over-representation of genes involved in DNA repair, high-intensity light responses, and osmoregulation was found in the Red Sea metagenomic databases, suggesting the acquisition of specific environmental adaptations by the Red Sea microbiota. The Red Sea brine pools harbor a diverse range of halophilic and thermophilic bacterial and archaeal communities, which are potential sources of enzymes for pharmaceutical and biotechnology-based applications. Understanding the mechanisms of these adaptations and their function within the larger ecosystem could also prove useful in light of predicted global warming scenarios, in which global ocean temperatures are expected to rise by 1-3°C in the next few decades. In this review, we provide an overview of the published metagenomic studies that were conducted in the Red Sea, and

  7. An Experimental Metagenome Data Management and Analysis System

    SciTech Connect

    Markowitz, Victor M.; Korzeniewski, Frank; Palaniappan, Krishna; Szeto, Ernest; Ivanova, Natalia N.; Kyrpides, Nikos C.; Hugenholtz, Philip

    2006-03-01

    The application of shotgun sequencing to environmental samples has revealed a new universe of microbial community genomes (metagenomes) involving previously uncultured organisms. Metagenome analysis, which is expected to provide a comprehensive picture of the gene functions and metabolic capacity of a microbial community, needs to be conducted in the context of a comprehensive data management and analysis system. In this paper we present IMG/M, an experimental metagenome data management and analysis system that is based on the Integrated Microbial Genomes (IMG) system. IMG/M provides tools and viewers for analyzing both metagenomes and isolate genomes individually or in a comparative context.

  8. An ORFome assembly approach to metagenomics sequences analysis.

    PubMed

    Ye, Yuzhen; Tang, Haixu

    2009-06-01

    Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. Current analyses of metagenomics data largely rely on computational tools originally designed for microbial genomics projects. The challenge of assembling metagenomic sequences arises mainly from the short reads and the high species complexity of the community. Alternatively, individual (short) reads are searched directly against databases of known genes (or proteins) to identify homologous sequences. The latter approach may have low sensitivity and specificity in identifying homologous sequences, which may further bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The computational framework consists of three steps. First, each read from a metagenomics project is annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e., the ORFome) are used for database searching of homologs and subsequent diversity analysis. We applied the MetaORFA approach to several metagenomics datasets with low-coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, ORFome assembly significantly increases the sensitivity of homology searching and may improve the diversity analysis of metagenomic data. This improvement is especially useful for metagenomic projects in which genome assembly fails because of low sequence coverage.
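    To make the first MetaORFA step (annotating each read with putative ORFs) more concrete, here is a minimal sketch that scans the three forward reading frames of a read for stop-free stretches above a length threshold. It is a simplified, hypothetical stand-in, not the authors' ORF caller: the name `find_orfs` and the `min_codons` cutoff are assumptions, reverse-strand frames are ignored, and no start codon is required, because a short read usually covers only part of a gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(read: str, min_codons: int = 10) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for stop-free stretches in the 3 forward frames.

    Hypothetical sketch: stretches may be open at either end of the read,
    since short reads typically contain only a fragment of a gene.
    `end` is exclusive and excludes the stop codon.
    """
    read = read.upper()
    orfs = []
    for frame in range(3):
        start = frame
        # Walk the read codon by codon in this frame.
        for pos in range(frame, len(read) - 2, 3):
            if read[pos:pos + 3] in STOP_CODONS:
                # Close the current stretch just before the stop codon.
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos))
                start = pos + 3
        # Trailing stretch that runs off the end of the read (whole codons only).
        last = frame + ((len(read) - frame) // 3) * 3
        if (last - start) // 3 >= min_codons:
            orfs.append((frame, start, last))
    return orfs
```

    For example, `find_orfs("GCT" * 5, min_codons=5)` returns `[(0, 0, 15)]`: frame 0 has five complete codons and no stop, while frames 1 and 2 only contain four complete codons each. A real pipeline would also scan the reverse complement and translate the retained stretches into peptides before assembly.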

  9. Genomics and metagenomics in medical microbiology.

    PubMed

    Padmanabhan, Roshan; Mishra, Ajay Kumar; Raoult, Didier; Fournier, Pierre-Edouard

    2013-12-01

    Over the last two decades, sequencing tools have evolved from laborious, time-consuming methodologies to real-time detection and deciphering of genomic DNA. Genome sequencing, especially using next generation sequencing (NGS), has revolutionized the landscape of microbiology and infectious disease. This deluge of sequencing data has not only enabled advances in fundamental biology but has also helped improve diagnosis, pathogen typing, detection of virulence and antibiotic resistance, and the development of new vaccines and culture media. In addition, NGS has enabled efficient analysis of complex human microflora, both commensal and pathological, through metagenomic methods, thus helping the comprehension and management of human diseases such as obesity. This review summarizes technological advances in genomics and metagenomics relevant to the field of medical microbiology.

  10. Insights into antibiotic resistance through metagenomic approaches.

    PubMed

    Schmieder, Robert; Edwards, Robert

    2012-01-01

    The consequences of bacterial infections have been curtailed by the introduction of a wide range of antibiotics. However, infections continue to be a leading cause of mortality, in part due to the evolution and acquisition of antibiotic-resistance genes. Antibiotic misuse and overprescription have created a driving force influencing the selection of resistance. Despite the problem of antibiotic resistance in infectious bacteria, little is known about the diversity, distribution and origins of resistance genes, especially for the unculturable majority of environmental bacteria. Functional and sequence-based metagenomics have been used for the discovery of novel resistance determinants and the improved understanding of antibiotic-resistance mechanisms in clinical and natural environments. This review discusses recent findings and future challenges in the study of antibiotic resistance through metagenomic approaches.

  11. Metagenomic sequencing of expressed prostate secretions.

    PubMed

    Smelov, Vitaly; Arroyo Mühr, L Sara; Bzhalava, Davit; Brown, Lyndon J; Komyakov, Boris; Dillner, Joakim

    2014-12-01

    To investigate which microorganisms may be present in expressed prostate secretions (EPS), metagenomic sequencing (MGS) was applied to prostate secretion samples from five men with prostatitis and five matched control men, as well as to combined expressed prostate secretion and urine samples from six patients with prostate cancer and six matched control men. The prostate secretion samples contained a variety of bacterial sequences, mostly belonging to the Proteobacteria phylum. The combined prostate secretion and urine samples were dominated by the JC polyomavirus, which represented >20% of all detected metagenomic sequence reads. Other viruses were also detected, for example, human papillomavirus type 81. All combined prostate secretion and urine samples were also positive for Proteobacteria. In summary, MGS of expressed prostate secretion is informative for detecting a variety of bacteria and viruses, suggesting that larger-scale use of MGS of prostate secretions may be useful in medical and epidemiological studies of prostate infections.

  12. Metagenomics: advances in ecology and biotechnology.

    PubMed

    Steele, Helen L; Streit, Wolfgang R

    2005-06-15

    This review highlights the significant advances which have been made in prokaryotic ecology and biotechnology due to the application of metagenomic techniques. It is now possible to link processes to specific microorganisms in the environment, such as the detection of a new phototrophic process in marine bacteria, and to characterise the metabolic cooperation which takes place in mixed species biofilms. The range of prokaryote-derived products available for biotechnology applications is increasing rapidly. The knowledge gained from analysis of biosynthetic pathways provides valuable information about enzymology and allows engineering of biocatalysts for specific processes. The expansion of metagenomic techniques to include alternative heterologous hosts for gene expression and the development of sophisticated assays which enable screening of thousands of clones offer the possibility of obtaining even more valuable information about the prokaryotic world.

  13. New Extremophilic Lipases and Esterases from Metagenomics

    PubMed Central

    López-López, Olalla; Cerdán, Maria E; González Siso, Maria I

    2014-01-01

    Lipolytic enzymes catalyze the hydrolysis of ester bonds in the presence of water. In media with low water content or in organic solvents, they can catalyze synthetic reactions such as esterification and transesterification. Lipases and esterases, in particular those from extremophilic origin, are robust enzymes, functional under the harsh conditions of industrial processes owing to their inherent thermostability and resistance towards organic solvents, which combined with their high chemo-, regio- and enantioselectivity make them very attractive biocatalysts for a variety of industrial applications. Likewise, enzymes from extremophile sources can provide additional features such as activity at extreme temperatures, extreme pH values or high salinity levels, which could be interesting for certain purposes. New lipases and esterases have traditionally been discovered by the isolation of microbial strains producing lipolytic activity. The Genome Projects Era allowed genome mining, exploiting homology with known lipases and esterases, to be used in the search for new enzymes. The Metagenomic Era meant a step forward in this field with the study of the metagenome, the pool of genomes in an environmental microbial community. Current molecular biology techniques make it possible to construct total environmental DNA libraries, including the genomes of unculturable organisms, opening a new window to a vast field of unknown enzymes with new and unique properties. Here, we review the latest advances and findings from research into new extremophilic lipases and esterases, using metagenomic approaches, and their potential industrial and biotechnological applications. PMID:24588890

  14. Generating viral metagenomes from the coral holobiont

    PubMed Central

    Wood-Charlson, Elisha M.; Suttle, Curtis A.; van Oppen, Madeleine J. H.

    2014-01-01

    Reef-building corals comprise multipartite symbioses where the cnidarian animal is host to an array of eukaryotic and prokaryotic organisms, and the viruses that infect them. These viruses are critical elements of the coral holobiont, serving not only as agents of mortality, but also as potential vectors for lateral gene flow, and as elements encoding a variety of auxiliary metabolic functions. Consequently, understanding the functioning and health of the coral holobiont requires detailed knowledge of the associated viral assemblage and its function. Currently, the most tractable way of uncovering viral diversity and function is through metagenomic approaches, which are inherently difficult in corals because of the complex holobiont community, the extracellular mucus layer that all corals secrete, and the variety of sizes and structures of nucleic acids found in viruses. Here we present the first protocol for isolating, purifying and amplifying viral nucleic acids from corals based on mechanical disruption of cells. This method produces at least 50% higher yields of viral nucleic acids, has very low levels of cellular sequence contamination and captures wider viral diversity than previously used chemical-based extraction methods. We demonstrate that our mechanical-based method profiles a greater diversity of DNA and RNA genomes, including virus groups such as Retro-transcribing and ssRNA viruses, which are absent from metagenomes generated via chemical-based methods. In addition, we briefly present (and make publicly available) the first paired DNA and RNA viral metagenomes from the coral Acropora tenuis. PMID:24847321

  15. MetaPhinder: Identifying Bacteriophage Sequences in Metagenomic Data Sets.

    PubMed

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole; Voldby Larsen, Mette; Nielsen, Morten

    Bacteriophages are the most abundant biological entity on the planet, but at the same time do not account for much of the genetic material isolated from most environments due to their small genome sizes. They also show great genetic diversity and mosaic genomes, making it challenging to analyze and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e., contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole-genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.
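    The idea of integrating hits to multiple genomes can be illustrated with a small sketch: pool the contig intervals covered by hits from any reference phage, merge overlaps so shared regions count once, and report the fraction of the contig covered. This interval-union score is an assumption chosen for illustration; it is not MetaPhinder's published scoring function, and `merged_coverage` is a hypothetical name.

```python
def merged_coverage(hits: list[tuple[int, int]], contig_length: int) -> float:
    """Fraction of a contig covered by the union of hit intervals.

    Each hit is a half-open (start, end) interval on the contig (assumed to lie
    within the contig). Hits may come from different reference phage genomes;
    overlapping hits are merged so that shared regions are counted once.
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(hits):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: bank it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / contig_length
```

    With hits from three different genomes at (0, 400), (300, 700) and (900, 1000) on a 1,000 bp contig, the merged coverage is 0.8, whereas the best single hit covers only 0.4. That gap is why pooling hits can help with mosaic phage genomes, where no single reference explains the whole contig.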

  16. Technical Report: Algorithm and Implementation for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    SciTech Connect

    McLoughlin, Kevin

    2016-01-11

    This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive feature of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.
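    The sparsity property attributed to the Chinese restaurant process (CRP) can be seen by sampling from the prior directly. In a CRP with concentration parameter `alpha`, the i-th customer (0-indexed) joins an existing table k with probability count_k / (i + alpha) and opens a new table with probability alpha / (i + alpha), so popular tables tend to attract more members. The sketch below shows only this prior, not MetaQuant's full model or inference; the function names are hypothetical.

```python
import random


def sample_crp(n: int, alpha: float, rng: random.Random) -> list[int]:
    """Draw table (cluster) labels for n customers from a CRP(alpha) prior."""
    counts: list[int] = []  # number of customers seated at each existing table
    labels: list[int] = []  # table index chosen by each customer, in order
    for _ in range(n):
        # Existing tables are weighted by occupancy; the extra slot is a new table.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new table
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def expected_tables(n: int, alpha: float) -> float:
    """Expected number of occupied tables after n customers: sum of alpha/(alpha+i)."""
    return sum(alpha / (alpha + i) for i in range(n))
```

    Because the expected number of tables grows only logarithmically in n (about 7.5 for n = 1,000 and alpha = 1), a CRP prior favors explaining the data with few components. In the abundance setting, this is what pushes the fit toward one representative among nearly equivalent genomes rather than spreading reads across all of them.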

  17. Technical Report on Modeling for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    SciTech Connect

    McLoughlin, K.

    2016-01-11

    The overall aim of this project is to develop a software package, called MetaQuant, that can determine the constituents of a complex microbial sample and estimate their relative abundances by analysis of metagenomic sequencing data. The goal for Task 1 is to create a generative model describing the stochastic process underlying the creation of sequence read pairs in the data set. The stages in this generative process include the selection of a source genome sequence for each read pair, with probability dependent on its abundance in the sample. The other stages describe the evolution of the source genome from its nearest common ancestor with a reference genome, breakage of the source DNA into short fragments, and the errors in sequencing the ends of the fragments to produce read pairs.
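    The generative stages listed above (choose a source genome by abundance, break the DNA into a fragment, sequence both fragment ends with errors) can be sketched as a toy read-pair simulator. This is an illustrative assumption, not MetaQuant code: the function names, the Gaussian fragment-length model, uniform fragment positions and the uniform substitution-error model are all hypothetical simplifications, and the evolution of the source genome away from its reference is omitted. Genomes are assumed to be at least `read_len` bases long.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def add_errors(seq: str, error_rate: float, rng: random.Random) -> str:
    """Replace each base by a different random base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_read_pair(genomes: dict[str, str], abundances: dict[str, float],
                       read_len: int, frag_mean: float, frag_sd: float,
                       error_rate: float, rng: random.Random) -> tuple[str, str, str]:
    # 1. Select the source genome with probability proportional to its abundance.
    names = list(genomes)
    name = rng.choices(names, weights=[abundances[n] for n in names])[0]
    genome = genomes[name]
    # 2. Break the DNA: draw a fragment length, clipped to [read_len, len(genome)].
    frag_len = int(round(rng.gauss(frag_mean, frag_sd)))
    frag_len = max(read_len, min(frag_len, len(genome)))
    # 3. Choose the fragment position uniformly (no position bias in this sketch).
    start = rng.randrange(len(genome) - frag_len + 1)
    fragment = genome[start:start + frag_len]
    # 4. Sequence both ends: read 1 from the forward strand, read 2 from the reverse.
    read1 = add_errors(fragment[:read_len], error_rate, rng)
    read2 = add_errors(reverse_complement(fragment)[:read_len], error_rate, rng)
    return name, read1, read2
```

    An abundance-inference method such as the one described works in the opposite direction: given many such read pairs, it estimates the abundance weights used in step 1.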

  18. Metagenomic Assembly Reveals Hosts of Antibiotic Resistance Genes and the Shared Resistome in Pig, Chicken, and Human Feces.

    PubMed

    Ma, Liping; Xia, Yu; Li, Bing; Yang, Ying; Li, Li-Guan; Tiedje, James M; Zhang, Tong

    2016-01-05

    The risk associated with antibiotic resistance disseminating from animal and human feces is an urgent public issue. In the present study, we sought to establish a pipeline for annotating antibiotic resistance genes (ARGs) based on metagenomic assembly to investigate ARGs and their co-occurrence with associated genetic elements. Genetic elements found on the assembled genomic fragments include mobile genetic elements (MGEs) and metal resistance genes (MRGs). We then explored the hosts of these resistance genes and the shared resistome of pig, chicken and human fecal samples. High levels of tetracycline, multidrug, erythromycin, and aminoglycoside resistance genes were discovered in these fecal samples. In particular, a significantly high level of ARGs (7762 ×/Gb) was detected in adult chicken feces, indicating a higher level of ARG contamination than in the other fecal samples. Many ARG arrangements (e.g., macA-macB and tetA-tetR) were found to be shared by chicken, pig and human feces. In addition, MGEs such as the aadA5-dfrA17-carrying class 1 integron were identified on an assembled scaffold from chicken feces and are carried by human pathogens. Differential coverage binning analysis revealed significant ARG enrichment in adult chicken feces. A draft genome, annotated as multidrug-resistant Escherichia coli, was retrieved from chicken feces metagenomes and was determined to carry diverse ARGs (multidrug, acriflavine, and macrolide). The present study demonstrates the determination of ARG hosts and the shared resistome from metagenomic data sets and successfully establishes the relationship between ARGs, hosts, and environments. This ARG annotation pipeline based on metagenomic assembly will help to bridge the knowledge gaps regarding ARG-associated genes and ARG hosts with metagenomic data sets. Moreover, this pipeline will facilitate the evaluation of environmental risks in the genetic context of ARGs.
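    The unit "×/Gb" reads as fold coverage per gigabase of sequenced metagenome. A minimal sketch of one common way to compute such a value is shown below, assuming per-gene coverage is mapped reads × read length / gene length, summed over ARGs and divided by dataset size in Gb. The study's exact normalization is not given in the abstract, so treat this formula, the function name and the example numbers as hypothetical.

```python
def coverage_per_gb(mapped_reads: dict[str, int], gene_lengths: dict[str, int],
                    read_length: int, total_bases: int) -> float:
    """Summed ARG coverage (x) per gigabase of sequenced metagenome.

    coverage_g = mapped_reads_g * read_length / gene_length_g
    result     = sum over genes of coverage_g, divided by (total_bases / 1e9)
    """
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    total_coverage = sum(
        mapped_reads[g] * read_length / gene_lengths[g] for g in mapped_reads
    )
    return total_coverage / (total_bases / 1e9)
```

    With made-up inputs of 600 reads (100 bp) on a 1,200 bp tetA gene (50×) and 400 reads on a 2,000 bp macB gene (20×) in a 2 Gb dataset, the result is 70× / 2 Gb = 35 ×/Gb. Normalizing by dataset size is what makes values comparable across samples sequenced to different depths.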

  19. Metagenomics: Retrospect and Prospects in High Throughput Age

    PubMed Central

    Kumar, Satish; Krishnani, Kishore Kumar; Bhushan, Bharat; Brahmane, Manoj Pandit

    2015-01-01

    In recent years, metagenomics has emerged as a powerful tool for mining hidden microbial treasure in a culture-independent manner. In the last two decades, metagenomics has been applied extensively to exploit the concealed potential of microbial communities from almost all sorts of habitats. The historical progress made over this period is briefly discussed, from the origin of metagenomics to its current state, together with the discovery of novel biological functions of commercial importance from metagenomes of diverse habitats. The present review also highlights the paradigm shift of metagenomics from the basic study of community composition to insight into microbial community dynamics for harnessing the full potential of uncultured microbes, with more emphasis on the implications of breakthrough developments, namely, Next Generation Sequencing, advanced bioinformatics tools, and systems biology. PMID:26664751

  20. Comparative Viral Metagenomics of Environmental Samples from Korea

    PubMed Central

    Kim, Min-Soo; Whon, Tae Woong

    2013-01-01

    The introduction of metagenomics into the field of virology has facilitated the exploration of viral communities in various natural habitats. Understanding the viral ecology of a variety of sample types throughout the biosphere is important per se, but it also has potential applications in clinical and diagnostic virology. However, the procedures used in viral metagenomics may produce technical errors, such as amplification bias, and public viral databases are very limited; both may hamper the determination of viral diversity in samples. This review considers the current state of viral metagenomics, based on examples from Korean viral metagenomic studies, i.e., of rice paddy soil, fermented foods, human gut, seawater, and the near-surface atmosphere. Viral metagenomics has become widespread due to various methodological developments, and much attention has been focused on studies that consider the intrinsic role of viruses that interact with their hosts. PMID:24124407

  1. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond

    PubMed Central

    Hiraoka, Satoshi; Yang, Ching-chia; Iwasaki, Wataru

    2016-01-01

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives. PMID:27383682

  2. Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Canon, Shane [LBNL]

    2016-07-12

    DOE JGI's Zhong Wang, chair of the High-performance Computing session, gives a brief introduction before Berkeley Lab's Shane Canon talks about "Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  3. Metagenomics: an inexhaustible access to nature's diversity.

    PubMed

    Langer, Martin; Gabor, Esther M; Liebeton, Klaus; Meurer, Guido; Niehaus, Frank; Schulze, Renate; Eck, Jürgen; Lorenz, Patrick

    2006-01-01

    The chemical industry has an enormous need for innovation. To save resources, energy and time, currently more and more established chemical processes are being switched to biotechnological routes. This requires white biotechnology to discover and develop novel enzymes, biocatalysts and applications. Due to a limitation in the cultivability of microbes living in certain habitats, technologies have to be established which give access to the enormous resource of uncultivated microbial diversity. Metagenomics promises to provide new and diverse enzymes and biocatalysts as well as bioactive molecules and has the potential to make industrial biotechnology an economic, sustainable success.

  4. Metagenomic approaches to identifying infectious agents.

    PubMed

    Höper, D; Mettenleiter, T C; Beer, M

    2016-04-01

    Since the advent of next-generation sequencing (NGS) technologies, the untargeted screening of samples from outbreaks for pathogen identification using metagenomics has become technically and economically feasible. However, various aspects need to be considered in order to exploit the full potential of NGS for virus discovery. Here, the authors summarise those aspects of the main steps that have a significant impact, from sample selection through sample handling and processing, as well as sequencing and finally data analysis, with a special emphasis on existing pitfalls.

  5. Size Does Matter: Application-driven Approaches for Soil Metagenomics

    PubMed Central

    Kakirde, Kavita S.; Parsley, Larissa C.; Liles, Mark R.

    2010-01-01

    Metagenomic analyses can provide extensive information on the structure, composition, and predicted gene functions of diverse environmental microbial assemblages. Each environment presents its own unique challenges to metagenomic investigation and requires a specifically designed approach to accommodate physicochemical and biotic factors unique to each environment that can pose technical hurdles and/or bias the metagenomic analyses. In particular, soils harbor an exceptional diversity of prokaryotes that are largely undescribed beyond the level of ribotype and are a potentially vast resource for natural product discovery. The successful application of a soil metagenomic approach depends on selecting the appropriate DNA extraction, purification, and if necessary, cloning methods for the intended downstream analyses. The most important technical considerations in a metagenomic study include obtaining a sufficient yield of high-purity DNA representing the targeted microorganisms within an environmental sample or enrichment and (if required) constructing a metagenomic library in a suitable vector and host. Size does matter in the context of the average insert size within a clone library or the sequence read length for a high-throughput sequencing approach. It is also imperative to select the appropriate metagenomic screening strategy to address the specific question(s) of interest, which should drive the selection of methods used in the earlier stages of a metagenomic project (e.g., DNA size, to clone or not to clone). Here, we present both the promising and problematic nature of soil metagenomics and discuss the factors that should be considered when selecting soil sampling, DNA extraction, purification, and cloning methods to implement based on the ultimate study objectives. PMID:21076656

  6. International Standards for Genomes, Transcriptomes, and Metagenomes.

    PubMed

    Mason, Christopher E; Afshinnekoo, Ebrahim; Tighe, Scott; Wu, Shixiu; Levy, Shawn

    2017-03-17

    Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single-cells, RNA profiling, and metagenomics (across multiple genomes). Technical artifacts and contamination can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous. Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data. Here we review current standards and their applications in genomics, including whole genomes, transcriptomes, mixed genomic samples (metagenomes), and the modified bases within each (epigenomes and epitranscriptomes). These standards, tools, and metrics are critical for quantifying the accuracy of NGS methods, which will be essential for robust approaches in clinical genomics and precision medicine.

  7. Identifying personal microbiomes using metagenomic codes.

    PubMed

    Franzosa, Eric A; Huang, Katherine; Meadow, James F; Gevers, Dirk; Lemon, Katherine P; Bohannan, Brendan J M; Huttenhower, Curtis

    2015-06-02

    Community composition within the human microbiome varies across individuals, but it remains unknown if this variation is sufficient to uniquely identify individuals within large populations or stable enough to identify them over time. We investigated this by developing a hitting set-based coding algorithm and applying it to the Human Microbiome Project population. Our approach defined body site-specific metagenomic codes: sets of microbial taxa or genes prioritized to uniquely and stably identify individuals. Codes capturing strain variation in clade-specific marker genes were able to distinguish among hundreds of individuals at an initial sampling time point. In comparisons with follow-up samples collected 30-300 d later, ∼30% of individuals could still be uniquely pinpointed using metagenomic codes from a typical body site; coincidental (false positive) matches were rare. Codes based on the gut microbiome were exceptionally stable and pinpointed >80% of individuals. The failure of a code to match its owner at a later time point was largely explained by the loss of specific microbial strains (at current limits of detection) and was only weakly associated with the length of the sampling interval. In addition to highlighting patterns of temporal variation in the ecology of the human microbiome, this work demonstrates the feasibility of microbiome-based identifiability, a result with important ethical implications for microbiome study design. The datasets and code used in this work are available for download from huttenhower.sph.harvard.edu/idability.
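    A hitting-set view of a metagenomic code can be sketched as follows: a code for one individual is a subset of that individual's features, and it uniquely identifies the owner if every other individual lacks at least one feature in the code (the chosen features "hit" all other individuals). Choosing a minimum hitting set is NP-hard, so the sketch uses the standard greedy heuristic. It is a hypothetical illustration, not the published algorithm: `greedy_code`, `max_size` and the toy profiles are assumptions, and the feature prioritization and detection thresholds described in the study are omitted.

```python
from typing import Optional


def greedy_code(owner: str, profiles: dict[str, set[str]],
                max_size: int = 10) -> Optional[set[str]]:
    """Greedily pick features of `owner` until no other profile contains them all.

    Returns None if the greedy procedure cannot separate the owner from every
    other individual within `max_size` features.
    """
    candidates = profiles[owner]
    remaining = {v for v in profiles if v != owner}  # others still matching the code
    code: set[str] = set()
    while remaining and len(code) < max_size:
        # Feature absent from the most still-matching individuals; sorting first
        # makes tie-breaking deterministic (alphabetical order).
        best = max(
            sorted(candidates - code),
            key=lambda f: sum(f not in profiles[v] for v in remaining),
            default=None,
        )
        if best is None:
            break
        excluded = {v for v in remaining if best not in profiles[v]}
        if not excluded:
            break  # no unused feature separates the owner from the rest
        code.add(best)
        remaining -= excluded
    return code if not remaining else None
```

    With toy profiles `{"alice": {"s1", "s2", "s3"}, "bob": {"s1", "s4"}, "carol": {"s1", "s2"}, "dave": {"s4", "s5"}}`, `greedy_code("alice", profiles)` returns `{"s3"}`, since only alice carries s3. Re-identification at a later time point would then check whether the owner's new profile still contains every feature of the code.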

  8. Back to the future of soil metagenomics

    DOE PAGES

    Nesme, Joseph; Achouak, Wafa; Agathos, Spiros N.; ...

    2016-02-10

    Here, direct extraction and characterization of microbial community DNA through PCR amplicon surveys and metagenomics has revolutionized the study of environmental microbiology and microbial ecology. In particular, metagenomic analysis of nucleic acids provides direct access to the genomes of the “uncultivated majority.” Accelerated by advances in sequencing technology, microbiologists have discovered more novel phyla, classes, genera, and genes from microorganisms in the first decade and a half of the twenty-first century than since these “many very little living animalcules” were first discovered by van Leeuwenhoek (Table 1). The unsurpassed diversity of soils promises continued exploration of a range of industrial, agricultural, and environmental functions. The ability to explore soil microbial communities with increasing capacity offers the highest promise for answering many outstanding who, what, where, when, why, and with whom questions such as: Which microorganisms are linked to which soil habitats? How do microbial abundances change with changing edaphic conditions? How do microbial assemblages interact and influence one another synergistically or antagonistically? What is the full extent of soil microbial diversity, both functionally and phylogenetically? What are the dynamics of microbial communities in space and time? How sensitive are microbial communities to a changing climate? What is the role of horizontal gene transfer in the stability of microbial communities? Do highly diverse microbial communities confer resistance and resilience in soils?

  9. Back to the future of soil metagenomics

    SciTech Connect

    Nesme, Joseph; Achouak, Wafa; Agathos, Spiros N.; Bailey, Mark; Baldrian, Petr; Brunel, Dominique; Frostegard, Asa; Heulin, Thierry; Jansson, Janet K.; Jurkevitch, Edouard; Kruus, Kristiina L.; Kowalchuk, George A.; Lagares, Antonio; Lappin-Scott, Hilary M.; Lemanceau, Philippe; Le Paslier, Denis; Mandic-Mulec, Ines; Murrell, J. Colin; Myrold, David D.; Nalin, Renaud; Nannipieri, Paolo; Neufeld, Josh D.; O'Gara, Fergal; Parnell, John J.; Puhler, Alfred; Pylro, Victor; Roesch, Luiz F. W.; Schloter, Michael; Schleper, Christa; Sczyrba, Alexander; Sessitsch, Angela; Sjoling, Sara; Sorensen, Jan; Sorensen, Soren J.; Tebbe, Christoph C.; Topp, Edward; Tsiamis, George; van Elsas, Jan Dirk; van Keulen, Geertje; Widmer, Franco; Wagner, Michael; Zhang, Tong; Zhang, Xiaojun; Zhao, Liping; Zhu, Yong -Guan; Vogel, Timothy M.; Simonet, Pascal

    2016-02-10

    Here, direct extraction and characterization of microbial community DNA through PCR amplicon surveys and metagenomics has revolutionized the study of environmental microbiology and microbial ecology. In particular, metagenomic analysis of nucleic acids provides direct access to the genomes of the “uncultivated majority.” Accelerated by advances in sequencing technology, microbiologists have discovered more novel phyla, classes, genera, and genes from microorganisms in the first decade and a half of the twenty-first century than since these “many very little living animalcules” were first discovered by van Leeuwenhoek (Table 1). The unsurpassed diversity of soils promises continued exploration of a range of industrial, agricultural, and environmental functions. The ability to explore soil microbial communities with increasing capacity offers the highest promise for answering many outstanding who, what, where, when, why, and with whom questions such as: Which microorganisms are linked to which soil habitats? How do microbial abundances change with changing edaphic conditions? How do microbial assemblages interact and influence one another synergistically or antagonistically? What is the full extent of soil microbial diversity, both functionally and phylogenetically? What are the dynamics of microbial communities in space and time? How sensitive are microbial communities to a changing climate? What is the role of horizontal gene transfer in the stability of microbial communities? Do highly diverse microbial communities confer resistance and resilience in soils?

  10. Fizzy. Feature subset selection for metagenomics

    DOE PAGES

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin; ...

    2015-11-04

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- and β-diversity. Feature subset selection, a sub-field of machine learning, can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command-line tool for microbial ecologists that implements information-theoretic subset selection methods for biological data formats and is compatible with the widely adopted BIOM format. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.
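    As a concrete example of an information-theoretic selection criterion, the sketch below ranks discretized features (for example, OTU presence/absence) by their mutual information with a class label, the simple criterion often called MIM. This is an assumed illustration of the general technique, not Fizzy's code or command-line interface, and `mutual_information` and `rank_features` are hypothetical names. Continuous abundances would need to be discretized first.

```python
import math
from collections import Counter


def mutual_information(x: list, y: list) -> float:
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def rank_features(table: dict[str, list], labels: list, k: int) -> list[str]:
    """Return the k feature names with the highest MI with labels (ties by name)."""
    scores = {name: mutual_information(col, labels) for name, col in table.items()}
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]
```

    For labels `["young", "young", "old", "old"]`, a feature `[1, 1, 0, 0]` that tracks the label perfectly scores 1 bit, while `[1, 0, 1, 0]` scores 0 bits, so `rank_features` keeps the first. Ranking each feature independently ignores redundancy between features; criteria such as mRMR or JMI add terms that account for it.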

  11. Genovo: De Novo Assembly for Metagenomes

    NASA Astrophysics Data System (ADS)

    Laserson, Jonathan; Jojic, Vladimir; Koller, Daphne

    Next-generation sequencing technologies produce a large number of noisy reads from the DNA in a sample. Metagenomics and population sequencing aim to recover the genomic sequences of the species in the sample, which could be of high diversity. Methods geared towards single sequence reconstruction are not sensitive enough when applied in this setting. We introduce a generative probabilistic model of read generation from environmental samples and present Genovo, a novel de novo sequence assembler that discovers likely sequence reconstructions under the model. A Chinese restaurant process prior accounts for the unknown number of genomes in the sample. Inference is made by applying a series of hill-climbing steps iteratively until convergence. We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo's reconstructions cover more bases and recover more genes than the other methods, and yield a higher assembly score.
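
    To illustrate how a Chinese restaurant process prior leaves the number of genomes open, the sketch below draws genome labels for reads from the prior alone. The concentration parameter alpha and its value are assumptions for illustration, and the function name is hypothetical. The sketch does not include Genovo's read-generation model or its hill-climbing inference.

```python
import random


def sample_crp(n_reads, alpha, rng):
    """Draw genome labels for n_reads from a Chinese restaurant process prior.

    When i reads are already placed, the next read joins existing genome k with
    probability count_k / (i + alpha) or starts a new genome with probability
    alpha / (i + alpha).
    """
    assignments = []
    counts = []  # counts[k]: reads already assigned to genome k
    for _ in range(n_reads):
        weights = counts + [alpha]  # random.choices normalizes by i + alpha
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new genome
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts


# Hypothetical concentration alpha = 2.0; larger alpha favors more genomes.
rng = random.Random(42)
assignments, counts = sample_crp(100, alpha=2.0, rng=rng)
print(f"{len(counts)} genomes for {sum(counts)} reads")
```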

  12. International Standards for Genomes, Transcriptomes, and Metagenomes

    PubMed Central

    Mason, Christopher E.; Afshinnekoo, Ebrahim; Tighe, Scott; Wu, Shixiu; Levy, Shawn

    2017-01-01

    Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single-cells, RNA profiling, and metagenomics (across multiple genomes). Technical artifacts and contamination can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous. Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data. Here we review current standards and their applications in genomics, including whole genomes, transcriptomes, mixed genomic samples (metagenomes), and the modified bases within each (epigenomes and epitranscriptomes). These standards, tools, and metrics are critical for quantifying the accuracy of NGS methods, which will be essential for robust approaches in clinical genomics and precision medicine. PMID:28337071

  13. The cystic fibrosis lower airways microbial metagenome

    PubMed Central

    Moran Losada, Patricia; Chouvarine, Philippe; Dorda, Marie; Hedtfeld, Silke; Mielke, Samira; Schulz, Angela; Wiehlmann, Lutz

    2016-01-01

    Chronic airway infections account for most of the morbidity in people with cystic fibrosis (CF). Herein, we present unbiased quantitative data about the frequency and abundance of DNA viruses, archaea, bacteria, moulds and fungi in CF lower airways. Induced sputa were collected on several occasions from children, adolescents and adults with CF. Deep sputum metagenome sequencing identified, on average, approximately 10 DNA viruses or fungi and several hundred bacterial taxa. The metagenome of a CF patient was typically found to be made up of an individual signature of multiple low-abundance species, overlaid by a few disease-associated pathogens, such as Pseudomonas aeruginosa and Staphylococcus aureus, as major components. The host-associated signatures ranged from inconspicuous polymicrobial communities in healthy subjects to low-complexity microbiomes dominated by the typical CF pathogens in patients with advanced lung disease. The DNA virus community in CF lungs mainly consisted of phages and occasionally of human pathogens, such as adeno- and herpesviruses. The S. aureus and P. aeruginosa populations were composed of one major and numerous minor clone types. The rare clones constitute a low copy genetic resource that could rapidly expand as a response to habitat alterations, such as antimicrobial chemotherapy or invasion of novel microbes. PMID:27730195

  14. The cystic fibrosis lower airways microbial metagenome.

    PubMed

    Moran Losada, Patricia; Chouvarine, Philippe; Dorda, Marie; Hedtfeld, Silke; Mielke, Samira; Schulz, Angela; Wiehlmann, Lutz; Tümmler, Burkhard

    2016-04-01

    Chronic airway infections account for most of the morbidity in people with cystic fibrosis (CF). Herein, we present unbiased quantitative data about the frequency and abundance of DNA viruses, archaea, bacteria, moulds and fungi in CF lower airways. Induced sputa were collected on several occasions from children, adolescents and adults with CF. Deep sputum metagenome sequencing identified, on average, approximately 10 DNA viruses or fungi and several hundred bacterial taxa. The metagenome of a CF patient was typically found to be made up of an individual signature of multiple low-abundance species, overlaid by a few disease-associated pathogens, such as Pseudomonas aeruginosa and Staphylococcus aureus, as major components. The host-associated signatures ranged from inconspicuous polymicrobial communities in healthy subjects to low-complexity microbiomes dominated by the typical CF pathogens in patients with advanced lung disease. The DNA virus community in CF lungs mainly consisted of phages and occasionally of human pathogens, such as adeno- and herpesviruses. The S. aureus and P. aeruginosa populations were composed of one major and numerous minor clone types. The rare clones constitute a low copy genetic resource that could rapidly expand as a response to habitat alterations, such as antimicrobial chemotherapy or invasion of novel microbes.

  15. Fizzy. Feature subset selection for metagenomics

    SciTech Connect

    Ditzler, Gregory; Morrison, J. Calvin; Lan, Yemin; Rosen, Gail L.

    2015-11-04

    Background: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- and β-diversity. Feature subset selection, a sub-field of machine learning, can also provide unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can identify the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the human gut microbiome. Results: We have developed a new Python command line tool for microbial ecologists that is compatible with the widely adopted BIOM format and implements information-theoretic subset selection methods for biological data formats. We demonstrate the software tool's capabilities on publicly available datasets. Conclusions: We have made the software implementation of Fizzy available to the public under the GNU GPL license. The standalone implementation can be found at http://github.com/EESI/Fizzy.

  16. Evaluation of the Cow Rumen Metagenome; Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    SciTech Connect

    Sczyrba, Alex

    2011-10-13

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  17. Evaluation of the Cow Rumen Metagenome; Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Sczyrba, Alex [DOE JGI]

    2016-07-12

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  18. Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Li, Weizhong [San Diego Supercomputer Center]

    2016-07-12

    San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  19. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Sakakibara, Yasubumi [Keio University]

    2016-07-12

    Keio University's Yasubumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  20. Bioprospecting potential of the soil metagenome: novel enzymes and bioactivities.

    PubMed

    Lee, Myung Hwan; Lee, Seon-Woo

    2013-09-01

    The microbial diversity in soil ecosystems is higher than in any other microbial ecosystem. The majority of soil microorganisms have not been characterized because the dominant members are not readily culturable on standard cultivation media; therefore, the soil ecosystem is a great reservoir for the discovery of novel microbial enzymes and bioactivities. The soil metagenome, the collective microbial genome, could be cloned and sequenced directly from soils to search for novel microbial resources. This review summarizes the microbial diversity in soils and the efforts to search for microbial resources from the soil metagenome, with more emphasis on the potential of bioprospecting metagenomics and recent discoveries.

  1. Activity-Based Screening of Metagenomic Libraries for Hydrogenase Enzymes.

    PubMed

    Adam, Nicole; Perner, Mirjam

    2017-01-01

    Here we outline how to identify hydrogenase enzymes from metagenomic libraries through an activity-based screening approach. A metagenomic fosmid library is constructed in E. coli and the fosmids are transferred into a hydrogenase deletion mutant of Shewanella oneidensis (ΔhyaB) via triparental mating. If a fosmid exhibits hydrogen uptake activity, S. oneidensis' phenotype is restored and hydrogenase activity is indicated by a color change of the medium from yellow to colorless. This new method enables screening of 48 metagenomic fosmid clones in parallel.

  2. Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights.

    PubMed

    Pasolli, Edoardo; Truong, Duy Tin; Malik, Faizan; Waldron, Levi; Segata, Nicola

    2016-07-01

    Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis. The software framework, microbiome profiles, and metadata for thousands of samples are publicly

  3. Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

    PubMed Central

    Pasolli, Edoardo; Truong, Duy Tin; Malik, Faizan; Waldron, Levi

    2016-01-01

    Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the “healthy” microbiome can be considered a first step toward defining general microbial dysbiosis. The software framework, microbiome profiles, and metadata for thousands of samples are publicly
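
    The abstract contrasts within-study cross-validation with models transferred between studies. One common way to set up the cross-study case is leave-one-study-out validation. The hypothetical helper below only generates train/test index splits from each sample's study of origin; it is an illustrative sketch and is not taken from the authors' software framework.

```python
from collections import defaultdict


def leave_one_study_out(sample_study):
    """Yield (held_out_study, train_idx, test_idx) for cross-study validation.

    sample_study[i] names the study that sample i came from. Each study is held
    out once; a model would be trained on all other studies and tested on it.
    """
    by_study = defaultdict(list)
    for i, study in enumerate(sample_study):
        by_study[study].append(i)
    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        train_idx = [i for i, s in enumerate(sample_study) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical study labels for six samples.
studies = ["S1", "S1", "S2", "S3", "S2", "S3"]
splits = list(leave_one_study_out(studies))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

    Comparing scores from these splits with ordinary within-study k-fold scores is one way to measure the kind of cross-study gap the abstract describes.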

  4. Tales from the crypt and coral reef: the successes and challenges of identifying new herpesviruses using metagenomics

    PubMed Central

    Houldcroft, Charlotte J.; Breuer, Judith

    2015-01-01

    Herpesviruses are ubiquitous double-stranded DNA viruses infecting many animals, with the capacity to cause disease in both immunocompetent and immunocompromised hosts. Different herpesviruses have different cell tropisms, and have been detected in a diverse range of tissues and sample types. Metagenomics—encompassing viromics—analyses the nucleic acid of a tissue or other sample in an unbiased manner, making few or no prior assumptions about which viruses may be present in a sample. This approach has successfully discovered a number of novel herpesviruses. Furthermore, metagenomic analysis can identify herpesviruses with high degrees of sequence divergence from known herpesviruses and does not rely upon culturing large quantities of viral material. Metagenomics has had success in two areas of herpesvirus sequencing: firstly, the discovery of novel exogenous and endogenous herpesviruses in primates, bats and cnidarians; and secondly, in characterizing large areas of the genomes of herpesviruses previously only known from small fragments, revealing unexpected diversity. This review will discuss the successes and challenges of using metagenomics to identify novel herpesviruses, and future directions within the field. PMID:25821447

  5. Streaming fragment assignment for real-time analysis of sequencing experiments.

    PubMed

    Roberts, Adam; Pachter, Lior

    2013-01-01

    We present eXpress, a software package for efficient probabilistic assignment of ambiguously mapping sequenced fragments. eXpress uses a streaming algorithm with linear run time and constant memory use. It can determine abundances of sequenced molecules in real time and can be applied to ChIP-seq, metagenomics and other large-scale sequencing data. We demonstrate its use on RNA-seq data and show that eXpress achieves greater efficiency than other quantification methods.
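
    The abstract describes a streaming algorithm with linear run time and constant memory. A minimal sketch of that idea, assuming an online-EM style update: each fragment arrives with precomputed likelihoods for its compatible targets, its posterior over those targets is computed from the current abundance estimates, and the estimates move toward that posterior with a decaying step size (i + 2)^-kappa. The input format, function name, and step schedule are assumptions, not eXpress's interface or its exact update rule.

```python
def streaming_abundances(fragments, targets, kappa=0.85):
    """Estimate target abundances from a stream of (possibly multi-mapping) fragments.

    Each fragment is a dict {target: P(fragment | target)} over its compatible
    targets, all of which must appear in `targets`. State is one number per
    target, so memory does not grow with the number of fragments, and each
    fragment is processed once in a single pass.
    """
    theta = {t: 1.0 / len(targets) for t in targets}  # uniform starting abundances
    for i, hits in enumerate(fragments):
        # E-step for this fragment only: posterior over its compatible targets.
        weights = {t: theta[t] * lik for t, lik in hits.items()}
        z = sum(weights.values())
        if z == 0.0:
            continue  # no support under current estimates; skip the fragment
        step = (i + 2) ** (-kappa)  # decaying step size, assumes 0.5 < kappa <= 1
        # Move theta a small step toward this fragment's posterior (keeps sum == 1).
        for t in theta:
            theta[t] *= 1.0 - step
        for t, w in weights.items():
            theta[t] += step * w / z
    return theta


# Hypothetical stream: per block of 10 fragments, 6 map only to txA,
# 3 map equally well to txA and txB, and 1 maps only to txB.
block = [{"txA": 1.0}] * 6 + [{"txA": 1.0, "txB": 1.0}] * 3 + [{"txB": 1.0}]
theta = streaming_abundances(block * 10, ["txA", "txB"])
print({t: round(v, 3) for t, v in theta.items()})
```

    Rescaling every target for each fragment costs time proportional to the number of targets; one way to avoid that loop is to keep a single global scale factor and store unnormalized values instead.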

  6. Parton fragmentation functions

    NASA Astrophysics Data System (ADS)

    Metz, A.; Vossen, A.

    2016-11-01

    The field of fragmentation functions of light quarks and gluons is reviewed. In addition to integrated fragmentation functions, attention is paid to the dependence of fragmentation functions on transverse momenta and on polarization degrees of freedom. Higher-twist and di-hadron fragmentation functions are considered as well. Moreover, the review covers both theoretical and experimental developments in hadron production in electron-positron annihilation, deep-inelastic lepton-nucleon scattering, and proton-proton collisions.

  7. Metagenomics of an Alkaline Hot Spring in Galicia (Spain): Microbial Diversity Analysis and Screening for Novel Lipolytic Enzymes

    PubMed Central

    López-López, Olalla; Knapik, Kamila; Cerdán, Maria-Esperanza; González-Siso, María-Isabel

    2015-01-01

    A fosmid library was constructed with the metagenomic DNA from the water of the Lobios hot spring (76°C, pH = 8.2) located in Ourense (Spain). Metagenomic sequencing of the fosmid library allowed the assembly of 9722 contigs ranging in size from 500 to 56,677 bp and spanning ~18 Mbp. A total of 23,207 ORFs (Open Reading Frames) were predicted from the assembly. Biodiversity was explored by taxonomic classification, which revealed that bacteria were predominant, while archaea were less abundant. The six most abundant bacterial phyla were Deinococcus-Thermus, Proteobacteria, Firmicutes, Acidobacteria, Aquificae, and Chloroflexi. Within the archaeal superkingdom, the phylum Thaumarchaeota was predominant, with the dominant species “Candidatus Caldiarchaeum subterraneum.” Functional classification revealed the genes associated with one-carbon metabolism as the most abundant. Both taxonomic and functional classifications showed a mixture of different microbial metabolic patterns: aerobic and anaerobic, chemoorganotrophic and chemolithotrophic, autotrophic and heterotrophic. Remarkably, the presence of genes encoding enzymes with potential biotechnological interest, such as xylanases, galactosidases, proteases, and lipases, was also revealed in the metagenomic library. The library was subsequently screened functionally for genes encoding lipolytic enzymes. Six genes conferring lipolytic activity were identified and one was cloned and characterized. This gene was named LOB4Est and it was expressed in a yeast mesophilic host. LOB4Est codes for a novel esterase of family VIII, with sequence similarity to β-lactamases, but with unusually wide substrate specificity. When the enzyme was purified from the mesophilic host, it showed a half-life of 1 h and 43 min at 50°C, and maximal activity at 40°C and pH 7.5 with p-nitrophenyl-laurate as substrate. Interestingly, the enzyme retained more than 80% of maximal activity in a broad range of pH from 6.5 to 8. PMID:26635759

  8. Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Stepanauskas, Ramunas [Bigelow Laboratory]

    2016-07-12

    DOE JGI's Tanja Woyke, chair of the Single Cells and Metagenomes session, delivers an introduction, followed by Bigelow Laboratory's Ramunas Stepanauskas on "Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  9. Fragmentation Analysis - Fundamental Processes

    DTIC Science & Technology

    Wausau quartzite and anorthosite of 3.0 to 3.5 inch size were fragmented in this device. An analysis of the fragment distribution results of the drop...disc-shaped specimens of Wausau quartzite, anorthosite, and Felch marble were then fragmented with the impact pendulum device. Computer programs were

  10. False-positive results in metagenomic virus discovery: a strong case for follow-up diagnosis.

    PubMed

    Rosseel, T; Pardon, B; De Clercq, K; Ozhelvaci, O; Van Borm, S

    2014-08-01

    A viral metagenomic approach using virion enrichment, random amplification and next-generation sequencing was used to investigate an undiagnosed cluster of dairy cattle presenting with high persistent fever unresponsive to anti-microbial and anti-inflammatory treatment, diarrhoea, and redness of nose and teat. Serum and whole blood samples were taken in the predicted hyperviraemic state of an animal that a few days later presented with these clinical signs. Bioinformatics analysis of the resulting data from the DNA virus identification workflow (a total of 32 757 sequences with average read length 335 bases) initially demonstrated the presence of parvovirus-like sequences in the tested blood sample. Thorough follow-up using specific real-time RT-PCR assays targeting the detected sequence fragments confirmed the presence of these sequences in the original sample as well as in a sample of an additional animal, but contamination with an identical genetic signature was also demonstrated in negative extraction controls. Further investigation using an alternative extraction method identified contamination of the originally used Qiagen extraction columns with parvovirus-like nucleic acids or virus particles. Although we did not find any relevant virus that could be associated with the disease, these observations clearly illustrate the importance of using a proper control strategy and follow-up diagnostic tests in any viral metagenomic study.

  11. Metagenomic discovery of novel enzymes and biosurfactants in a slaughterhouse biofilm microbial community.

    PubMed

    Thies, Stephan; Rausch, Sonja Christina; Kovacic, Filip; Schmidt-Thaler, Alexandra; Wilhelm, Susanne; Rosenau, Frank; Daniel, Rolf; Streit, Wolfgang; Pietruszka, Jörg; Jaeger, Karl-Erich

    2016-06-08

    DNA derived from environmental samples is a rich source of novel bioactive molecules. The choice of the habitat to be sampled predefines the properties of the biomolecules to be discovered due to the physiological adaptation of the microbial community to the prevailing environmental conditions. We have constructed a metagenomic library in Escherichia coli DH10b with environmental DNA (eDNA) isolated from the microbial community of a slaughterhouse drain biofilm consisting mainly of species from the family Flavobacteriaceae. By functional screening of this library we have identified several lipases, proteases and two clones (SA343 and SA354) with biosurfactant and hemolytic activities. Sequence analysis of the respective eDNA fragments and subsequent structure homology modelling identified genes encoding putative N-acyl amino acid synthases with a unique two-domain organisation. The produced biosurfactants were identified by NMR spectroscopy as N-acyltyrosines with N-myristoyltyrosine as the predominant species. Critical micelle concentration and reduction of surface tension were similar to those of chemically synthesised N-myristoyltyrosine. Furthermore, we showed that the newly isolated N-acyltyrosines exhibit antibiotic activity against various bacteria. This is the first report describing the successful application of functional high-throughput screening assays for the identification of biosurfactant producing clones within a metagenomic library.

  12. Metagenomic discovery of novel enzymes and biosurfactants in a slaughterhouse biofilm microbial community

    PubMed Central

    Thies, Stephan; Rausch, Sonja Christina; Kovacic, Filip; Schmidt-Thaler, Alexandra; Wilhelm, Susanne; Rosenau, Frank; Daniel, Rolf; Streit, Wolfgang; Pietruszka, Jörg; Jaeger, Karl-Erich

    2016-01-01

    DNA derived from environmental samples is a rich source of novel bioactive molecules. The choice of the habitat to be sampled predefines the properties of the biomolecules to be discovered due to the physiological adaptation of the microbial community to the prevailing environmental conditions. We have constructed a metagenomic library in Escherichia coli DH10b with environmental DNA (eDNA) isolated from the microbial community of a slaughterhouse drain biofilm consisting mainly of species from the family Flavobacteriaceae. By functional screening of this library we have identified several lipases, proteases and two clones (SA343 and SA354) with biosurfactant and hemolytic activities. Sequence analysis of the respective eDNA fragments and subsequent structure homology modelling identified genes encoding putative N-acyl amino acid synthases with a unique two-domain organisation. The produced biosurfactants were identified by NMR spectroscopy as N-acyltyrosines with N-myristoyltyrosine as the predominant species. Critical micelle concentration and reduction of surface tension were similar to those of chemically synthesised N-myristoyltyrosine. Furthermore, we showed that the newly isolated N-acyltyrosines exhibit antibiotic activity against various bacteria. This is the first report describing the successful application of functional high-throughput screening assays for the identification of biosurfactant producing clones within a metagenomic library. PMID:27271534

  13. RNA viral metagenome of whiteflies leads to the discovery and characterization of a whitefly-transmitted carlavirus in North America.

    PubMed

    Rosario, Karyna; Capobianco, Heather; Ng, Terry Fei Fan; Breitbart, Mya; Polston, Jane E

    2014-01-01

    Whiteflies from the Bemisia tabaci species complex have the ability to transmit a large number of plant viruses and are some of the most detrimental pests in agriculture. Although whiteflies are known to transmit both DNA and RNA viruses, most of the diversity has been recorded for the former, specifically for the Begomovirus genus. This study investigated the total diversity of DNA and RNA viruses found in whiteflies collected from a single site in Florida to evaluate if there are additional, previously undetected viral types within the B. tabaci vector. Metagenomic analysis of viral DNA extracted from the whiteflies only resulted in the detection of begomoviruses. In contrast, whiteflies contained sequences similar to RNA viruses from divergent groups, with a diversity that extends beyond currently described viruses. The metagenomic analysis of whiteflies also led to the first report of a whitefly-transmitted RNA virus similar to Cowpea mild mottle virus (CpMMV Florida) (genus Carlavirus) in North America. Further investigation resulted in the detection of CpMMV Florida in native and cultivated plants growing near the original field site of whitefly collection and determination of its experimental host range. Analysis of complete CpMMV Florida genomes recovered from whiteflies and plants suggests that the current classification criteria for carlaviruses need to be reevaluated. Overall, metagenomic analysis supports that DNA plant viruses carried by B. tabaci are dominated by begomoviruses, whereas significantly less is known about RNA viruses present in this damaging insect vector.

  14. RNA Viral Metagenome of Whiteflies Leads to the Discovery and Characterization of a Whitefly-Transmitted Carlavirus in North America

    PubMed Central

    Rosario, Karyna; Capobianco, Heather; Ng, Terry Fei Fan; Breitbart, Mya; Polston, Jane E.

    2014-01-01

    Whiteflies from the Bemisia tabaci species complex have the ability to transmit a large number of plant viruses and are some of the most detrimental pests in agriculture. Although whiteflies are known to transmit both DNA and RNA viruses, most of the diversity has been recorded for the former, specifically for the Begomovirus genus. This study investigated the total diversity of DNA and RNA viruses found in whiteflies collected from a single site in Florida to evaluate if there are additional, previously undetected viral types within the B. tabaci vector. Metagenomic analysis of viral DNA extracted from the whiteflies only resulted in the detection of begomoviruses. In contrast, whiteflies contained sequences similar to RNA viruses from divergent groups, with a diversity that extends beyond currently described viruses. The metagenomic analysis of whiteflies also led to the first report of a whitefly-transmitted RNA virus similar to Cowpea mild mottle virus (CpMMV Florida) (genus Carlavirus) in North America. Further investigation resulted in the detection of CpMMV Florida in native and cultivated plants growing near the original field site of whitefly collection and determination of its experimental host range. Analysis of complete CpMMV Florida genomes recovered from whiteflies and plants suggests that the current classification criteria for carlaviruses need to be reevaluated. Overall, metagenomic analysis supports that DNA plant viruses carried by B. tabaci are dominated by begomoviruses, whereas significantly less is known about RNA viruses present in this damaging insect vector. PMID:24466220

  15. Metagenome and Metatranscriptome Analyses Using Protein Family Profiles

    PubMed Central

    Zhong, Cuncong; Yooseph, Shibu

    2016-01-01

    Analyses of metagenome data (MG) and metatranscriptome data (MT) are often challenged by a paucity of complete reference genome sequences and the uneven/low sequencing depth of the constituent organisms in the microbial community, which respectively limit the power of reference-based alignment and de novo sequence assembly. These limitations make accurate protein family classification and abundance estimation challenging, which in turn hampers downstream analyses such as abundance profiling of metabolic pathways, identification of differentially encoded/expressed genes, and de novo reconstruction of complete gene and protein sequences from the protein family of interest. The profile hidden Markov model (HMM) framework enables the construction of very useful probabilistic models for protein families that allow for accurate modeling of position-specific matches, insertions, and deletions. We present a novel homology detection algorithm that integrates a banded Viterbi algorithm for profile HMM parsing with an iterative simultaneous alignment and assembly computational framework. The algorithm searches a given profile HMM of a protein family against a database of fragmentary MG/MT sequencing data and simultaneously assembles complete or near-complete gene and protein sequences of the protein family. The resulting program, HMM-GRASPx, demonstrates superior performance in aligning and assembling homologs when benchmarked on both simulated marine MG and real human saliva MG datasets. On real supragingival plaque and stool MG datasets that were generated from healthy individuals, HMM-GRASPx accurately estimates the abundances of the antimicrobial resistance (AMR) gene families and enables accurate characterization of the resistome profiles of these microbial communities. For real human oral microbiome MT datasets, using the HMM-GRASPx estimated transcript abundances significantly improves detection of differentially expressed (DE) genes. Finally, HMM-GRASPx was used to

  16. Dirichlet multinomial mixtures: generative models for microbial metagenomics.

    PubMed

    Holmes, Ian; Harris, Keith; Quince, Christopher

    2012-01-01

    We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. These data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples differ in size, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods previously used to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components, each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct 'metacommunities' and hence determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for fitting DMM models using the 'evidence framework' (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from obese and lean twins. According to the model evidence, four clusters fit these data best. Two clusters were dominated by Bacteroides and were homogeneous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, obese twins were more likely to derive from the high-variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the 'Anna Karenina principle (AKP)' applied to microbial communities: disturbed states have many more configurations than undisturbed ones. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a more variable
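    The generative process described above (pick a mixture component, draw the community's taxon probabilities from that component's Dirichlet distribution, then draw reads by multinomial sampling) can be sketched with Python's standard library. This is an illustration only, not the microbedmm software: the mixture weights, Dirichlet hyperparameters, and read depth are hypothetical, and fitting the model with the evidence framework is not shown.

```python
import random
from typing import List, Sequence, Tuple


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(alpha) by normalising independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_community(weights: Sequence[float], alphas: Sequence[Sequence[float]],
                     n_reads: int, rng: random.Random) -> Tuple[int, List[int]]:
    """Generate one sample's taxon counts from a Dirichlet multinomial mixture.

    weights: mixing proportions of the components (hypothetical values).
    alphas:  one Dirichlet hyperparameter vector per component (entries > 0).
    n_reads: sample size; real samples differ in size.
    Returns the chosen component index and the per-taxon counts.
    """
    # 1. Choose a component ("metacommunity").
    k = rng.choices(range(len(weights)), weights=weights)[0]
    # 2. Draw this community's taxon probabilities from that component.
    probs = sample_dirichlet(alphas[k], rng)
    # 3. Multinomial sampling: assign each read to a taxon.
    counts = [0] * len(probs)
    for taxon in rng.choices(range(len(probs)), weights=probs, k=n_reads):
        counts[taxon] += 1
    return k, counts


if __name__ == "__main__":
    rng = random.Random(42)
    weights = [0.7, 0.3]
    alphas = [[20.0, 5.0, 1.0, 1.0],   # hypothetical low-variance component
              [0.5, 0.5, 0.5, 0.5]]    # hypothetical high-variance component
    for _ in range(3):
        print(sample_community(weights, alphas, n_reads=1000, rng=rng))
```

    The sum of a component's hyperparameters controls how tightly communities drawn from it cluster around its mean composition: small totals (as in the second hypothetical component) yield highly variable communities, which is the kind of "high-variance cluster" the abstract describes.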

  17. Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics

    PubMed Central

    Holmes, Ian; Harris, Keith; Quince, Christopher

    2012-01-01

    We introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. These data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples differ in size, and the matrix is sparse, as communities are diverse and skewed to rare taxa. Most methods previously used to classify or cluster samples have ignored these features. We describe each community by a vector of taxa probabilities. These vectors are generated from one of a finite number of Dirichlet mixture components, each with different hyperparameters. Observed samples are generated through multinomial sampling. The mixture components cluster communities into distinct ‘metacommunities’ and hence determine envirotypes or enterotypes, groups of communities with a similar composition. The model can also deduce the impact of a treatment and be used for classification. We wrote software for fitting DMM models using the ‘evidence framework’ (http://code.google.com/p/microbedmm/). This includes the Laplace approximation of the model evidence. We applied the DMM model to human gut microbe genera frequencies from obese and lean twins. According to the model evidence, four clusters fit these data best. Two clusters were dominated by Bacteroides and were homogeneous; two had a more variable community composition. We could not find a significant impact of body mass on community structure. However, obese twins were more likely to derive from the high-variance clusters. We propose that obesity is not associated with a distinct microbiota but increases the chance that an individual derives from a disturbed enterotype. This is an example of the ‘Anna Karenina principle (AKP)’ applied to microbial communities: disturbed states have many more configurations than undisturbed ones. We verify this by showing that in a study of inflammatory bowel disease (IBD) phenotypes, ileal Crohn's disease (ICD) is associated with a

  18. Comparative Metagenomics of Freshwater Microbial Communities

    SciTech Connect

    Hemme, Chris; Deng, Ye; Tu, Qichao; Fields, Matthew; Gentry, Terry; Wu, Liyou; Tringe, Susannah; Watson, David; He, Zhili; Hazen, Terry; Tiedje, James; Rubin, Eddy; Zhou, Jizhong

    2010-05-17

    Previous analyses of a microbial metagenome from uranium- and nitric-acid-contaminated groundwater (FW106) showed significant environmental effects resulting from the rapid introduction of multiple contaminants. Effects include a massive loss of species and strain biodiversity, accumulation of toxin-resistance genes in the metagenome, and lateral transfer of toxin-resistance genes between community members. To better understand these results in an ecological context, a second metagenome from a pristine groundwater system located along the same geological strike was sequenced and analyzed (FW301). It is hypothesized that FW301 approximates the ancestral FW106 community based on phylogenetic profiles and common geological parameters; however, even if this is not the case, the datasets still permit comparisons between healthy and stressed groundwater ecosystems. Complex carbohydrate metabolism has been almost entirely lost in the stressed ecosystem. In contrast, the pristine system encodes a wide diversity of complex carbohydrate metabolism systems, suggesting that carbon turnover is very rapid and less leaky in the healthy groundwater system. FW301 encodes many (~160+) carbon monoxide dehydrogenase genes, while FW106 encodes none. This result suggests that the community is frequently exposed to oxygen from aerated rainwater percolating into the subsurface, with a resulting high rate of carbon metabolism and CO production. When oxygen levels fall, the CO then serves as a major carbon source for the community. FW301 appears to be capable of CO2 fixation via the reductive carboxylase (reverse TCA) cycle and possibly acetogenesis; these activities are lacking in the heterotrophic FW106 system, which relies exclusively on respiration of nitrate and/or oxygen for energy production. FW301 encodes a complete B12 biosynthesis pathway at high abundance, suggesting the use of sodium gradients for energy production in the healthy groundwater community. Overall

  19. Metagenomics - a guide from sampling to data analysis

    PubMed Central

    2012-01-01

    Metagenomics applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of entire communities of organisms. The field of metagenomics has been responsible for substantial advances in microbial ecology, evolution, and diversity over the past 5 to 10 years, and many research laboratories are actively engaged in it now. With the growing number of activities also comes a wealth of methodological knowledge and expertise that should guide future developments in the field. This review summarizes current opinions in metagenomics and provides practical guidance and advice on sample processing, sequencing technology, assembly, binning, annotation, experimental design, statistical analysis, data storage, and data sharing. As more metagenomic datasets are generated, the availability of standardized procedures and shared data storage and analysis becomes increasingly important to ensure that the output of individual projects can be assessed and compared. PMID:22587947

  20. Exploring Metagenomics in the Laboratory of an Introductory Biology Course

    PubMed Central

    Gibbens, Brian B.; Scott, Cheryl L.; Hoff, Courtney D.; Schottel, Janet L.

    2015-01-01

    Four laboratory modules were designed for introductory biology students to explore the field of metagenomics. Students collected microbes from environmental samples, extracted the DNA, and amplified 16S rRNA gene sequences using polymerase chain reaction (PCR). Students designed functional metagenomics screens to determine and compare antibiotic resistance profiles among the samples. Bioinformatics tools were used to generate and interpret phylogenetic trees and identify homologous genes. A pretest and posttest were used to assess learning gains, and the results indicated that these modules increased student performance by an average of 22%. Here we describe ways to engage students in metagenomics-related research and provide readers with ideas for how they can start developing metagenomics exercises for their own classrooms. PMID:25949755

  1. Metagenomics of Glassy-winged Sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae)

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Three new insect-infecting viruses, three endosymbiotic bacteria, a fungus, and a bacterial phage were discovered using a metagenomics approach to identify unknown organisms that live in association with the sharpshooter, Homalodisca vitripennis (Hemiptera: Cicadellidae). The genetic composition of ...

  2. Activity screening of environmental metagenomic libraries reveals novel carboxylesterase families

    PubMed Central

    Popovic, Ana; Hai, Tran; Tchigvintsev, Anatoly; Hajighasemi, Mahbod; Nocek, Boguslaw; Khusnutdinova, Anna N.; Brown, Greg; Glinos, Julia; Flick, Robert; Skarina, Tatiana; Chernikova, Tatyana N.; Yim, Veronica; Brüls, Thomas; Paslier, Denis Le; Yakimov, Michail M.; Joachimiak, Andrzej; Ferrer, Manuel; Golyshina, Olga V.; Savchenko, Alexei; Golyshin, Peter N.; Yakunin, Alexander F.

    2017-01-01

    Metagenomics has made accessible an enormous reserve of global biochemical diversity. To tap into this vast resource of novel enzymes, we have screened over one million clones from metagenome DNA libraries derived from sixteen different environments for carboxylesterase activity and identified 714 positive hits. We have validated the esterase activity of 80 selected genes, which belong to 17 different protein families including unknown and cyclase-like proteins. Three metagenomic enzymes exhibited lipase activity, and seven proteins showed polyester depolymerization activity against polylactic acid and polycaprolactone. Detailed biochemical characterization of four new enzymes revealed their substrate preference, whereas their catalytic residues were identified using site-directed mutagenesis. The crystal structure of the metal-ion dependent esterase MGS0169 from the amidohydrolase superfamily revealed a novel active site with a bound unknown ligand. Thus, activity-centered metagenomics has revealed diverse enzymes and novel families of microbial carboxylesterases, whose activity could not have been predicted using bioinformatics tools. PMID:28272521

  3. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need to culture individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins, and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.

  4. Identification and characterization of novel poly(DL-lactic acid) depolymerases from metagenome.

    PubMed

    Mayumi, Daisuke; Akutsu-Shigeno, Yukie; Uchiyama, Hiroo; Nomura, Nobuhiko; Nakajima-Kambe, Toshiaki

    2008-07-01

    Many poly(lactic acid) (PLA)-degrading microorganisms have been isolated from the natural environment by culture-based methods, but unculturable PLA-degrading microorganisms have not been studied. In this study, we constructed a metagenomic library from DNA extracted from PLA disks buried in compost. We identified three PLA-degrading genes encoding lipases or hydrolases. The purified enzymes degraded not only PLA but also various aliphatic polyesters, tributyrin, and p-nitrophenyl esters. Based on their substrate specificities, the PLA depolymerases were classified as esterases rather than lipases. Among the PLA depolymerases, PlaM4 exhibited thermophilic properties; that is, it showed the highest activity at 70 degrees C and remained stable even after incubation for 1 h at 50 degrees C. PlaM4 had absorption and degradation activities for solid PLA at 60 degrees C, which indicates that the enzyme can effectively degrade PLA in a high-temperature environment. On the other hand, classification based on amino acid sequences showed that the other PLA depolymerases, PlaM7 and PlaM9, could not be assigned to known lipases or esterases. This is the first report on the identification and characterization of PLA depolymerases from a metagenome.

  5. Characterization of the gut microbiota of Kawasaki disease patients by metagenomic analysis

    PubMed Central

    Kinumaki, Akiko; Sekizuka, Tsuyoshi; Hamada, Hiromichi; Kato, Kengo; Yamashita, Akifumi; Kuroda, Makoto

    2015-01-01

    Kawasaki disease (KD) is an acute febrile illness of early childhood. Previous reports have suggested that genetic disease susceptibility factors, together with a triggering infectious agent, could be involved in KD pathogenesis; however, the precise etiology of this disease remains unknown. Additionally, previous culture-based studies have suggested a possible role of intestinal microbiota in KD pathogenesis. In this study, we performed metagenomic analysis to comprehensively assess the longitudinal variation in the intestinal microbiota of 28 KD patients. Several notable bacterial genera were commonly detected during the acute phase, whereas a relative increase in Ruminococcus bacteria was observed during the non-acute phase of KD. Metagenomic analysis based on bacterial species classification suggested that the number of sequencing reads with similarity to five Streptococcus spp. (S. pneumoniae, S. pseudopneumoniae, S. oralis, S. gordonii, and S. sanguinis), in addition to patient-derived Streptococcus isolates, markedly increased during the acute phase in most patients. Streptococci include a variety of pathogenic bacteria as well as probiotic bacteria that promote human health; therefore, further species-level discrimination could more comprehensively illuminate the KD-associated microbiota. The findings of this study suggest that KD-related streptococci might be involved in the pathogenesis of this disease. PMID:26322033

  6. Metagenomic analysis of the gut microbiota of the Timber Rattlesnake, Crotalus horridus.

    PubMed

    McLaughlin, Richard William; Cochran, Philip A; Dowd, Scot E

    2015-07-01

    Snakes are capable of surviving long periods without food. In this study, we characterized the microbiota of a wild Timber Rattlesnake (Crotalus horridus) devoid of digesta. Pyrosequencing-based metagenomics was used to analyze phylogenetic and metabolic profiles with the aid of the MG-RAST server. Pyrosequencing of samples taken from the stomach, small intestine, and colon yielded 691,696, 957,756, and 700,419 high-quality sequence reads, respectively. Taxonomic analysis of metagenomic reads indicated that Eukarya was the predominant domain, followed by Bacteria and then viruses, in all three tissues. Proteobacteria was the predominant bacterial phylum in the tissues examined. Functional classification against the subsystem database showed that cluster-based subsystems were the most abundant (10-15%), with carbohydrate metabolism almost equally abundant (10-13%). To identify bacteria in the colon at a finer taxonomic resolution, a 16S rRNA gene clone library was created. Proteobacteria was again found to be the predominant phylum. The present study provides a baseline for understanding the microbial ecology of snakes living in the wild.

  7. Metagenomic Profiling of a Microbial Assemblage Associated with the California Mussel: A Node in Networks of Carbon and Nitrogen Cycling

    PubMed Central

    Pfister, Catherine A.; Meyer, Folker; Antonopoulos, Dionysios A.

    2010-01-01

    Mussels are conspicuous and often abundant members of rocky shores and may constitute an important site for the nitrogen cycle due to their feeding and excretion activities. We used shotgun metagenomics of the microbial community associated with the surface of mussels (Mytilus californianus) on Tatoosh Island in Washington State to test whether there is a nitrogen-based microbial assemblage associated with mussels. Analyses of both tidepool mussels and those on emergent benches revealed a diverse community of Bacteria and Archaea, with approximately 31 million bp from 6 mussels in each habitat. Using MG-RAST, 22.5–25.6% of the fragments were identifiable using the SEED non-redundant protein database. Of the fragments identifiable through MG-RAST, the composition was dominated by Cyanobacteria and Alpha- and Gammaproteobacteria. Microbial composition was highly similar between the tidepool and emergent bench mussels, suggesting similar functions across these different microhabitats. One percent of the proteins identified in each sample were related to nitrogen cycling. When normalized to protein discovery rate, the diversity and abundance of nitrogen-cycle enzymes in mussel-associated microbes are as great as or greater than those described for other marine metagenomes. In some instances, the nitrogen-utilizing profile of this assemblage was more concordant with soil metagenomes in the Midwestern U.S. than with open ocean systems. Carbon fixation and Calvin cycle enzymes further represented 0.65 and 1.26% of all proteins, and their abundance was comparable to that of a number of open ocean marine metagenomes. In sum, the diversity and abundance of nitrogen- and carbon-cycle-related enzymes in the microbes occupying the shells of Mytilus californianus suggest that these mussels provide a node for microbial populations and thus biogeochemical processes. PMID:20463896

  8. Selectable fragmentation warhead

    DOEpatents

    Bryan, Courtney S.; Paisley, Dennis L.; Montoya, Nelson I.; Stahl, David B.

    1993-01-01

    A selectable fragmentation warhead capable of producing a predetermined number of fragments from a metal plate and accelerating the fragments toward a target. A first explosive located adjacent to the plate is detonated at a selected number of points by laser-driven slapper detonators. In one embodiment, a smoother-disk and a second explosive, located adjacent to the first explosive, serve to increase acceleration of the fragments toward a target. The ability to produce a selected number of fragments allows for effective destruction of a chosen target.

  9. Metagenomic Sequencing of an In Vitro-Simulated Microbial Community

    SciTech Connect

    Morgan, Jenna L.; Darling, Aaron E.; Eisen, Jonathan A.

    2009-12-01

    Background: Microbial life dominates the earth, but many species are difficult or even impossible to study under laboratory conditions. Sequencing DNA directly from the environment, a technique commonly referred to as metagenomics, is an important tool for cataloging microbial life. This culture-independent approach involves collecting samples that contain microbes, extracting DNA from the samples, and sequencing the DNA. A sample may contain many different microorganisms, macroorganisms, and even free-floating environmental DNA. A fundamental challenge in metagenomics has been estimating the abundance of organisms in a sample based on the frequency with which each organism's DNA is observed in reads generated via DNA sequencing. Methodology/Principal Findings: We created mixtures of ten microbial species for which genome sequences are known. Each mixture contained an equal number of cells of each species. We then extracted DNA from the mixtures, sequenced the DNA, and measured the frequency with which genomic regions from each organism were observed in the sequenced DNA. We found that the observed frequency of reads mapping to each organism did not reflect the equal numbers of cells known to be included in each mixture. The relative organism abundances varied significantly depending on the DNA extraction and sequencing protocol used. Conclusions/Significance: We describe a new data resource for measuring the accuracy of metagenomic binning methods, created by in vitro simulation of a metagenomic community. Our in vitro simulation can be used to complement previous in silico benchmark studies. In constructing a synthetic community and sequencing its metagenome, we encountered several sources of observation bias that likely affect most metagenomic experiments to date and present challenges for comparative metagenomic studies. DNA preparation methods have a particularly profound effect in our study, implying that samples prepared with different
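    The comparison described in this record, between the fraction of reads observed for each organism and what an equal-cell mixture would predict, can be sketched as follows. This is a hypothetical illustration, not the authors' analysis: it assumes one genome copy per cell, so that with equal cell numbers the expected share of reads is proportional to genome length, and it ignores ambiguous read mapping. The organism names, read counts, and genome lengths in the usage block are made up.

```python
from typing import Dict


def abundance_bias(read_counts: Dict[str, int],
                   genome_lengths: Dict[str, int]) -> Dict[str, float]:
    """Observed / expected read fraction per organism for an equal-cell mixture.

    Assumes one genome copy per cell, so the expected fraction of reads for an
    organism is genome_length / total_genome_length. A ratio of 1.0 means no
    bias; values above or below 1.0 indicate over- or under-representation.
    """
    total_reads = sum(read_counts.values())
    total_length = sum(genome_lengths.values())
    ratios = {}
    for organism, reads in read_counts.items():
        observed = reads / total_reads
        expected = genome_lengths[organism] / total_length
        ratios[organism] = observed / expected
    return ratios


if __name__ == "__main__":
    # Hypothetical mapped-read counts and genome lengths for three organisms.
    reads = {"org_A": 52_000, "org_B": 31_000, "org_C": 17_000}
    lengths = {"org_A": 4_600_000, "org_B": 2_900_000, "org_C": 5_300_000}
    for name, ratio in abundance_bias(reads, lengths).items():
        print(f"{name}: observed/expected = {ratio:.2f}")
```

    Comparing these ratios between two DNA extraction or sequencing protocols applied to the same mixture is one simple way to see the protocol-dependent bias the study reports.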

  10. Exploration of Metagenome Assemblies with an Interactive Visualization Tool

    SciTech Connect

    Cantor, Michael; Nordberg, Henrik; Smirnova, Tatyana; Andersen, Evan; Tringe, Susannah; Hess, Matthias; Dubchak, Inna

    2014-07-09

    Metagenomics, one of the fastest growing areas of modern genomic science, is the genetic profiling of the entire community of microbial organisms present in an environmental sample. Elviz is a web-based tool for the interactive exploration of metagenome assemblies. Elviz can be used with publicly available data sets from the Joint Genome Institute or with custom user-loaded assemblies. Elviz is available at genome.jgi.doe.gov/viz

  11. Biocatalysts and their small molecule products from metagenomic studies

    PubMed Central

    Iqbal, Hala A.; Feng, Zhiyang; Brady, Sean F.

    2012-01-01

    The vast majority of bacteria present in environmental samples have never been cultured, so their ability to produce useful biocatalysts, or collections of biocatalysts that can biosynthesize interesting small molecules, has not been available for exploitation. Metagenomic libraries constructed using DNA extracted directly from natural bacterial communities offer access to the genetic information present in the genomes of these as yet uncultured bacteria. This review highlights recent efforts to recover both discrete enzymes and small molecules from metagenomic libraries. PMID:22455793

  12. [Metagenomics and biodiversity of sphagnum bogs].

    PubMed

    Rusin, L Yu

    2016-01-01

    The biodiversity of sphagnum bogs is among the richest and least studied, while these ecosystems rank among the highest in ecological, conservation, and economic value. Recent studies have focused on the prokaryotic consortia associated with sphagnum mosses and revealed the factors that maintain the sustainability and productivity of bog ecosystems. High-throughput sequencing technologies have provided insight into the functional diversity of moss microbial communities (microbiomes) and helped to identify the biochemical pathways and gene families that facilitate a spectrum of adaptive strategies and largely foster the very successful colonization of the Northern hemisphere by sphagnum mosses. The rich and valuable information obtained on the microbiomes of peat bogs contrasts with the paucity of evidence on their eukaryotic diversity. Prospects and expectations for a reliable assessment of taxonomic profiles, relative abundance of taxa, and hidden biodiversity of microscopic eukaryotes in sphagnum bog ecosystems are briefly outlined in the context of today's metagenomics.

  13. MetaProx: the database of metagenomic proximons

    PubMed Central

    Vey, Gregory; Charles, Trevor C.

    2014-01-01

    MetaProx is the database of metagenomic proximons: a searchable repository of proximon objects conceived with two specific goals. The first objective is to accelerate research involving metagenomic functional interactions by providing a database of metagenomic operon candidates. Proximons represent a special subset of directons (series of contiguous co-directional genes) where each member gene is in close proximity to its neighbours with respect to intergenic distance. As a result, proximons represent significant operon candidates where some subset of proximons is the set of true metagenomic operons. Proximons are well suited for the inference of metagenomic functional networks because predicted functional linkages do not rely on homology-dependent information that is frequently unavailable in metagenomic scenarios. The second objective is to explore representations for semistructured biological data that can offer an alternative to the traditional relational database approach. In particular, we use a serialized object implementation and advocate a Data as Data policy where the same serialized objects can be used at all levels (database, search tool and saved user file) without conversion or the use of human-readable markups. MetaProx currently includes 4 210 818 proximons consisting of 8 926 993 total member genes. Database URL: http://metaprox.uwaterloo.ca PMID:25288655
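    The proximon definition above (runs of adjacent, co-directional genes in which each gene lies close to its neighbours) lends itself to a single linear scan over the genes of a contig. The sketch below is an illustration under stated assumptions, not the MetaProx pipeline: the `Gene` record, the `max_gap` distance threshold, and the two-gene minimum are hypothetical choices, and all genes are assumed to lie on one contig.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record on one contig (0-based, end-exclusive coordinates)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def find_proximons(genes: List[Gene], max_gap: int, min_genes: int = 2) -> List[List[Gene]]:
    """Group adjacent co-directional genes whose intergenic distance is <= max_gap.

    Each returned group is a run of consecutive genes on the same strand (so it
    lies within one directon) in which every gene is within max_gap bp of the
    previous one. Overlapping genes have a negative gap and always qualify.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    groups: List[List[Gene]] = []
    current: List[Gene] = []
    for gene in ordered:
        prev = current[-1] if current else None
        if prev and gene.strand == prev.strand and gene.start - prev.end <= max_gap:
            current.append(gene)          # extend the current run
        else:
            if len(current) >= min_genes:
                groups.append(current)    # close a run that is long enough
            current = [gene]              # start a new run
    if len(current) >= min_genes:
        groups.append(current)
    return groups


if __name__ == "__main__":
    contig = [
        Gene("a", 0, 900, "+"),
        Gene("b", 930, 1500, "+"),
        Gene("c", 1520, 2100, "+"),
        Gene("d", 2600, 3000, "+"),   # 500 bp gap breaks the run at max_gap=50
        Gene("e", 3010, 3400, "-"),   # strand change breaks the run
    ]
    for group in find_proximons(contig, max_gap=50):
        print([g.name for g in group])
```

    Because the scan only compares each gene with its predecessor, it runs in linear time after sorting and needs no homology information, in line with the abstract's point that proximon-based linkage does not depend on homology.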

  14. Messages from the first International Conference on Clinical Metagenomics (ICCMg).

    PubMed

    Ruppé, Etienne; Greub, Gilbert; Schrenzel, Jacques

    2017-02-01

    Metagenomics has recently entered clinical microbiology, and an increasing number of diagnostic laboratories now offer the sequencing and annotation of bacterial genomes and/or the analysis of clinical samples by direct or PCR-based metagenomics with short time to results. In this context, the first International Conference on Clinical Metagenomics (ICCMg) was held in Geneva in October 2016, and several key aspects were discussed, including: i) the need for improved resolution, ii) the importance of interpretation given the common occurrence of sequence contaminants, iii) the need for improved bioinformatic pipelines, iv) the bottleneck of DNA extraction, v) the importance of gold standards, vi) the need to further reduce time to results, vii) how to improve data sharing, viii) the applications of bacterial genomics and clinical metagenomics in better adapting therapeutics, and ix) the impact of metagenomics and new sequencing technologies on discovering new microbes. However, further efforts in terms of reduced turnaround time, improved quality, and lower costs are warranted to fully translate metagenomics into clinical applications.

  15. Metagenomics, metaMicrobesOnline and Kbase Data Integration (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Dehal, Paramvir [LBNL]

    2016-07-12

    Berkeley Lab's Paramvir Dehal on "Managing and Storing large Datasets in MicrobesOnline, metaMicrobesOnline and the DOE Knowledgebase" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  16. Introduction to Metagenomics at DOE JGI (Opening Remarks for the Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Kyrpides, Nikos [DOE JGI]

    2016-07-12

    After a quick introduction by DOE JGI Director Eddy Rubin, DOE JGI's Nikos Kyrpides delivers the opening remarks at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  17. A base composition analysis of natural patterns for the preprocessing of metagenome sequences

    PubMed Central

    2013-01-01

    Background: On the premise that sequence reads and contigs often exhibit the same kinds of base usage observed in the sequences from which they are derived, we offer a base composition analysis tool. Our tool uses these natural patterns to determine relatedness across sequence data. We introduce spectrum sets (sets of motifs), which are permutations of bacterial restriction sites, and a base composition analysis framework to measure their proportional content in sequence data. We suggest that this framework will increase efficiency during the pre-processing stages of metagenome sequencing and assembly projects. Results: Our method is able to differentiate organisms and their reads or contigs. The framework shows how to determine the relatedness between these reads or contigs by comparing their base composition. In particular, we show that two types of organismal sequence data are fundamentally different by analyzing their spectrum set motif proportions (coverage). By applying one of the four possible spectrum sets (which together encompass all known restriction sites), we provide evidence that each set has a different ability to differentiate sequence data. Furthermore, we show that selecting a spectrum set relevant to one organism, but not to the others in the data set, will greatly improve the performance of sequence differentiation even when the read, contig, or sequence fragment is short. Conclusions: We show the proof of concept of our method by applying it to ten trials of two or three freshly selected sequence fragments (reads and contigs) for each experiment across the six organisms of our set. Here we describe a novel and computationally effective pre-processing step for metagenome sequencing and assembly tasks. Furthermore, our base composition method has applications in phylogeny, where it can be used to infer evolutionary distances between organisms based on the notion that related organisms
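    The abstract does not give the exact formula for a spectrum set's "proportional content", so the following is one plausible reading and not the authors' method: count the (overlapping) occurrences of each motif of a spectrum set in a read, normalise the counts into proportions, and compare two reads by the L1 distance between their proportion profiles. The motif set used below (the EcoRI, BamHI, and HindIII recognition sites) and the choice of distance are assumptions made for illustration.

```python
from typing import Dict, Sequence


def motif_counts(seq: str, motifs: Sequence[str]) -> Dict[str, int]:
    """Count (possibly overlapping) occurrences of each motif in seq."""
    seq = seq.upper()
    counts = {}
    for motif in motifs:
        k = len(motif)
        counts[motif] = sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)
    return counts


def motif_proportions(seq: str, motifs: Sequence[str]) -> Dict[str, float]:
    """Share of all motif hits contributed by each motif (all zeros if none occur)."""
    counts = motif_counts(seq, motifs)
    total = sum(counts.values())
    return {m: (c / total if total else 0.0) for m, c in counts.items()}


def profile_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """L1 distance between two motif-proportion profiles over the same motif set."""
    return sum(abs(p[m] - q[m]) for m in p)


if __name__ == "__main__":
    # Hypothetical spectrum set: recognition sites of EcoRI, BamHI and HindIII.
    spectrum_set = ["GAATTC", "GGATCC", "AAGCTT"]
    read_1 = "TTGAATTCAGGATCCGAATTCAA"
    read_2 = "CCAAGCTTGGAAGCTTGGATCC"
    p1 = motif_proportions(read_1, spectrum_set)
    p2 = motif_proportions(read_2, spectrum_set)
    print(p1, p2, profile_distance(p1, p2))
```

    Under this reading, reads from related organisms would be expected to have small profile distances, and a spectrum set chosen to be informative for one organism would separate its reads from the rest.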

  18. Universality of fragment shapes

    PubMed Central

    Domokos, Gábor; Kun, Ferenc; Sipos, András Árpád; Szabó, Tímea

    2015-01-01

    The shape of fragments generated by the breakup of solids is central to a wide variety of problems, ranging from the geomorphic evolution of boulders to the accumulation of space debris orbiting Earth. Although the statistics of fragment mass have been found to show universal scaling behavior, the comprehensive characterization of fragment shapes has remained a fundamental challenge. We performed a thorough experimental study of the problem by fragmenting various types of materials through slowly proceeding weathering and through rapid breakup due to explosion and hammering. We demonstrate that the shape of fragments obeys an astonishing universality, following the same generic evolution with fragment size irrespective of material details and loading conditions. There exists a cutoff size below which fragments have an isotropic shape; however, as the size increases, the shape converges exponentially to a unique elongated form. We show that a discrete stochastic model of fragmentation reproduces both the size and shape of fragments by tuning only a single parameter, which strengthens the general validity of the scaling laws. The dependence of the probability of the crack plane orientation on the linear extension of fragments proved to be essential for the shape selection mechanism. PMID:25772300

  19. Universality of fragment shapes

    NASA Astrophysics Data System (ADS)

    Domokos, Gábor; Kun, Ferenc; Sipos, András Árpád; Szabó, Tímea

    2015-03-01

    The shape of fragments generated by the breakup of solids is central to a wide variety of problems, ranging from the geomorphic evolution of boulders to the accumulation of space debris orbiting Earth. Although the statistics of fragment mass have been found to show universal scaling behavior, the comprehensive characterization of fragment shapes has remained a fundamental challenge. We performed a thorough experimental study of the problem by fragmenting various types of materials through slowly proceeding weathering and through rapid breakup due to explosion and hammering. We demonstrate that the shape of fragments obeys an astonishing universality, following the same generic evolution with fragment size irrespective of material details and loading conditions. There exists a cutoff size below which fragments have an isotropic shape; however, as the size increases, the shape converges exponentially to a unique elongated form. We show that a discrete stochastic model of fragmentation reproduces both the size and shape of fragments by tuning only a single parameter, which strengthens the general validity of the scaling laws. The dependence of the probability of the crack plane orientation on the linear extension of fragments proved to be essential for the shape selection mechanism.

  20. Assessment of diversity indices for the characterization of the soil prokaryotic community by metagenomic analysis

    NASA Astrophysics Data System (ADS)

    Chernov, T. I.; Tkhakakhova, A. K.; Kutovaya, O. V.

    2015-04-01

    The diversity indices used in ecology for assessing the metagenomes of soil prokaryotic communities at different phylogenetic levels were compared. The following indices were considered: the number of detected taxa and the Shannon, Menhinick, Margalef, Simpson, Chao1, and ACE indices. The diversity analysis of the prokaryotic communities in the upper horizons of a typical chernozem (Haplic Chernozem (Pachic)), a dark chestnut soil (Haplic Kastanozem (Chromic)), and an extremely arid desert soil (Endosalic Calcisol (Yermic)) was based on the analysis of 16S rRNA genes. The Menhinick, Margalef, Chao1, and ACE indices gave similar results for the classification of the communities according to their diversity levels; the Simpson index gave good results only for the high-level taxa (phyla); the best results were obtained with the Shannon index. In general, all the indices used showed a decrease in the diversity of the soil prokaryotes in the following sequence: chernozem > dark chestnut soil > extremely arid desert soil.
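    A minimal sketch of how most of the compared indices can be computed from per-taxon sequence counts at one phylogenetic level. The conventions are assumptions, since the abstract does not state them: natural logarithm for Shannon, the Gini-Simpson form 1 − Σp² for Simpson, and the bias-corrected Chao1 estimator. ACE is omitted because it needs an extra rare-taxon threshold. The function name diversity_indices is hypothetical.

```python
import math


def diversity_indices(counts: list[int]) -> dict[str, float]:
    """Alpha-diversity indices from per-taxon sequence counts."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)  # total number of classified sequences
    s = len(counts)  # number of detected taxa
    if n < 2:
        raise ValueError("need at least two sequences")
    props = [c / n for c in counts]
    f1 = sum(1 for c in counts if c == 1)  # taxa seen exactly once (singletons)
    f2 = sum(1 for c in counts if c == 2)  # taxa seen exactly twice (doubletons)
    return {
        "taxa": float(s),
        "shannon": -sum(p * math.log(p) for p in props),
        "simpson": 1.0 - sum(p * p for p in props),  # Gini-Simpson form
        "menhinick": s / math.sqrt(n),
        "margalef": (s - 1) / math.log(n),
        "chao1": s + f1 * (f1 - 1) / (2 * (f2 + 1)),  # bias-corrected Chao1
    }
```

    Running it on counts aggregated at the phylum, class, and genus levels separately reproduces the kind of level-by-level comparison described above.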

  1. Fragmentation properties of metals

    SciTech Connect

    Grady, D.E.; Kipp, M.E.

    1996-06-01

    In the present study we are developing an experimental fracture material property test method specific to dynamic fragmentation. Spherical test samples of the metals of interest are subjected to controlled impulsive stress loads by acceleration to high velocities with a light-gas launcher facility and subsequent normal impact on thin plates. Motion, deformation and fragmentation of the test samples are diagnosed with multiple flash radiography methods. The impact plate materials are selected to be transparent to the x-ray method so that only test metal material is imaged. Through a systematic series of such tests, both strain-to-failure and fragmentation resistance properties are determined through this experimental method. Fragmentation property data for several steels, copper, aluminum, tantalum and titanium have been obtained to date. Aspects of the dynamic data have been analyzed with computational methods to achieve a better understanding of the processes leading to failure and fragmentation, and to test an existing computational fragmentation model.

  2. Going Deeper: Metagenome of a Hadopelagic Microbial Community

    PubMed Central

    Eloe, Emiley A.; Fadrosh, Douglas W.; Novotny, Mark; Zeigler Allen, Lisa; Kim, Maria; Lombardo, Mary-Jane; Yee-Greenbaum, Joyclyn; Yooseph, Shibu; Allen, Eric E.; Lasken, Roger; Williamson, Shannon J.; Bartlett, Douglas H.

    2011-01-01

    The paucity of sequence data from pelagic deep-ocean microbial assemblages has severely restricted molecular exploration of the largest biome on Earth. In this study, an analysis is presented of a large-scale 454-pyrosequencing metagenomic dataset from a hadopelagic environment at 6,000 m depth within the Puerto Rico Trench (PRT). A total of 145 Mbp of assembled sequence data was generated and compared to two pelagic deep-ocean metagenomes and two representative surface seawater datasets from the Sargasso Sea. In a number of instances, all three deep metagenomes displayed similar trends, which were most pronounced in the PRT, including enrichment in functions for two-component signal transduction mechanisms and transcriptional regulation. Overrepresented transporters in the PRT metagenome included outer membrane porins, diverse cation transporters, and di- and tri-carboxylate transporters that matched well with the prevailing catabolic processes such as butanoate, glyoxylate and dicarboxylate metabolism. A surprisingly high abundance of sulfatases for the degradation of sulfated polysaccharides was also present in the PRT. The most dramatic adaptational feature of the PRT microbes appears to be heavy metal resistance, as reflected in the large numbers of transporters present for their removal. As a complement to the metagenome approach, single-cell genomic techniques were used to generate partial whole-genome sequence data from four uncultivated cells from members of the dominant phyla within the PRT: Alphaproteobacteria, Gammaproteobacteria, Bacteroidetes and Planctomycetes. The single-cell sequence data provided genomic context for many of the highly abundant functional attributes identified from the PRT metagenome, and recruited the PRT metagenomic sequence data more heavily than 172 available reference marine genomes. Through these multifaceted sequence approaches, new insights have been provided into the unique functional attributes present in

  3. Environmental Metagenomics: The Data Assembly and Data Analysis Perspectives

    NASA Astrophysics Data System (ADS)

    Kumar, Vinay; Maitra, S. S.; Shukla, Rohit Nandan

    2015-01-01

    Novel gene finding is one of the emerging fields in environmental research. In past decades, research focused mainly on the discovery of microorganisms capable of degrading a particular compound. Many methods for the cultivation and screening of such novel microorganisms are available in the literature, but all of them are efficient only for microbes that can be cultivated in the laboratory. Microorganisms that live in extreme conditions, such as hot springs, frozen glaciers, and acid mine drainage, cannot be cultivated in the laboratory because of incomplete knowledge of their growth requirements, such as temperature and nutrients, and of their mutual dependence on each other. The microbes that can be cultivated correspond to less than 1% of the total microbes present on Earth; the remaining 99%, the uncultivated majority, remains inaccessible. Metagenomics transcends the culture requirements of microbes. In metagenomics, DNA is extracted directly from environmental samples such as soil, seawater, and acid mine drainage, followed by construction and screening of a metagenomic library. With ongoing research, a huge amount of metagenomic data is accumulating, and understanding these data is an essential step toward extracting novel genes of industrial importance. Various bioinformatics tools have been designed to analyze and annotate the data produced from metagenomes. The bioinformatic requirements of metagenomic data analysis differ in theory and practice. This paper reviews the tools available for metagenomic data analysis, what they can do, and their web availability.

  4. The single-species metagenome: subtyping Staphylococcus aureus core genome sequences from shotgun metagenomic data

    PubMed Central

    Li, Ben; Petit III, Robert A.; Qin, Zhaohui S.; Darrow, Lyndsey

    2016-01-01

    In this study we developed a genome-based method for detecting Staphylococcus aureus subtypes from metagenome shotgun sequence data. We used a binomial mixture model and the coverage counts at >100,000 known S. aureus SNP (single nucleotide polymorphism) sites derived from prior comparative genomic analysis to estimate the proportion of 40 subtypes in metagenome samples. We were able to obtain >87% sensitivity and >94% specificity at 0.025X coverage for S. aureus. We found that 321 and 149 metagenome samples from the Human Microbiome Project and the metaSUB analysis of the New York City subway, respectively, contained S. aureus at genome coverage >0.025X. In both projects, CC8 and CC30 were the most common S. aureus clonal complexes encountered. We found evidence that the subtype composition at different body sites of the same individual was more similar than expected under random sampling, and more limited evidence that certain body sites were enriched for particular subtypes. One surprising finding was the apparent high frequency of CC398, a lineage often associated with livestock, in samples from the tongue dorsum. Epidemiologic analysis of the HMP subject population suggested that high BMI (body mass index) and health insurance are possibly associated with S. aureus carriage, but there was limited power to identify factors linked to carriage of even the most common subtype. In the NYC subway data, we found a small signal of geographic distance affecting subtype clustering, but other, unknown factors also influence the taxonomic distribution of the species around the city. PMID:27781166
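    The abstract names a binomial mixture model over SNP coverage counts but gives no equations, so here is a minimal sketch of one way such a model can be fitted: expectation-maximization (EM) over subtype proportions, assuming each subtype carries either the reference or the alternative allele at every site and each read shows the wrong allele with a fixed error rate. The function name, argument layout, and the 1% default error rate are hypothetical, not the authors' implementation.

```python
def estimate_subtype_proportions(
    genotypes: list[list[int]],  # genotypes[k][s] = 1 if subtype k carries the alt allele at site s
    alt_counts: list[int],       # reads showing the alt allele at each site
    depths: list[int],           # total reads covering each site
    error: float = 0.01,         # per-read allele error rate (assumed)
    n_iter: int = 200,
) -> list[float]:
    n_sub = len(genotypes)
    total_reads = sum(depths)
    if total_reads == 0:
        raise ValueError("no reads cover any SNP site")
    pi = [1.0 / n_sub] * n_sub  # start from equal proportions
    for _ in range(n_iter):
        expected = [0.0] * n_sub
        for s, (x, n) in enumerate(zip(alt_counts, depths)):
            # Probability that a read from subtype k shows the alt allele at site s.
            q = [1 - error if genotypes[k][s] else error for k in range(n_sub)]
            w_alt = [pi[k] * q[k] for k in range(n_sub)]
            w_ref = [pi[k] * (1 - q[k]) for k in range(n_sub)]
            z_alt, z_ref = sum(w_alt), sum(w_ref)
            # E-step: share alt-supporting and ref-supporting reads among subtypes.
            for k in range(n_sub):
                expected[k] += x * w_alt[k] / z_alt + (n - x) * w_ref[k] / z_ref
        # M-step: proportions are the expected read shares.
        pi = [c / total_reads for c in expected]
    return pi
```

    Because every site contributes a binomial count (x alt reads out of n), the same loop scales to many sites; with real data one would also need a model for sites where no listed subtype fits.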

  5. Phylogenetic Analysis of a Spontaneous Cocoa Bean Fermentation Metagenome Reveals New Insights into Its Bacterial and Fungal Community Diversity

    PubMed Central

    Illeghems, Koen; De Vuyst, Luc; Papalexandratou, Zoi; Weckx, Stefan

    2012-01-01

    This is the first report on the phylogenetic analysis of the community diversity of a single spontaneous cocoa bean box fermentation sample through a metagenomic approach involving 454 pyrosequencing. Several sequence-based and composition-based taxonomic profiling tools were used and evaluated to avoid software-dependent results and their outcome was validated by comparison with previously obtained culture-dependent and culture-independent data. Overall, this approach revealed a wider bacterial (mainly γ-Proteobacteria) and fungal diversity than previously found. Further, the use of a combination of different classification methods, in a software-independent way, helped to understand the actual composition of the microbial ecosystem under study. In addition, bacteriophage-related sequences were found. The bacterial diversity depended partially on the methods used, as composition-based methods predicted a wider diversity than sequence-based methods, and as classification methods based solely on phylogenetic marker genes predicted a more restricted diversity compared with methods that took all reads into account. The metagenomic sequencing analysis identified Hanseniaspora uvarum, Hanseniaspora opuntiae, Saccharomyces cerevisiae, Lactobacillus fermentum, and Acetobacter pasteurianus as the prevailing species. Also, the presence of occasional members of the cocoa bean fermentation process was revealed (such as Erwinia tasmaniensis, Lactobacillus brevis, Lactobacillus casei, Lactobacillus rhamnosus, Lactococcus lactis, Leuconostoc mesenteroides, and Oenococcus oeni). Furthermore, the sequence reads associated with viral communities were of a restricted diversity, dominated by Myoviridae and Siphoviridae, and reflecting Lactobacillus as the dominant host. To conclude, an accurate overview of all members of a cocoa bean fermentation process sample was revealed, indicating the superiority of metagenomic sequencing over previously used techniques. PMID:22666442

  6. Fragments and Coherence

    ERIC Educational Resources Information Center

    Watson, Anne

    2008-01-01

    Can teachers contact the inner coherence of mathematics while working in a context fragmented by always-new objectives, criteria, and initiatives? How, more importantly, can learners experience the inner coherence of mathematics while working in a context fragmented by testing, modular curricula, short-term learning objectives, and lessons that…

  7. Fragment Hazard Investigation Program

    DTIC Science & Technology

    1978-10-01

    Contents excerpt: Ballistic Density (k); Ejection Angle (a...); Ejection Velocity (V); Development of Empirical Relation...; Fragment Weight Versus Gamma for Test QD-155-08; Fragment Range Versus Ejection Angle as a Function of

  8. Fragmentation of fullerenes

    NASA Astrophysics Data System (ADS)

    Chancey, Ryan T.; Oddershede, Lene; Harris, Frank E.; Sabin, John R.

    2003-04-01

    We have performed classical molecular-dynamics simulations of the fragmentation collisions of neutral fullerenes (C24, C60, C100, and C240) with a hard wall. The interactions between the carbon atoms are modeled by a Tersoff potential and the position of each carbon atom at each time step is calculated using a sixth-order predictor-corrector method. The statistical distribution of the fragments depends on impact energy. At low energies, the fragment distribution appears symmetric, with both the large and small fragment distributions well fitted by an exponential function of the same exponent, the value of which decreases with impact energy. At intermediate energies, the distribution of the smallest fragments can be fitted equally well by a power law or an exponential function. At high impact energies, the entire fragmentation pattern is well described by a single exponential function, the exponent increasing with energy. The observed tendencies in fragment distributions as well as the obtained exponents are in agreement with experimental observations. The fragmentation behavior of the four investigated fullerenes is very similar, and it is noted that C60 appears to be the most stable.
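    The abstract compares exponential and power-law fits to fragment distributions without giving the fitting procedure. A minimal sketch of one common quick check, assuming ordinary least squares on log-transformed counts: an exponential N(s) = A·exp(−λs) is a straight line in (s, ln N), while a power law N(s) = A·s^(−τ) is a straight line in (ln s, ln N). The function names are hypothetical; maximum-likelihood fitting is generally preferable for real fragment counts.

```python
import math


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, RSS)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, rss


def compare_fits(sizes: list[float], counts: list[float]) -> dict[str, float]:
    """Fit exponential and power-law forms to a fragment-size distribution."""
    pts = [(s, c) for s, c in zip(sizes, counts) if s > 0 and c > 0]
    if len(pts) < 2:
        raise ValueError("need at least two positive data points")
    xs = [s for s, _ in pts]
    log_counts = [math.log(c) for _, c in pts]
    exp_slope, _, exp_rss = linear_fit(xs, log_counts)
    pow_slope, _, pow_rss = linear_fit([math.log(s) for s in xs], log_counts)
    return {
        "exp_exponent": -exp_slope,    # lambda in A * exp(-lambda * s)
        "exp_rss": exp_rss,
        "power_exponent": -pow_slope,  # tau in A * s**(-tau)
        "power_rss": pow_rss,
    }
```

    A lower residual sum of squares in log space favors that functional form; when the two are close, as for the smallest fragments at intermediate energies, the data cannot distinguish them.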

  9. Application of metagenomics in the human gut microbiome.

    PubMed

    Wang, Wei-Lin; Xu, Shao-Yan; Ren, Zhi-Gang; Tao, Liang; Jiang, Jian-Wen; Zheng, Shu-Sen

    2015-01-21

    There are more than 1000 microbial species living in the complex human intestine. The gut microbial community plays an important role in protecting the host against pathogenic microbes, modulating immunity, and regulating metabolic processes, and is even regarded as an endocrine organ. However, traditional culture methods are very limited for identifying microbes. With the application of molecular biological technology to the intestinal microbiome, especially metagenomic sequencing using next-generation sequencing technology, progress has been made in the study of the human intestinal microbiome. Metagenomics can be used to study intestinal microbiome diversity and dysbiosis, as well as their relationship to health and disease. Moreover, functional metagenomics can identify novel functional genes, microbial pathways, antibiotic resistance genes, and functional dysbiosis of the intestinal microbiome, and can determine interactions and co-evolution between the microbiota and the host, though some limitations remain. Metatranscriptomics, metaproteomics and metabolomics are important complements to the understanding of the human gut microbiome. This review aims to demonstrate that metagenomics can be a powerful tool in studying the human gut microbiome, with encouraging prospects. The limitations of metagenomics that remain to be overcome are also discussed. Metatranscriptomics, metaproteomics and metabolomics in relation to the study of the human gut microbiome are also briefly discussed.

  10. Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes

    PubMed Central

    King, Paula; Pham, Long K.; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T.; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca

    2016-01-01

    We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and their analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens, and determine whether they are atypical of the hospital airborne metagenome. Air samples were collected from eight locations, which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the 50 most abundant microbial orders generated three major nodes, which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain, and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic of an otherwise consistent hospital air microbial metagenomic profile. PMID:27482891

  11. Structure, fluctuation and magnitude of a natural grassland soil metagenome

    PubMed Central

    Delmont, Tom O; Prestat, Emmanuel; Keegan, Kevin P; Faubladier, Michael; Robe, Patrick; Clark, Ian M; Pelletier, Eric; Hirsch, Penny R; Meyer, Folker; Gilbert, Jack A; Le Paslier, Denis; Simonet, Pascal; Vogel, Timothy M

    2012-01-01

    The soil ecosystem is critical for human health, affecting aspects of the environment from key agricultural and edaphic parameters to a critical influence on climate change. Soil has more unknown biodiversity than any other ecosystem. We have applied diverse DNA extraction methods coupled with high-throughput pyrosequencing to explore 4.88 × 10⁹ bp of metagenomic sequence data from the longest continually studied soil environment (Park Grass experiment at Rothamsted Research in the UK). Results emphasize important DNA extraction biases and unexpectedly low seasonal and vertical variations in soil metagenomic functional classes. Clustering-based subsystems and carbohydrate metabolism had the largest numbers of annotated reads assigned, although <50% of reads were assigned at an E value cutoff of 10⁻⁵. In addition, among the more detailed subsystems, cAMP signaling in bacteria (3.24±0.27% of the annotated reads) and the Ton and Tol transport systems (1.69±0.11%) were relatively highly represented. The most highly represented genome from the database was that of a Bradyrhizobium species. The metagenomic variance created by integrating natural and methodological fluctuations represents a global picture of the Rothamsted soil metagenome that can be used for specific questions and future inter-environmental metagenomic comparisons. However, only 1% of annotated sequences correspond to already sequenced genomes at 96% similarity and E values of <10⁻⁵; thus, considerable genome reconstruction efforts are still needed. PMID:22297556
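    The abstract reports how many reads pass an E value cutoff of 10⁻⁵ and how many also match sequenced genomes at 96% similarity. As an illustration only (the study's actual annotation pipeline is not described here), the sketch below applies those two thresholds to similarity-search hits in BLAST+ tabular format (-outfmt 6, default 12 columns) and reports the fraction of reads with at least one passing hit. The function names are hypothetical, and treating "similarity" as BLAST percent identity is an assumption.

```python
import csv
import io
from typing import Optional

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def reads_passing(tabular_text: str, max_evalue: float = 1e-5,
                  min_identity: Optional[float] = None) -> set[str]:
    """Return IDs of reads with at least one hit meeting the thresholds."""
    passing = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) >= max_evalue:
            continue  # keep only hits with E value strictly below the cutoff
        if min_identity is not None and float(hit["pident"]) < min_identity:
            continue
        passing.add(hit["qseqid"])
    return passing


def fraction_assigned(passing: set[str], n_reads: int) -> float:
    """Fraction of all sequenced reads (including those without any hit) that passed."""
    return len(passing) / n_reads
```

    Because reads with no hit never appear in the tabular output, the denominator has to come from the read file itself, which is why fraction_assigned takes the total read count separately.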

  12. An introduction to the analysis of shotgun metagenomic data

    PubMed Central

    Sharpton, Thomas J.

    2014-01-01

    Environmental DNA sequencing has revealed the expansive biodiversity of microorganisms and clarified the relationship between host-associated microbial communities and host phenotype. Shotgun metagenomic DNA sequencing is a relatively new and powerful environmental sequencing approach that provides insight into community biodiversity and function. However, the analysis of metagenomic sequences is complicated by the complex structure of the data. Fortunately, new tools and data resources have been developed to circumvent these complexities and allow researchers to determine which microbes are present in the community and what they might be doing. This review describes the analytical strategies and specific tools that can be applied to metagenomic data and the considerations and caveats associated with their use. Specifically, it documents how metagenomes can be analyzed to quantify community structure and diversity, assemble novel genomes, identify new taxa and genes, and determine which metabolic pathways are encoded in the community. It also discusses several methods that can be used to compare metagenomes to identify taxa and functions that differentiate communities. PMID:24982662

  13. Metagenomic analysis of permafrost microbial community response to thaw

    SciTech Connect

    Mackelprang, R.; Waldrop, M.P.; DeAngelis, K.M.; David, M.M.; Chavarria, K.L.; Blazewicz, S.J.; Rubin, E.M.; Jansson, J.K.

    2011-07-01

    We employed deep metagenomic sequencing to determine the impact of thaw on microbial phylogenetic and functional genes and related these data to measurements of methane emissions. Metagenomics, the direct sequencing of DNA from the environment, allows for the examination of whole biochemical pathways and associated processes, as opposed to individual pieces of the metabolic puzzle. Our metagenome analyses revealed that during the transition from a frozen to a thawed state there were rapid shifts in many microbial, phylogenetic and functional gene abundances and pathways. After one week of incubation at 5°C, permafrost metagenomes converged to be more similar to each other than while they were frozen. We found that multiple genes involved in the cycling of carbon and nitrogen shifted rapidly during thaw. We also constructed the first draft genome from a complex soil metagenome, which corresponded to a novel methanogen. Methane previously accumulated in permafrost was released during thaw and subsequently consumed by methanotrophic bacteria. Together these data point towards the importance of rapid cycling of methane and nitrogen in thawing permafrost.

  14. Application of metagenomics in the human gut microbiome

    PubMed Central

    Wang, Wei-Lin; Xu, Shao-Yan; Ren, Zhi-Gang; Tao, Liang; Jiang, Jian-Wen; Zheng, Shu-Sen

    2015-01-01

    There are more than 1000 microbial species living in the complex human intestine. The gut microbial community plays an important role in protecting the host against pathogenic microbes, modulating immunity, and regulating metabolic processes, and is even regarded as an endocrine organ. However, traditional culture methods are very limited for identifying microbes. With the application of molecular biological technology to the intestinal microbiome, especially metagenomic sequencing using next-generation sequencing technology, progress has been made in the study of the human intestinal microbiome. Metagenomics can be used to study intestinal microbiome diversity and dysbiosis, as well as their relationship to health and disease. Moreover, functional metagenomics can identify novel functional genes, microbial pathways, antibiotic resistance genes, and functional dysbiosis of the intestinal microbiome, and can determine interactions and co-evolution between the microbiota and the host, though some limitations remain. Metatranscriptomics, metaproteomics and metabolomics are important complements to the understanding of the human gut microbiome. This review aims to demonstrate that metagenomics can be a powerful tool in studying the human gut microbiome, with encouraging prospects. The limitations of metagenomics that remain to be overcome are also discussed. Metatranscriptomics, metaproteomics and metabolomics in relation to the study of the human gut microbiome are also briefly discussed. PMID:25624713

  15. Exploratory experimentation and scientific practice: metagenomics and the proteorhodopsin case.

    PubMed

    O'Malley, Maureen A

    2007-01-01

    Exploratory experimentation and high-throughput molecular biology appear to have considerable affinity for each other. Included in the latter category is metagenomics, which is the DNA-based study of diverse microbial communities from a vast range of non-laboratory environments. Metagenomics has already made numerous discoveries and these have led to reinterpretations of fundamental concepts of microbial organization, evolution, and ecology. The most outstanding success story of metagenomics to date involves the discovery of a rhodopsin gene, named proteorhodopsin, in marine bacteria that were never suspected to have any photobiological capacities. A discussion of this finding and its detailed investigation illuminates the relationship between exploratory experimentation and metagenomics. Specifically, the proteorhodopsin story indicates that a dichotomous interpretation of theory-driven and exploratory experimentation is insufficient and that an interactive understanding of these two types of experimentation can be usefully supplemented by another category, "natural history experimentation". Further reflection on the context of metagenomics suggests the necessity of thinking more historically about exploratory and other forms of experimentation.

  16. An introduction to the analysis of shotgun metagenomic data.

    PubMed

    Sharpton, Thomas J

    2014-01-01

    Environmental DNA sequencing has revealed the expansive biodiversity of microorganisms and clarified the relationship between host-associated microbial communities and host phenotype. Shotgun metagenomic DNA sequencing is a relatively new and powerful environmental sequencing approach that provides insight into community biodiversity and function. However, the analysis of metagenomic sequences is complicated by the complex structure of the data. Fortunately, new tools and data resources have been developed to circumvent these complexities and allow researchers to determine which microbes are present in the community and what they might be doing. This review describes the analytical strategies and specific tools that can be applied to metagenomic data and the considerations and caveats associated with their use. Specifically, it documents how metagenomes can be analyzed to quantify community structure and diversity, assemble novel genomes, identify new taxa and genes, and determine which metabolic pathways are encoded in the community. It also discusses several methods that can be used to compare metagenomes to identify taxa and functions that differentiate communities.

  17. Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes.

    PubMed

    King, Paula; Pham, Long K; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca; Forsyth, R Allyn

    2016-01-01

    We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and their analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens, and determine whether they are atypical of the hospital airborne metagenome. Air samples were collected from eight locations, which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the 50 most abundant microbial orders generated three major nodes, which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain, and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic of an otherwise consistent hospital air microbial metagenomic profile.

  18. Accurate genome relative abundance estimation based on shotgun metagenomic reads.

    PubMed

    Xia, Li C; Cram, Jacob A; Chen, Ting; Fuhrman, Jed A; Sun, Fengzhu

    2011-01-01

    Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or their variants, and often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. The maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the overestimated abundance of Bacteroides species in human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based), even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
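    A simplified sketch of the two ideas the abstract highlights, not GRAMMy's actual implementation: (1) resolving read assignment ambiguity with a maximum-likelihood finite mixture fitted by EM, and (2) correcting for genome size bias, since a longer genome contributes proportionally more reads at the same cell abundance. It assumes each read comes with per-genome likelihoods from some assignment tool and omits GRAMMy's unknown-genome component and read-distribution modeling. Function names are hypothetical.

```python
def em_read_proportions(read_hits: list[dict[str, float]], genomes: list[str],
                        n_iter: int = 100) -> dict[str, float]:
    """EM estimate of the fraction of reads originating from each genome.

    read_hits[r] maps genome -> likelihood of read r under that genome; every key
    must appear in `genomes`, and genomes missing from a read's dict get likelihood 0.
    """
    pi = {g: 1.0 / len(genomes) for g in genomes}
    for _ in range(n_iter):
        expected = dict.fromkeys(genomes, 0.0)
        # E-step: split each read across its candidate genomes.
        for hits in read_hits:
            weights = {g: pi[g] * lik for g, lik in hits.items()}
            z = sum(weights.values())
            if z == 0:
                continue  # read has no support under the current proportions
            for g, w in weights.items():
                expected[g] += w / z
        # M-step: new mixing proportions are the normalized expected read counts.
        total = sum(expected.values())
        if total == 0:
            raise ValueError("no read has a positive likelihood")
        pi = {g: expected[g] / total for g in genomes}
    return pi


def genome_relative_abundance(pi: dict[str, float],
                              genome_lengths: dict[str, int]) -> dict[str, float]:
    """Convert read proportions to genome (cell) proportions by dividing by genome length."""
    per_base = {g: pi[g] / genome_lengths[g] for g in pi}
    z = sum(per_base.values())
    return {g: v / z for g, v in per_base.items()}
```

    Reads that hit several genomes are shared in proportion to the current abundance estimates rather than being discarded or assigned to a tied top score, which is the main difference from directly summarizing alignment results.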

  19. Comparative metagenome analysis of an Alaskan glacier.

    PubMed

    Choudhari, Sulbha; Lohia, Ruchi; Grigoriev, Andrey

    2014-04-01

    The temperature in the Arctic region has been increasing in the recent past, accompanied by melting of its glaciers. We took a snapshot of the current microbial inhabitation of an Alaskan glacier (which can be considered one of the simplest possible ecosystems) by using metagenomic sequencing of 16S rRNA recovered from ice/snow samples. Somewhat contrary to our expectations and earlier estimates, a rich and diverse microbial population of more than 2,500 species was revealed, including several species of Archaea that have been identified for the first time in the glaciers of the Northern hemisphere. The most prominent bacterial groups found were Proteobacteria, Bacteroidetes, and Firmicutes. Firmicutes were not reported in large numbers in a previously studied Alpine glacier but were dominant in an Antarctic subglacial lake. Representatives of Cyanobacteria, Actinobacteria and Planctomycetes were among the most numerous, likely reflecting the dependence of the ecosystem on the energy obtained through photosynthesis and close links with the microbial community of the soil. Principal component analysis (PCA) of nucleotide word frequency revealed distinct sequence clusters for different taxonomic groups in the Alaskan glacier community and separate clusters for the glacial communities from other regions of the world. Comparative analysis of the community composition and bacterial diversity present in the Byron glacier in Alaska with other environments showed larger overlap with an Arctic soil than with a high Arctic lake, indicating patterns of community exchange and suggesting that these bacteria may play an important role in soil development during glacial retreat.
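    A minimal sketch of PCA on nucleotide word frequencies, assuming tetranucleotide words (k = 4; the abstract does not state the word length) and NumPy's SVD for the decomposition. Each sequence becomes a normalized vector of word counts, and the leading principal components place compositionally similar sequences close together. kmer_profile and pca are hypothetical helper names, not the authors' code.

```python
from itertools import product

import numpy as np


def kmer_profile(seq: str, k: int = 4) -> np.ndarray:
    """Relative frequency of every DNA word of length k (4**k entries)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = np.zeros(len(words))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip words containing N or other ambiguity codes
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def pca(profiles: list[np.ndarray], n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Project profiles onto their leading principal components via SVD."""
    X = np.asarray(profiles, dtype=float)
    Xc = X - X.mean(axis=0)  # center each word-frequency column
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)  # fraction of variance per component
    return scores, explained[:n_components]
```

    Plotting the first two columns of scores, colored by taxonomic label, gives the kind of cluster picture the abstract describes.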

  20. Metagenomic analysis of phosphorus removing sludge communities

    SciTech Connect

    Garcia Martin, Hector; Ivanova, Natalia; Kunin, Victor; Warnecke, Falk; Barry, Kerrie; McHardy, Alice C.; Yeates, Christine; He, Shaomei; Salamov, Asaf; Szeto, Ernest; Dalin, Eileen; Putnam, Nik; Shapiro, Harris J.; Pangilinan, Jasmyn L.; Rigoutsos, Isidore; Kyrpides, Nikos C.; Blackall, Linda Louise; McMahon, Katherine D.; Hugenholtz, Philip

    2006-02-01

    Enhanced Biological Phosphorus Removal (EBPR) is not well understood at the metabolic level despite being one of the best-studied microbially-mediated industrial processes due to its ecological and economic relevance. Here we present a metagenomic analysis of two lab-scale EBPR sludges dominated by the uncultured bacterium "Candidatus Accumulibacter phosphatis." This analysis resolves several controversies in EBPR metabolic models and provides hypotheses explaining the dominance of A. phosphatis in this habitat, its lifestyle outside EBPR and probable cultivation requirements. Comparison of the same species from different EBPR sludges highlights recent evolutionary dynamics in the A. phosphatis genome that could be linked to mechanisms for environmental adaptation. In spite of an apparent lack of phylogenetic overlap in the flanking communities of the two sludges studied, common functional themes were found, at least one of them complementary to the inferred metabolism of the dominant organism. The present study provides a much-needed blueprint for a systems-level understanding of EBPR and illustrates that metagenomics enables detailed, often novel, insights into even well-studied biological systems.

  1. Application of metagenomics in understanding oral health and disease

    PubMed Central

    Xu, Ping; Gunsolley, John

    2014-01-01

    Oral diseases including periodontal disease and caries are some of the most prevalent infectious diseases in humans. Different microbial species cohabitate and form a polymicrobial biofilm called dental plaque in the oral cavity. Metagenomics using next generation sequencing technologies has produced bacterial profiles and genomic profiles to study the relationships between microbial diversity, genetic variation, and oral diseases. Several oral metagenomic studies have examined the oral microbiome of periodontal disease and caries. Gene annotations in these studies support the association of specific genes or metabolic pathways with oral health and with specific diseases. The roles of pathogenic species and functions of specific genes in oral disease development have been recognized by metagenomic analysis. A model is proposed in which three levels of interactions occur in the oral microbiome that determine oral health or disease. PMID:24642489

  2. Is metagenomics resolving identification of functions in microbial communities?

    PubMed

    Chistoserdova, Ludmila

    2014-01-01

    We are coming up on the tenth anniversary of the broad use of the method involving whole metagenome shotgun sequencing, referred to as metagenomics. The application of this approach has definitely revolutionized microbiology and the related fields, including the realization of the importance of the human microbiome. As such, metagenomics has already provided a novel outlook on the complexity and dynamics of microbial communities that are an important part of the biosphere of the planet. Accumulation of massive amounts of sequence data also caused a surge in the development of bioinformatics tools specially designed to provide pipelines for data analysis and visualization. However, a critical outlook into the field is required to appreciate what could be and what has currently been gained from the massive sequence databases that are being generated with ever-increasing speed.

  3. Recovering complete and draft population genomes from metagenome datasets

    SciTech Connect

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of applying this approach to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.
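    The intuition behind coverage covariation is that contigs from the same population rise and fall together in abundance across datasets. A minimal sketch (not any specific published binner) greedily groups contigs whose per-sample coverage profiles are strongly correlated. The Pearson criterion, the 0.9 threshold, the seed-based greedy grouping, and the names `pearson` and `bin_by_coverage` are all assumptions for illustration; practical tools typically combine coverage with sequence composition and use many more samples.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant profiles; treated as no signal here
    return cov / (sx * sy)


def bin_by_coverage(profiles, threshold=0.9):
    """Greedily group contigs with correlated coverage across samples.

    profiles: dict mapping contig id -> per-sample mean coverages (same
    sample order for every contig). Each contig joins the first bin whose
    seed profile it correlates with at >= threshold, else seeds a new bin.
    """
    bins = []  # each entry: (seed coverage profile, list of member contig ids)
    for contig_id, profile in profiles.items():
        for seed, members in bins:
            if pearson(seed, profile) >= threshold:
                members.append(contig_id)
                break
        else:
            # No existing bin matched: this contig seeds a new bin.
            bins.append((profile, [contig_id]))
    return [members for _, members in bins]
```

    With hypothetical coverages over three samples, `{"c1": [10, 50, 5], "c2": [12, 55, 6], "c3": [40, 8, 30], "c4": [38, 10, 29]}`, the contigs split into `[["c1", "c2"], ["c3", "c4"]]`: c1 and c2 peak in the second sample, while c3 and c4 follow the opposite pattern.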

  4. Recovering complete and draft population genomes from metagenome datasets

    DOE PAGES

    Sangwan, Naseer; Xia, Fangfang; Gilbert, Jack A.

    2016-03-08

    Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost of applying this approach to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., bins containing sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on genome-wide evolution.

  5. New dimensions of the virus world discovered through metagenomics

    PubMed Central

    Kristensen, David M.; Mushegian, Arcady R.; Dolja, Valerian V.; Koonin, Eugene V.

    2012-01-01

    Metagenomic analysis of viruses suggests novel patterns of evolution, changes the existing ideas of the composition of the virus world and reveals novel groups of viruses and virus-like agents. The gene composition of the marine DNA virome is dramatically different from that of known bacteriophages. The virome is dominated by rare genes, many of which might be contained within virus-like entities such as gene transfer agents. Analysis of marine metagenomes thought to consist mostly of bacterial genes revealed a variety of sequences homologous to conserved genes of eukaryotic nucleocytoplasmic large DNA viruses, resulting in the discovery of diverse members of previously undersampled groups and suggesting the existence of new classes of virus-like agents. Unexpectedly, metagenomics of marine RNA viruses showed that representatives of only one superfamily of eukaryotic viruses, the picorna-like viruses, dominate the RNA virome. PMID:19942437

  6. Recovery of a Medieval Brucella melitensis Genome Using Shotgun Metagenomics

    PubMed Central

    Kay, Gemma L.; Sergeant, Martin J.; Giuffra, Valentina; Bandiera, Pasquale; Milanese, Marco; Bramanti, Barbara

    2014-01-01

    Shotgun metagenomics provides a powerful assumption-free approach to the recovery of pathogen genomes from contemporary and historical material. We sequenced the metagenome of a calcified nodule from the skeleton of a 14th-century middle-aged male excavated from the medieval Sardinian settlement of Geridu. We obtained 6.5-fold coverage of a Brucella melitensis genome. Sequence reads from this genome showed signatures typical of ancient or aged DNA. Despite the relatively low coverage, we were able to use information from single-nucleotide polymorphisms to place the medieval pathogen genome within a clade of B. melitensis strains that included the well-studied Ether strain and two other recent Italian isolates. We confirmed this placement using information from deletions and IS711 insertions. We conclude that metagenomics stands ready to document past and present infections, shedding light on the emergence, evolution, and spread of microbial pathogens. PMID:25028426

  7. Opaque rock fragments

    SciTech Connect

    Abhijit, B.; Molinaroli, E.; Olsen, J.

    1987-05-01

    The authors describe a new, rare, but petrogenetically significant variety of rock fragments from Holocene detrital sediments. Approximately 50% of the opaque heavy mineral concentrates from Holocene siliciclastic sands are polymineralic Fe-Ti oxide particles, i.e., they are opaque rock fragments. About 40% to 70% of these rock fragments show intergrowths of hm + il, mt + il, and mt + hm ± il (hm, hematite; il, ilmenite; mt, magnetite). Modal analysis of 23,282 opaque particles in 117 polished thin sections of granitic and metamorphic parent rocks and their daughter sands from semi-arid and humid climates shows the following relative abundances. The data show that opaque rock fragments are more common in sands from igneous source rocks and that hm + il fragments are more durable. The authors assume that equilibrium conditions existed in parent rocks during the growth of these paired minerals, and that the Ti/Fe ratio did not change during oxidation of mt to hm. Geothermometric determinations using electron probe microanalysis of opaque rock fragments in sand samples from Lake Erie and the Adriatic Sea suggest that these rock fragments may have equilibrated at approximately 900 °C and 525 °C, respectively.

  8. THE ROLE OF WATERSHED CLASSIFICATION IN DIAGNOSING CAUSES OF BIOLOGICAL IMPAIRMENT

    EPA Science Inventory

    We compared classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically based classification scheme for two case studies involving 1) Lake Superior tributaries and 2) watersheds of riverine coastal wetlands ...

  9. Fragmentation of monoclonal antibodies

    PubMed Central

    Vlasak, Josef

    2011-01-01

    Fragmentation is a degradation pathway ubiquitously observed in proteins despite the remarkable stability of the peptide bond; proteins differ only in how much and where cleavage occurs. The goal of this review is to summarize reports regarding the non-enzymatic fragmentation of the peptide backbone of monoclonal antibodies (mAbs). The sites in the polypeptide chain susceptible to fragmentation are determined by a multitude of factors. Insights are provided on the intimate chemical mechanisms that can make some bonds prone to cleavage due to the presence of specific side chains. In addition to primary structure, the secondary, tertiary and quaternary structures have a significant impact in modulating the distribution of cleavage sites by altering local flexibility, accessibility to solvent or bringing into close proximity side chains that are remote in sequence. This review focuses on cleavage sites observed in the constant regions of mAbs, with special emphasis on hinge fragmentation. The mechanisms responsible for backbone cleavage are strongly dependent on pH and can be catalyzed by metals or radicals. The distribution of cleavage sites is different under acidic compared to basic conditions, with fragmentation rates exhibiting a minimum in the pH range 5–6; therefore, the overall fragmentation pattern observed for a mAb is a complex result of structural and solvent conditions. A critical review of the techniques used to monitor fragmentation is also presented; usually a compromise has to be made between a highly sensitive method with good fragment separation and the capability to identify the cleavage site. The effect of fragmentation on the function of a mAb must be evaluated on a case-by-case basis depending on whether cleavage sites are observed in the variable or constant regions, and on the mechanism of action of the molecule. PMID:21487244

  10. Universality in Fragmentation

    NASA Astrophysics Data System (ADS)

    Åström, J. A.; Holian, B. L.; Timonen, J.

    2000-04-01

    Fragmentation of a two-dimensional brittle solid by impact and "explosion," and a fluid by "explosion," are all shown to become critical. The critical points appear at a nonzero impact velocity, and at infinite explosion duration, respectively. Within the critical regimes, the fragment-size distributions satisfy a scaling form qualitatively similar to that of the cluster-size distribution of percolation, but they belong to another universality class. Energy balance arguments give a correlation length exponent that is exactly one-half of its percolation value. A single crack dominates fragmentation in the slow-fracture limit, as expected.
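    Worked out numerically, and assuming the reference value is the exactly known correlation-length exponent of two-dimensional percolation, ν = 4/3 (a number the abstract itself does not quote), the stated relation gives the following; the label ν_frag is introduced here only for illustration.

```latex
\nu_{\mathrm{frag}} = \tfrac{1}{2}\,\nu_{\mathrm{perc}}^{(2\mathrm{D})}
                    = \tfrac{1}{2}\cdot\tfrac{4}{3}
                    = \tfrac{2}{3}
```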

  11. DNA sequence analysis using hierarchical ART-based classification networks

    SciTech Connect

    LeBlanc, C.; Hruska, S.I.; Katholi, C.R.; Unnasch, T.R.

    1994-12-31

    Adaptive resonance theory (ART) describes a class of artificial neural network architectures that act as classification tools which self-organize, work in real-time, and require no retraining to classify novel sequences. We have adapted ART networks to provide support to scientists attempting to categorize tandem repeat DNA fragments from Onchocerca volvulus. In this approach, sequences of DNA fragments are presented to multiple ART-based networks which are linked together into two (or more) tiers; the first provides coarse sequence classification while the subsequent tiers refine the classifications as needed. The overall rating of the resulting classification of fragments is measured using statistical techniques based on those introduced to validate results from traditional phylogenetic analysis. Tests of the Hierarchical ART-based Classification Network, or HABclass network, indicate its value as a fast, easy-to-use classification tool which adapts to new data without retraining on previously classified data.

  12. Whither or wither geomicrobiology in the era of 'community metagenomics'

    USGS Publications Warehouse

    Oremland, R.S.; Capone, D.G.; Stolz, J.F.; Fuhrman, J.

    2005-01-01

    Molecular techniques are valuable tools that can improve our understanding of the structure of microbial communities. They provide the ability to probe for life in all niches of the biosphere, perhaps even supplanting the need to cultivate microorganisms or to conduct ecophysiological investigations. However, an overemphasis and strict dependence on such large information-driven endeavours as environmental metagenomics could overwhelm the field, to the detriment of microbial ecology. We now call for more balanced, hypothesis-driven research efforts that couple metagenomics with classic approaches.

  13. Evaluating the Quantitative Capabilities of Metagenomic Analysis Software.

    PubMed

    Kerepesi, Csaba; Grolmusz, Vince

    2016-05-01

    DNA sequencing technologies are applied widely and frequently today to describe metagenomes, i.e., microbial communities in environmental or clinical samples, without the need for culturing them. These technologies usually return short (100-300 base-pair) DNA reads, and these reads are processed by metagenomic analysis software that assigns phylogenetic composition information to the dataset. Here we evaluate three metagenomic analysis programs (AmphoraNet, a webserver implementation of AMPHORA2; MG-RAST; and MEGAN5) for their capability to assign quantitative phylogenetic information to the data, describing the frequency of appearance of microorganisms of the same taxa in the sample. The difficulty of the task arises from the fact that longer genomes produce more reads from the same organism than shorter genomes, and some programs assign higher frequencies to species with longer genomes than to those with shorter ones. This phenomenon is called the "genome length bias." Dozens of complex artificial metagenome benchmarks can be found in the literature. Because of the complexity of those benchmarks, it is usually difficult to judge the resistance of metagenomic software to this "genome length bias." Therefore, we made a simple benchmark for the evaluation of "taxon counting" in a metagenomic sample: we took the same number of copies of three full bacterial genomes of different lengths, broke them up randomly into short reads with an average length of 150 bp, and mixed the reads, creating our simple benchmark. Because of its simplicity, the benchmark is not supposed to serve as a mock metagenome, but if a program fails on that simple task, it will surely fail on most real metagenomes. We applied the three programs to the benchmark. The ideal quantitative solution would assign the same proportion to the three bacterial taxa. We found that AMPHORA2/AmphoraNet gave the most accurate results and the other two software were under
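    The genome length bias can be made concrete with a small worked example. If equal numbers of genome copies are sequenced uniformly, each genome contributes reads roughly in proportion to its length, so raw read fractions overstate long genomes; dividing read counts by genome length before normalizing recovers equal proportions. The sketch below assumes uniform coverage and known genome lengths, uses hypothetical numbers, and the name `relative_abundances` is illustrative; it is not the method used by AMPHORA2, MG-RAST, or MEGAN5.

```python
def relative_abundances(read_counts, genome_lengths):
    """Return (naive, length-corrected) relative abundances per taxon.

    naive:     share of all reads assigned to each taxon (length-biased).
    corrected: reads per base of genome, renormalized to sum to 1.
    """
    total_reads = sum(read_counts.values())
    naive = {t: n / total_reads for t, n in read_counts.items()}
    # Dividing by genome length removes the "longer genome, more reads" effect.
    per_base = {t: n / genome_lengths[t] for t, n in read_counts.items()}
    norm = sum(per_base.values())
    corrected = {t: v / norm for t, v in per_base.items()}
    return naive, corrected
```

    For three hypothetical genomes of 2, 4 and 6 Mbp sampled at equal copy number, read counts of 1,000, 2,000 and 3,000 give naive proportions of 1/6, 1/3 and 1/2, while the length-corrected proportions are 1/3 each, which is the ideal answer the benchmark above looks for.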

  14. Survey of (Meta)genomic Approaches for Understanding Microbial Community Dynamics.

    PubMed

    Sharma, Anukriti; Lal, Rup

    2017-03-01

    Advancement in next-generation sequencing technologies has driven the rapid evolution of genomics and metagenomics over a short period and at nominal cost. While metagenomics and genomics can be used separately to reveal culture-independent and culture-based microbial evolution, respectively, (meta)genomics together can be used to demonstrate results at the population level, revealing in-depth complex community interactions for specific ecotypes. The field of metagenomics, which started by answering "who is out there?" based on the 16S rRNA gene, has evolved immensely toward precise organismal reconstruction at the species/strain level from deeply covered metagenome data, outweighing the need to isolate bacteria, of which 99% are de facto non-cultivable. In this review we have underlined the appeal of metagenome-derived genomes in providing insights into evolutionary patterns, growth dynamics, genome/gene-specific sweeps, and durability of environmental pressures. We have demonstrated the use of culture-based genomics and environmental shotgun metagenome data together to elucidate environment-specific genome modulations via metagenomic recruitments in terms of gene loss/gain and the extent of the accessory and core genomes. We further illustrated the benefit of (meta)genomics in understanding infectious diseases by deducing the relationship between the human microbiota and clinical microbiology. This review summarizes the technological advances in (meta)genomic strategies that use genome and metagenome datasets together to increase the resolution of microbial population studies.

  15. Fragmentation in Biaxial Tension

    SciTech Connect

    Campbell, G H; Archbold, G C; Hurricane, O A; Miller, P L

    2006-06-13

    We have carried out an experiment that places a ductile stainless steel in a state of biaxial tension at a high rate of strain. The loading of the ductile metal spherical cap is performed by the detonation of a high explosive (HE) layer with a conforming geometry to expand the metal radially outwards. Simulations of the loading and expansion of the metal predict strain rates that compare well with experimental observations. A high percentage of the HE-loaded material was recovered through a soft capture process, and characterization of the recovered fragments provided high-quality data, including uniform strain prior to failure and fragment size. These data were used with a modified fragmentation model to determine a fragmentation energy.

  16. Diversity of virophages in metagenomic data sets.

    PubMed

    Zhou, Jinglie; Zhang, Weijia; Yan, Shuling; Xiao, Jinzhou; Zhang, Yuanyuan; Li, Bailin; Pan, Yingjie; Wang, Yongjie

    2013-04-01

    Virophages, e.g., Sputnik, Mavirus, and Organic Lake virophage (OLV), are unusual parasites of giant double-stranded DNA (dsDNA) viruses, yet little is known about their diversity. Here, we describe the global distribution, abundance, and genetic diversity of virophages based on analyzing and mapping comprehensive metagenomic databases. The results reveal a distinct abundance and worldwide distribution of virophages, involving almost all geographical zones and a variety of unique environments. These environments ranged from deep ocean to inland, iced to hydrothermal lakes, and human gut- to animal-associated habitats. Four complete virophage genomic sequences (Yellowstone Lake virophages [YSLVs]) were obtained, as was one nearly complete sequence (Ace Lake Mavirus [ALM]). The genomes obtained were 27,849 bp long with 26 predicted open reading frames (ORFs) (YSLV1), 23,184 bp with 21 ORFs (YSLV2), 27,050 bp with 23 ORFs (YSLV3), 28,306 bp with 34 ORFs (YSLV4), and 17,767 bp with 22 ORFs (ALM). The homologous counterparts of five genes, including putative FtsK-HerA family DNA packaging ATPase and genes encoding DNA helicase/primase, cysteine protease, major capsid protein (MCP), and minor capsid protein (mCP), were present in all virophages studied thus far. They also shared a conserved gene cluster comprising the two core genes of MCP and mCP. Comparative genomic and phylogenetic analyses showed that YSLVs, having a closer relationship to each other than to the other virophages, were more closely related to OLV than to Sputnik but distantly related to Mavirus and ALM. These findings indicate that virophages appear to be widespread and genetically diverse, with at least 3 major lineages.

  17. Diversity of Virophages in Metagenomic Data Sets

    PubMed Central

    Zhou, Jinglie; Zhang, Weijia; Yan, Shuling; Xiao, Jinzhou; Zhang, Yuanyuan; Li, Bailin; Pan, Yingjie

    2013-01-01

    Virophages, e.g., Sputnik, Mavirus, and Organic Lake virophage (OLV), are unusual parasites of giant double-stranded DNA (dsDNA) viruses, yet little is known about their diversity. Here, we describe the global distribution, abundance, and genetic diversity of virophages based on analyzing and mapping comprehensive metagenomic databases. The results reveal a distinct abundance and worldwide distribution of virophages, involving almost all geographical zones and a variety of unique environments. These environments ranged from deep ocean to inland, iced to hydrothermal lakes, and human gut- to animal-associated habitats. Four complete virophage genomic sequences (Yellowstone Lake virophages [YSLVs]) were obtained, as was one nearly complete sequence (Ace Lake Mavirus [ALM]). The genomes obtained were 27,849 bp long with 26 predicted open reading frames (ORFs) (YSLV1), 23,184 bp with 21 ORFs (YSLV2), 27,050 bp with 23 ORFs (YSLV3), 28,306 bp with 34 ORFs (YSLV4), and 17,767 bp with 22 ORFs (ALM). The homologous counterparts of five genes, including putative FtsK-HerA family DNA packaging ATPase and genes encoding DNA helicase/primase, cysteine protease, major capsid protein (MCP), and minor capsid protein (mCP), were present in all virophages studied thus far. They also shared a conserved gene cluster comprising the two core genes of MCP and mCP. Comparative genomic and phylogenetic analyses showed that YSLVs, having a closer relationship to each other than to the other virophages, were more closely related to OLV than to Sputnik but distantly related to Mavirus and ALM. These findings indicate that virophages appear to be widespread and genetically diverse, with at least 3 major lineages. PMID:23408616

  18. Metagenomic analysis of microbial community of an Amazonian geothermal spring in Peru.

    PubMed

    Paul, Sujay; Cortez, Yolanda; Vera, Nadia; Villena, Gretty K; Gutiérrez-Correa, Marcel

    2016-09-01

    Aguas Calientes (AC) is an isolated geothermal spring located deep in the Amazon rainforest (7°21'12″ S, 75°00'54″ W) of Peru. This geothermal spring is slightly acidic (pH 5.0-7.0), with temperatures varying from 45 to 90 °C, and is continually fed by plant litter, resulting in a relatively high total organic content (TOC). A pooled water sample was analyzed by amplicon metagenome sequencing of the 16S rRNA V3-V4 hypervariable region on the Illumina HiSeq platform. A total of 2,976,534 paired-end reads were generated, which were assigned to 5,434 OTUs. All the resulting 16S rRNA fragments were then classified into 58 bacterial phyla and 2 archaeal phyla. Proteobacteria (88.06%) was the most highly represented phylum, followed by Thermi (6.43%), Firmicutes (3.41%) and Aquificae (1.10%). Crenarchaeota and Euryarchaeota were the only 2 archaeal phyla detected in this study, both at low abundance. Metagenomic sequences were deposited in the SRA database at NCBI under accession number SRX1809286. Functional categorization of the assigned OTUs was performed using the PICRUSt tool. In the COG analysis, "Amino acid transport and metabolism" (8.5%) was the most highly represented category, whereas among predicted KEGG pathways "Metabolism" (50.6%) was the most abundant. This is the first report of a high-resolution microbial phylogenetic profile of an Amazonian hot spring.

  19. Biochemical Diversity of Carboxyl Esterases and Lipases from Lake Arreo (Spain): a Metagenomic Approach

    PubMed Central

    Martínez-Martínez, Mónica; Alcaide, María; Tchigvintsev, Anatoli; Reva, Oleg; Polaina, Julio; Bargiela, Rafael; Guazzaroni, María-Eugenia; Chicote, Álvaro; Canet, Albert; Valero, Francisco; Rico Eguizabal, Eugenio; Guerrero, María del Carmen; Yakunin, Alexander F.

    2013-01-01

    The esterases and lipases from the α/β hydrolase superfamily exhibit enormous diversity in sequence, fold plasticity, and activity. Here, we present the comprehensive sequence and biochemical analyses of seven distinct esterases and lipases from the metagenome of Lake Arreo, an evaporite karstic lake in Spain (42°46′N, 2°59′W; altitude, 655 m). Together with oligonucleotide usage patterns and BLASTP analysis, our study of esterases/lipases mined from Lake Arreo suggests that its sediment contains moderately halophilic and cold-adapted proteobacteria containing DNA fragments of distantly related plasmids or chromosomal genomic islands of plasmid and phage origins. This metagenome encodes esterases/lipases with broad substrate profiles (tested over a set of 101 structurally diverse esters) and habitat-specific characteristics, as they exhibit maximal activity at alkaline pH (8.0 to 8.5) and temperature of 16 to 40°C, and they are stimulated (1.5 to 2.2 times) by chloride ions (0.1 to 1.2 M), reflecting an adaptation to environmental conditions. Our work provides further insights into the potential significance of the Lake Arreo esterases/lipases for biotechnology processes (e.g., production of enantiomers and sugar esters), because these enzymes are salt tolerant and are active at low temperatures and against a broad range of substrates. As an example, the ability of a single protein to hydrolyze triacylglycerols, (non)halogenated alkyl and aryl esters, cinnamoyl and carbohydrate esters, lactones, and chiral epoxides to a similar extent was demonstrated. PMID:23542620

  20. The Systemic Imprint of Growth and Its Uses in Ecological (Meta)Genomics

    PubMed Central

    Vieira-Silva, Sara; Rocha, Eduardo P. C.

    2010-01-01

    Microbial minimal generation times range from a few minutes to several weeks. They are evolutionarily determined by variables such as environment stability, nutrient availability, and community diversity. Selection for fast growth adaptively imprints genomes, resulting in gene amplification, adapted chromosomal organization, and biased codon usage. We found that these growth-related traits in 214 species of bacteria and archaea are highly correlated, suggesting they all result from growth optimization. While modeling their association with maximal growth rates in view of synthetic biology applications, we observed that codon usage biases are better correlates of growth rates than any other trait, including rRNA copy number. Systematic deviations from our model reveal two distinct evolutionary processes. First, genome organization shows more evolutionary inertia than growth rates. This results in over-representation of growth-related traits in fast degrading genomes. Second, selection for these traits depends on optimal growth temperature: for similar generation times, purifying selection is stronger in psychrophiles, intermediate in mesophiles, and lower in thermophiles. Using this information, we created a predictor of maximal growth rate adapted to small genome fragments. We applied it to three metagenomic environmental samples to show that a transiently rich environment, such as the human gut, selects for fast growers; that a toxic environment, such as the acid mine biofilm, selects for low growth rates; whereas a diverse environment, like the soil, shows all ranges of growth rates. We also demonstrate that microbial colonizers of the infant gut grow faster than the stabilized gut communities of human adults. In conclusion, we show that one can predict maximal growth rates from sequence data alone, and we propose that such information can be used to facilitate the manipulation of generation times. Our predictor allows inferring growth rates in the vast majority of uncultivable

  1. Assessment of REPLI-g Multiple Displacement Whole Genome Amplification (WGA) Techniques for Metagenomic Applications

    PubMed Central

    Ahsanuddin, Sofia; Afshinnekoo, Ebrahim; Gandara, Jorge; Hakyemezoğlu, Mustafa; Bezdan, Daniela; Minot, Samuel; Greenfield, Nick; Mason, Christopher E.

    2017-01-01

    Amplification of minute quantities of DNA is a fundamental challenge in low-biomass metagenomic and microbiome studies because of potential biases in coverage, guanine-cytosine (GC) content, and altered species abundances. Whole genome amplification (WGA), although widely used, is notorious for introducing artifact sequences, either by amplifying laboratory contaminants or by nonrandom amplification of a sample’s DNA. In this study, we investigate the effect of REPLI-g multiple displacement amplification (MDA; Qiagen, Valencia, CA, USA) on sequencing data quality and species abundance detection in 8 paired metagenomic samples and 1 titrated, mixed control sample. We extracted and sequenced genomic DNA (gDNA) from 8 environmental samples and compared the quality of the sequencing data for the MDA samples and their corresponding non-MDA samples. The degree of REPLI-g MDA bias was evaluated by sequence metrics, species composition, and cross-validating observed species abundance and species diversity estimates using the One Codex and MetaPhlAn taxonomic classification tools. Here, we provide evidence of the overall efficacy of REPLI-g MDA in retaining sequencing data quality and species abundance measurements while providing increased yields of high-fidelity DNA. We find that species abundance estimates are largely consistent across samples, even with REPLI-g amplification, as demonstrated by Spearman’s rank-order correlation coefficient (R2 > 0.8). However, REPLI-g MDA often produced fewer classified reads at the species, genus, and family levels, resulting in decreased species diversity. We also observed some areas with the PCR “jackpot effect,” with varying input DNA values for the Metagenomics Research Group (MGRG) controls at specific genomic loci. We visualize this effect in whole genome coverage plots and with sequence composition analyses and note these caveats of the MDA method. Despite overall concordance of species abundance between the amplified and unamplified
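    Spearman's coefficient compares the order of species abundances rather than their raw values, which is why it suits a question like "does amplification preserve which species are most abundant?". A minimal sketch, assuming abundances are given as two equal-length lists over the same species and that ties receive average ranks; the names `rank`, `pearson` and `spearman` are illustrative, and this is not the One Codex or MetaPhlAn code. It computes Spearman's rho as the Pearson correlation of the ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant input; 0.0 chosen for this sketch
    return cov / (sx * sy)


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(rank(x), rank(y))
```

    For hypothetical unamplified abundances `[0.40, 0.25, 0.20, 0.10, 0.05]` and amplified abundances `[0.35, 0.30, 0.15, 0.12, 0.08]`, the species keep the same order, so rho is 1 up to floating-point rounding even though every individual value shifted.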

  2. DIME: a novel framework for de novo metagenomic sequence assembly.

    PubMed

    Guo, Xuan; Yu, Ning; Ding, Xiaojun; Wang, Jianxin; Pan, Yi

    2015-02-01

    The recently developed next-generation sequencing platforms not only decrease the cost of metagenomic data analysis but also greatly enlarge the size of metagenomic sequence datasets. A common bottleneck of available assemblers is that the trade-off between the noise of the resulting contigs and the gain in sequence length for better annotation has not received enough attention for large-scale sequencing projects, especially for datasets with low coverage and a large number of nonoverlapping contigs. To address this limitation and promote both accuracy and efficiency, we develop a novel metagenomic sequence assembly framework, DIME, by taking the DIvide, conquer, and MErge strategies. In addition, we give two MapReduce implementations of DIME, DIME-cap3 and DIME-genovo, on the Apache Hadoop platform. For a systematic comparison of the performance of the assembly tasks, we tested DIME and five other popular short-read assembly programs, Cap3, Genovo, MetaVelvet, SOAPdenovo, and SPAdes, on four synthetic and three real metagenomic sequence datasets ranging in size from fifty thousand to a couple of million reads. The experimental results demonstrate that our method not only partitions the sequence reads with extremely high accuracy, but also reconstructs more bases, generates higher-quality assembled consensus, and yields higher assembly scores, including corrected N50 and BLAST-score-per-base, than other tools, with a nearly theoretical speed-up. Results indicate that DIME offers great improvement in assembly across a range of sequence abundances and thus is robust to decreasing coverage.

  3. Metagenomics and other Methods for Measuring Antibiotic Resistance in Agroecosystems

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Background: There is broad concern regarding antibiotic resistance on farms and in fields; however, there is no standard method for defining or measuring antibiotic resistance in environmental samples. Methods: We used metagenomic, culture-based, and molecular methods to characterize the amount, t...

  4. MetaGenomic Assembly by Merging (MeGAMerge)

    SciTech Connect

    Scholz, Matthew B.; Lo, Chien-Chi

    2015-08-03

    "MetaGenomic Assembly by Merging" (MeGAMerge) is a novel method for merging multiple genomic assemblies or long-read data sources for assembly. It internally trims and filters the data, then uses two third-party tools to merge the data by overlap-based assembly.

  5. Metagenomic Analyses of Drinking Water Receiving Different Disinfection Treatments

    EPA Science Inventory

    A metagenome-based approach was used to assess the taxonomic affiliation and functional potential of microbial populations in free chlorine (CHL) and monochloramine (CHM) treated drinking water (DW). A total of 1,024,242 (averaging 544 bp) and 849,349 (averaging 554 bp) ...

  6. Metagenomic gene annotation by a homology-independent approach

    SciTech Connect

    Froula, Jeff; Zhang, Tao; Salmeen, Annette; Hess, Matthias; Kerfeld, Cheryl A.; Wang, Zhong; Du, Changbin

    2011-06-02

    Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5% and 99.9% accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology with any known protein family. As ~50% of the 2 million genes derived from the cow rumen metagenome could not be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.

  7. Marine Metagenome as A Resource for Novel Enzymes.

    PubMed

    Alma'abadi, Amani D; Gojobori, Takashi; Mineta, Katsuhiko

    2015-10-01

    More than 99% of identified prokaryotes, including many from the marine environment, cannot be cultured in the laboratory. This limitation restricts our knowledge of microbial genetics and community ecology. Metagenomics, the culture-independent cloning of environmental DNAs that are isolated directly from an environmental sample, has already provided a wealth of information about the uncultured microbial world. It has also facilitated the discovery of novel biocatalysts by allowing researchers to probe directly into a huge diversity of enzymes within natural microbial communities. Recent advances in these studies have led to great interest in recruiting microbial enzymes for the development of environmentally friendly industry. Although the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using high-throughput techniques. In addition, we discuss challenges in metagenomics as an important part of big-data bioinformatics analysis.

  8. Integrated metagenomic and metaproteomic analyses of marine biofilm communities

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Metagenomic and metaproteomic analyses were used to begin to understand the role varying environments play in the composition and function of complex air-water interface biofilms sampled from the hulls of two ships that were deployed in different geographic waters. Prokaryotic community analyses...

  9. Evaluating techniques for metagenome annotation using simulated sequence data

    PubMed Central

    Randle-Boggis, Richard J.; Helgason, Thorunn; Sapp, Melanie; Ashton, Peter D.

    2016-01-01

    The advent of next-generation sequencing has allowed huge amounts of DNA sequence data to be produced, advancing the capabilities of microbial ecosystem studies. The current challenge is to identify from which microorganisms and genes the DNA originated. Several tools and databases are available for annotating DNA sequences. The tools, databases and parameters used can have a significant impact on the results: naïve choice of these factors can result in a false representation of community composition and function. We use a simulated metagenome to show how different parameters affect annotation accuracy by evaluating the sequence annotation performances of MEGAN, MG-RAST, One Codex and Megablast. This simulated metagenome allowed the recovery of known organism and function abundances to be quantitatively evaluated, which is not possible for environmental metagenomes. The performance of each program and database varied, e.g. One Codex correctly annotated many sequences at the genus level, whereas MG-RAST RefSeq produced many false positive annotations. This effect decreased as the taxonomic level investigated increased. Selecting more stringent parameters decreases the annotation sensitivity, but increases precision. Ultimately, there is a trade-off between taxonomic resolution and annotation accuracy. These results should be considered when annotating metagenomes and interpreting results from previous studies. PMID:27162180
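
    The sensitivity/precision trade-off described in this abstract can be made concrete with a small scoring sketch. The definitions below are assumptions chosen for illustration (sensitivity as correct assignments over all simulated reads, precision as correct assignments over reads that received any assignment); the study's own scoring may differ, and the read IDs and taxa are invented.

```python
def annotation_metrics(truth, predicted):
    """Score read-level taxonomic assignments against a simulated ground truth.

    truth:     read_id -> true taxon (every simulated read)
    predicted: read_id -> assigned taxon, or None when the tool left it unassigned
    Returns (sensitivity, precision), where
      sensitivity = correct assignments / all simulated reads
      precision   = correct assignments / reads that received any assignment
    """
    assigned = {r: t for r, t in predicted.items() if t is not None and r in truth}
    correct = sum(1 for r, t in assigned.items() if truth[r] == t)
    sensitivity = correct / len(truth) if truth else 0.0
    precision = correct / len(assigned) if assigned else 0.0
    return sensitivity, precision


truth = {"r1": "Bacillus", "r2": "Bacillus", "r3": "Escherichia", "r4": "Pseudomonas"}
# A lenient setting assigns every read but makes two mistakes.
lenient = {"r1": "Bacillus", "r2": "Listeria", "r3": "Escherichia", "r4": "Vibrio"}
# A stringent setting assigns only the read it is most confident about.
stringent = {"r1": "Bacillus", "r2": None, "r3": None, "r4": None}

print(annotation_metrics(truth, lenient))    # (0.5, 0.5)
print(annotation_metrics(truth, stringent))  # (0.25, 1.0)
```

    In this toy case the stringent setting halves sensitivity while doubling precision, the same direction of change the abstract reports for more stringent parameters.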

  10. Aquatic metagenomes implicate Thaumarchaeota in global cobalamin production

    PubMed Central

    Doxey, Andrew C; Kurtz, Daniel A; Lynch, Michael DJ; Sauder, Laura A; Neufeld, Josh D

    2015-01-01

    Cobalamin (vitamin B12) is a complex metabolite and essential cofactor required by many branches of life, including most eukaryotic phytoplankton. Algae and other cobalamin auxotrophs rely on environmental cobalamin supplied from a relatively small set of cobalamin-producing prokaryotic taxa. Although several Bacteria have been implicated in cobalamin biosynthesis and associated with algal symbiosis, the involvement of Archaea in cobalamin production is poorly understood, especially with respect to the Thaumarchaeota. Based on the detection of cobalamin synthesis genes in available thaumarchaeotal genomes, we hypothesized that Thaumarchaeota, which are ubiquitous and abundant in aquatic environments, have an important role in cobalamin biosynthesis within global aquatic ecosystems. To test this hypothesis, we examined cobalamin synthesis genes across sequenced thaumarchaeotal genomes and 430 metagenomes from a diverse range of marine, freshwater and hypersaline environments. Our analysis demonstrates that all available thaumarchaeotal genomes possess cobalamin synthesis genes, predominantly from the anaerobic pathway, suggesting widespread genetic capacity for cobalamin synthesis. Furthermore, although bacterial cobalamin genes dominated most surface marine metagenomes, thaumarchaeotal cobalamin genes dominated metagenomes from polar marine environments, increased with depth in marine water columns, and displayed seasonality, with increased winter abundance observed in time-series datasets (e.g., L4 surface water in the English Channel). Our results also suggest niche partitioning between thaumarchaeotal and cyanobacterial ribosomal and cobalamin synthesis genes across all metagenomic datasets analyzed. These results provide strong evidence for specific biogeographical distributions of thaumarchaeotal cobalamin genes, expanding our understanding of the global biogeochemical roles played by Thaumarchaeota in aquatic environments. PMID:25126756

  11. Aquatic metagenomes implicate Thaumarchaeota in global cobalamin production.

    PubMed

    Doxey, Andrew C; Kurtz, Daniel A; Lynch, Michael D J; Sauder, Laura A; Neufeld, Josh D

    2015-02-01

    Cobalamin (vitamin B12) is a complex metabolite and essential cofactor required by many branches of life, including most eukaryotic phytoplankton. Algae and other cobalamin auxotrophs rely on environmental cobalamin supplied from a relatively small set of cobalamin-producing prokaryotic taxa. Although several Bacteria have been implicated in cobalamin biosynthesis and associated with algal symbiosis, the involvement of Archaea in cobalamin production is poorly understood, especially with respect to the Thaumarchaeota. Based on the detection of cobalamin synthesis genes in available thaumarchaeotal genomes, we hypothesized that Thaumarchaeota, which are ubiquitous and abundant in aquatic environments, have an important role in cobalamin biosynthesis within global aquatic ecosystems. To test this hypothesis, we examined cobalamin synthesis genes across sequenced thaumarchaeotal genomes and 430 metagenomes from a diverse range of marine, freshwater and hypersaline environments. Our analysis demonstrates that all available thaumarchaeotal genomes possess cobalamin synthesis genes, predominantly from the anaerobic pathway, suggesting widespread genetic capacity for cobalamin synthesis. Furthermore, although bacterial cobalamin genes dominated most surface marine metagenomes, thaumarchaeotal cobalamin genes dominated metagenomes from polar marine environments, increased with depth in marine water columns, and displayed seasonality, with increased winter abundance observed in time-series datasets (e.g., L4 surface water in the English Channel). Our results also suggest niche partitioning between thaumarchaeotal and cyanobacterial ribosomal and cobalamin synthesis genes across all metagenomic datasets analyzed. These results provide strong evidence for specific biogeographical distributions of thaumarchaeotal cobalamin genes, expanding our understanding of the global biogeochemical roles played by Thaumarchaeota in aquatic environments.

  12. Metagenomic 16S rDNA Illumina tags are a powerful alternative to amplicon sequencing to explore diversity and structure of microbial communities.

    PubMed

    Logares, Ramiro; Sunagawa, Shinichi; Salazar, Guillem; Cornejo-Castillo, Francisco M; Ferrera, Isabel; Sarmento, Hugo; Hingamp, Pascal; Ogata, Hiroyuki; de Vargas, Colomban; Lima-Mendez, Gipsi; Raes, Jeroen; Poulain, Julie; Jaillon, Olivier; Wincker, Patrick; Kandels-Lewis, Stefanie; Karsenti, Eric; Bork, Peer; Acinas, Silvia G

    2014-09-01

    Sequencing of 16S rDNA polymerase chain reaction (PCR) amplicons is the most common approach for investigating environmental prokaryotic diversity, despite the known biases introduced during PCR. Here we show that 16S rDNA fragments derived from Illumina-sequenced environmental metagenomes (mi tags) are a powerful alternative to 16S rDNA amplicons for investigating the taxonomic diversity and structure of prokaryotic communities. As part of the Tara Oceans global expedition, marine plankton was sampled in three locations, resulting in 29 subsamples for which metagenomes were produced by shotgun Illumina sequencing (ca. 700 Gb). For comparative analyses, a subset of samples was also selected for Roche-454 sequencing using both shotgun (m454 tags; 13 metagenomes, ca. 2.4 Gb) and 16S rDNA amplicon (454 tags; ca. 0.075 Gb) approaches. Our results indicate that by overcoming PCR biases related to amplification and primer mismatch, mi tags may provide more realistic estimates of community richness and evenness than amplicon 454 tags. In addition, mi tags can capture expected beta diversity patterns. Using mi tags is now economically feasible given the dramatic reduction in high-throughput sequencing costs, having the advantage of retrieving simultaneously both taxonomic (Bacteria, Archaea and Eukarya) and functional information from the same microbial community.
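
    The richness and evenness estimates compared here can be illustrated with the simplest versions of each: observed richness, the Shannon index, and Pielou's evenness, computed from one taxon label per classified tag. This is a hypothetical sketch with invented OTU labels; the estimators used in the Tara Oceans analysis are not reproduced here.

```python
import math
from collections import Counter


def richness_and_evenness(taxon_labels):
    """Observed richness S, Shannon index H' (natural log), and Pielou's
    evenness J = H' / ln S, from one taxon label per classified read or tag."""
    counts = Counter(taxon_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labelled reads")
    richness = len(counts)
    shannon = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Evenness is undefined for a single taxon; report 0.0 by convention here.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, evenness


even = ["OTU1", "OTU2", "OTU3", "OTU4"]
skewed = ["OTU1"] * 97 + ["OTU2", "OTU3", "OTU4"]
print(richness_and_evenness(even))    # S = 4 and J of about 1.0 (perfectly even)
print(richness_and_evenness(skewed))  # S = 4, but J is far below 1
```

    Both communities have the same observed richness, so the contrast shows why evenness is reported separately: a method that over-amplifies a few taxa (as PCR biases can) shifts evenness even when the set of detected taxa is unchanged.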

  13. gbtools: Interactive Visualization of Metagenome Bins in R

    PubMed Central

    Seah, Brandon K. B.; Gruber-Vodicka, Harald R.

    2015-01-01

    Improvements in DNA sequencing technology have increased the amount and quality of sequences that can be obtained from metagenomic samples, making it practical to extract individual microbial genomes from metagenomic assemblies (“binning”). However, while many tools and methods exist for unsupervised binning with various statistical algorithms, there are few options for visualizing the results, even though visualization is vital to exploratory data analysis. We have developed gbtools, a software package that allows users to visualize metagenomic assemblies by plotting coverage (sequencing depth) and GC values of contigs, and also to annotate the plots with taxonomic information. Different sets of annotations, including taxonomic assignments from conserved marker genes or SSU rRNA genes, can be imported simultaneously; users can choose which annotations to plot. Bins can be manually defined from plots, or be imported from third-party binning tools and overlaid onto plots, such that results from different methods can be compared side-by-side. gbtools reports summary statistics of bins including marker gene completeness, and allows the user to add or subtract bins with each other. We illustrate some of the functions available in gbtools with two examples: the metagenome of Olavius algarvensis, a marine oligochaete worm that has up to five bacterial symbionts, and the metagenome of a synthetic mock community comprising 64 bacterial and archaeal strains. We show how instances of poor automated binning, sequencer GC% bias, and variation between samples can be quickly diagnosed by visualization, and demonstrate how the results from different binning tools can be combined and refined to yield manually curated bins with higher completeness. gbtools is open-source and written in R. The software package, documentation, and example data are available freely online at https://github.com/kbseah/genome-bin-tools. PMID:26732662
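
    The two quantities gbtools plots for each contig, GC content and coverage, can be sketched in a few lines. gbtools itself is written in R and computes these from its own inputs; the Python below is only an illustration of the underlying arithmetic, and the `aligned_bases` input (total read bases mapped to each contig) is a hypothetical stand-in for the output of a read-mapping step.

```python
def gc_percent(seq):
    """GC content as a percentage of unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_coverage_table(contigs, aligned_bases):
    """Return (contig_id, length, GC%, mean coverage) rows, the two axes of a GC-coverage plot.

    contigs:       contig_id -> assembled sequence
    aligned_bases: contig_id -> total read bases aligned to that contig (hypothetical input)
    """
    rows = []
    for contig_id, seq in contigs.items():
        length = len(seq)
        depth = aligned_bases.get(contig_id, 0) / length if length else 0.0
        rows.append((contig_id, length, round(gc_percent(seq), 2), round(depth, 2)))
    return rows


contigs = {"c1": "GGCCAATT", "c2": "GGGCCCGA", "c3": "ATATATNN"}
aligned = {"c1": 80, "c2": 400, "c3": 24}
for row in gc_coverage_table(contigs, aligned):
    print(row)
# ('c1', 8, 50.0, 10.0)
# ('c2', 8, 87.5, 50.0)
# ('c3', 8, 0.0, 3.0)
```

    Contigs from the same genome tend to share similar GC and coverage values, so on a GC-coverage plot they form clusters that can be outlined as bins.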

  14. Metagenome Skimming of Insect Specimen Pools: Potential for Comparative Genomics

    PubMed Central

    Linard, Benjamin; Crampton-Platt, Alex; Gillett, Conrad P.D.T.; Timmermans, Martijn J.T.N.; Vogler, Alfried P.

    2015-01-01

    Metagenomic analyses are challenging in metazoans, but high-copy number and repeat regions can be assembled from low-coverage sequencing by “genome skimming,” which is applied here as a new way of characterizing metagenomes obtained in an ecological or taxonomic context. Illumina shotgun sequencing data from two pools of Coleoptera (beetles), each of approximately 200 species, were assembled into tens of thousands of scaffolds. Repeated low-coverage sequencing consistently recovered similar scaffold sets, although approximately 70% of scaffolds could not be identified against existing genome databases. Identifiable scaffolds included mitochondrial DNA, conserved sequences with hits to expressed sequence tag and protein databases, and known repeat elements of high and low complexity, including numerous copies of rRNA and histone genes. Assemblies of histones captured a diversity of gene order and primary sequence in Coleoptera. Scaffolds with similarity to multiple sites in available coleopteran genome sequences for Dendroctonus and Tribolium revealed high specificity of scaffolds to either of these genomes, in particular for high-copy number repeats. Numerous “clusters” of scaffolds mapped to the same genomic site revealed intra- and/or intergenomic variation within a metagenome pool. In addition to the effect of the taxonomic composition of the metagenomes, the number of mapped scaffolds also revealed structural differences between the two reference genomes, although the significance of this striking finding remains unclear. Finally, apparently exogenous sequences were recovered, including potential food plants, fungal pathogens, and bacterial symbionts. The “metagenome skimming” approach is useful for capturing the genomic diversity of poorly studied, species-rich lineages and opens new prospects in environmental genomics. PMID:25979752

  15. Diagnosis of Bacterial Bloodstream Infections: A 16S Metagenomics Approach

    PubMed Central

    Van Puyvelde, Sandra; De Block, Tessa; Maltha, Jessica; Palpouguini, Lompo; Tahita, Marc; Tinto, Halidou; Jacobs, Jan; Deborggraeve, Stijn

    2016-01-01

    Background: Bacterial bloodstream infection (bBSI) is one of the leading causes of death in critically ill patients, and accurate diagnosis is therefore crucial. Here we report a 16S metagenomics approach for diagnosing and understanding bBSI. Methodology/Principal Findings: The proof of concept was delivered in 75 children (median age 15 months) with severe febrile illness in Burkina Faso. Standard blood culture and malaria testing were conducted at the time of hospital admission. 16S metagenomics testing was done retrospectively and in duplicate on the blood of all patients. Total DNA was extracted from the blood, and the V3–V4 regions of the bacterial 16S rRNA genes were amplified by PCR and deep sequenced on an Illumina MiSeq sequencer. Paired reads were curated, taxonomically labeled, and filtered. Blood culture diagnosed bBSI in 12 patients, but this number increased to 22 patients when blood culture and 16S metagenomics results were combined. In addition to superior sensitivity compared to standard blood culture, 16S metagenomics revealed important novel insights into the nature of bBSI. Patients with acute malaria or recovering from malaria had a 7-fold higher risk of presenting with polymicrobial bloodstream infections compared to patients with no recent malaria diagnosis (p-value = 0.046). Malaria is known to affect epithelial gut function and may thus facilitate bacterial translocation from the intestinal lumen to the blood. Importantly, patients with such polymicrobial blood infections showed a 9-fold higher risk of not surviving their febrile illness (p-value = 0.030). Conclusions/Significance: Our data demonstrate that 16S metagenomics is a powerful approach for the diagnosis and understanding of bBSI. This proof-of-concept study also showed that appropriate control samples are crucial to detect background signals due to environmental contamination. PMID:26927306

  16. IMPACT fragmentation model developments

    NASA Astrophysics Data System (ADS)

    Sorge, Marlon E.; Mains, Deanna L.

    2016-09-01

    The IMPACT fragmentation model has been used by The Aerospace Corporation for more than 25 years to analyze orbital altitude explosions and hypervelocity collisions. The model is semi-empirical, combining mass, energy and momentum conservation laws with empirically derived relationships for fragment characteristics such as number, mass, area-to-mass ratio, and spreading velocity, as well as event energy distribution. Model results are used for several types of analysis, including assessment of short-term risks to satellites from orbital altitude fragmentations, prediction of the long-term evolution of the orbital debris environment, and forensic assessments of breakup events. A new version of IMPACT, version 6, has been completed and incorporates a number of advancements enabled by a multi-year effort to characterize more than 11,000 debris fragments from more than three dozen historical on-orbit breakup events. These events involved a wide range of causes, energies, and fragmenting objects. Special focus was placed on the explosion model, as the majority of events examined were explosions. Revisions were made to the mass distribution used for explosion events, increasing the number of smaller fragments generated. The algorithm for modeling upper stage large fragment generation was updated. A momentum-conserving asymmetric spreading velocity distribution algorithm was implemented to better represent sub-catastrophic events. An approach was developed for modeling sub-catastrophic explosions, those where the majority of the parent object remains intact, based on estimated event energy. Finally, significant modifications were made to the area-to-mass ratio distribution to incorporate the tendencies of different materials to fragment into different shapes. This ability enabled better matches between the observed area-to-mass ratios and those generated by the model. It also opened up additional possibilities for post-event analysis of breakups. The paper will discuss

  17. DOE JGI Quality Metrics; Approaches to Scaling and Improving Metagenome Assembly (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Copeland, Alex [DOE JGI]; Brown, C. Titus [Michigan State University]

    2016-07-12

    DOE JGI's Alex Copeland on "DOE JGI Quality Metrics" and Michigan State University's C. Titus Brown on "Approaches to Scaling and Improving Metagenome Assembly" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  18. Metagenomic Sequencing of the Chronic Obstructive Pulmonary Disease Upper Bronchial Tract Microbiome Reveals Functional Changes Associated with Disease Severity.

    PubMed

    Cameron, Simon J S; Lewis, Keir E; Huws, Sharon A; Lin, Wanchang; Hegarty, Matthew J; Lewis, Paul D; Mur, Luis A J; Pachebat, Justin A

    2016-01-01

    Chronic Obstructive Pulmonary Disease (COPD) is a major source of mortality and morbidity worldwide. The microbiome associated with this disease may be an important component of the disease, though studies to date have been based on sequencing of the 16S rRNA gene and have yielded equivocal results. Here, we employed metagenomic sequencing of the upper bronchial tract (UBT) microbiome to allow greater elucidation of its taxonomic composition and to reveal functional changes associated with the disease. The bacterial metagenomes within sputum samples from eight COPD patients and ten 'healthy' smokers (Controls) were sequenced and suggested significant changes in the abundance of bacterial species, particularly within the Streptococcus genus. The functional capacity of the COPD UBT microbiome indicated an increased capacity for bacterial growth, which could be an important feature in bacterial-associated acute exacerbations. Regression analyses correlated COPD severity (FEV1% of predicted) with differences in the abundance of Streptococcus pneumoniae and with functional classifications related to a reduced capacity for bacterial sialic acid metabolism. This study suggests that the COPD UBT microbiome could be used in patient risk stratification and in identifying novel monitoring and treatment methods, but study of a longitudinal cohort will be required to unequivocally relate these features of the microbiome to COPD severity.

  19. Target fragmentation in radiobiology

    NASA Technical Reports Server (NTRS)

    Wilson, John W.; Cucinotta, Francis A.; Shinn, Judy L.; Townsend, Lawrence W.

    1993-01-01

    Nuclear reactions in biological systems produce low-energy fragments of the target nuclei, seen as local high linear energy transfer (LET) events. A nuclear-reaction formalism is used to evaluate the nuclear-induced fields within biosystems and their effects within several biological models. On the basis of direct ionization interaction, one anticipates high-energy protons to have a quality factor and relative biological effectiveness (RBE) of unity. Target fragmentation contributions raise the effective quality factor of 10 GeV protons to 3.3, in reasonable agreement with RBE values for induced micronuclei in bean sprouts. Application of the Katz model indicates that the relative increase in RBE with decreasing exposure observed in cell survival experiments with 160 MeV protons is related solely to target fragmentation events. Target fragment contributions to lens opacity give an RBE of 1.4 for 2 GeV protons, in agreement with the work of Lett and Cox. Predictions are made for the effective RBE for Harderian gland tumors induced by high-energy protons. An exposure model for lifetime cancer risk is derived from NCRP 98 risk tables, and protraction effects are examined for proton and helium ion exposures. The implications of dose rate enhancement effects on space radiation protection are considered.

  20. The Fragmentation of Learning.

    ERIC Educational Resources Information Center

    Downes, Stephen

    2001-01-01

    Information and communication technologies, especially the Internet, have vastly increased access to information and educational opportunities. Steadily increasing consumer demand is driving the development of online educational materials. The end result may be a "fragmentation" of learning involving multiple learning providers and delivery modes,…

  1. Cross-roads in the classification of papillomaviruses.

    PubMed

    de Villiers, Ethel-Michele

    2013-10-01

    Acceptance of an official classification for the family Papillomaviridae based purely on DNA sequence relatedness was achieved as late as 2003. The rate of isolation and characterization of new papillomavirus types has depended greatly on, and been subject to, the development of new laboratory techniques. The introduction of every new technique led to a temporary burst in the number of new isolates. In the following, the bumpy road towards a classification system, together with the controversies over implementing and accepting new techniques, will be summarized. An update of the classification of the 170 human papillomavirus (HPV) types presently known is presented. Arguments for the implementation of metagenomic sequencing for this rapidly growing family will be presented.

  2. Metagenomic sequence of saline desert microbiota from wild ass sanctuary, Little Rann of Kutch, Gujarat, India.

    PubMed

    Patel, Rajesh; Mevada, Vishal; Prajapati, Dhaval; Dudhagara, Pravin; Koringa, Prakash; Joshi, C G

    2015-03-01

    We report the metagenome of a saline desert soil sample from Little Rann of Kutch, Gujarat State, India. The metagenome consisted of 633,760 sequences totaling 141,307,202 bp, with 56% G + C content. The metagenome sequence data are available at EBI under the EBI Metagenomics database with accession no. ERP005612. Community metagenomics revealed a total of 1802 species belonging to 43 different phyla, dominated by the genera Marinobacter (48.7%) and Halobacterium (4.6%) in the bacterial and archaeal domains, respectively. Remarkably, functional analysis of the metagenome showed that 18.2% of sequences fell into a poorly characterized group and 4% were genes for various stress responses, along with a diverse range of commercially relevant enzymes.

  3. Classification Analysis.

    ERIC Educational Resources Information Center

    Ball, Geoffrey H.

    Sorting things into groups is a basic intellectual task that allows people to simplify with minimal reduction in information. Classification techniques, which include both clustering and discrimination, provide step-by-step computer-based procedures for sorting things based on notions of generalized similarity and on the "class description"…

  4. Metagenomes from two microbial consortia associated with Santa Barbara seep oil.

    PubMed

    Hawley, Erik R; Malfatti, Stephanie A; Pagani, Ioanna; Huntemann, Marcel; Chen, Amy; Foster, Brian; Copeland, Alexander; del Rio, Tijana Glavina; Pati, Amrita; Jansson, Janet R; Gilbert, Jack A; Tringe, Susannah Green; Lorenson, Thomas D; Hess, Matthias

    2014-12-01

    The metagenomes from two microbial consortia associated with natural oils seeping into the Pacific Ocean off the coast of Santa Barbara (California, USA) were determined to complement already existing metagenomes generated from microbial communities associated with hydrocarbons that pollute the marine ecosystem. This genomics resource article is the first of two publications reporting a total of four new metagenomes from oils that seep into the Santa Barbara Channel.

  5. FIELD TESTS OF GEOGRAPHICALLY-DEPENDENT VS. THRESHOLD-BASED WATERSHED CLASSIFICATION SCHEMES IN THE GREAT LAKES BASIN

    EPA Science Inventory

    We compared classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme for two case studies involving 1) Lake Superior tributaries and 2) watersheds of riverine coastal wetlands...

  6. FIELD TESTS OF GEOGRAPHICALLY-DEPENDENT VS. THRESHOLD-BASED WATERSHED CLASSIFICATION SCHEMES IN THE GREAT LAKES BASIN

    EPA Science Inventory

    We compared classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme for two case studies involving 1) Lake Superior tributaries and 2) watersheds of riverine coastal wetlands ...

  7. Metagenomic and biochemical characterizations of sulfur oxidation metabolism in uncultured large sausage-shaped bacterium in hot spring microbial mats.

    PubMed

    Tamazawa, Satoshi; Takasaki, Kazuto; Tamaki, Hideyuki; Kamagata, Yoichi; Hanada, Satoshi

    2012-01-01

    So-called "sulfur-turf" microbial mats in sulfide-containing hot springs (55-70°C, pH 7.3-8.3) in Japan were dominated by a large sausage-shaped bacterium (LSSB) that is closely related to the genus Sulfurihydrogenibium. Several previous reports proposed that the LSSB is involved in sulfide oxidation in the hot springs. However, the LSSB has not yet been isolated, so there has been no clear evidence of whether it possesses any genes and enzymes responsible for sulfide oxidation. To verify this, we investigated the sulfide oxidation potential of the LSSB using a metagenomic approach and subsequent biochemical analysis. Genome fragments of the LSSB (a total of 3.7 Mb of sequence, including overlapping fragments) were obtained from the metagenomic fosmid library constructed from genomic DNA of the sulfur-turf mats. The sequence annotation clearly revealed that the LSSB possesses sulfur oxidation-related genes encoding sulfide dehydrogenase (SD), sulfide-quinone reductase and sulfite dehydrogenase. The gene encoding SD, the key enzyme for sulfide oxidation, was successfully cloned and heterologously expressed in Escherichia coli. The purified recombinant enzyme clearly showed SD activity, with optimum temperature and pH of 60°C and 8.0, respectively, which were consistent with the environmental conditions in the hot spring where the sulfur-turf thrives. Furthermore, the affinity of SD for sulfide was relatively high, which also reflected the environment where sulfide could be continuously supplied. This is the first report showing that the LSSB harbors sulfide-oxidizing metabolism adapted to the hot spring environment and can be involved in sulfide oxidation in the sulfur-turf microbial mats.

  8. Metagenomic and Biochemical Characterizations of Sulfur Oxidation Metabolism in Uncultured Large Sausage-Shaped Bacterium in Hot Spring Microbial Mats

    PubMed Central

    Tamaki, Hideyuki; Kamagata, Yoichi; Hanada, Satoshi

    2012-01-01

    So-called “sulfur-turf” microbial mats in sulfide-containing hot springs (55–70°C, pH 7.3–8.3) in Japan were dominated by a large sausage-shaped bacterium (LSSB) that is closely related to the genus Sulfurihydrogenibium. Several previous reports proposed that the LSSB is involved in sulfide oxidation in the hot springs. However, the LSSB has not yet been isolated, so there has been no clear evidence of whether it possesses any genes and enzymes responsible for sulfide oxidation. To verify this, we investigated the sulfide oxidation potential of the LSSB using a metagenomic approach and subsequent biochemical analysis. Genome fragments of the LSSB (a total of 3.7 Mb of sequence, including overlapping fragments) were obtained from the metagenomic fosmid library constructed from genomic DNA of the sulfur-turf mats. The sequence annotation clearly revealed that the LSSB possesses sulfur oxidation-related genes encoding sulfide dehydrogenase (SD), sulfide-quinone reductase and sulfite dehydrogenase. The gene encoding SD, the key enzyme for sulfide oxidation, was successfully cloned and heterologously expressed in Escherichia coli. The purified recombinant enzyme clearly showed SD activity, with optimum temperature and pH of 60°C and 8.0, respectively, which were consistent with the environmental conditions in the hot spring where the sulfur-turf thrives. Furthermore, the affinity of SD for sulfide was relatively high, which also reflected the environment where sulfide could be continuously supplied. This is the first report showing that the LSSB harbors sulfide-oxidizing metabolism adapted to the hot spring environment and can be involved in sulfide oxidation in the sulfur-turf microbial mats. PMID:23185438

  9. Statistical models of brittle fragmentation

    NASA Astrophysics Data System (ADS)

    Åström, J. A.

    2006-06-01

    Recent developments in statistical models for the fragmentation of brittle material are reviewed. The generic objective of these models is to understand the origin of the fragment size distributions (FSDs) that result from fracturing brittle material. Brittle fragmentation can be divided into two categories: (1) instantaneous fragmentation, for which breakup generations are not distinguishable, and (2) continuous fragmentation, for which generations of chronological fragment breakups can be identified. This categorization becomes obvious in mining industry applications, where instantaneous fragmentation refers to the blasting of rock and continuous fragmentation to the subsequent crushing and grinding of the blasted rock fragments. A model of unstable cracks and crack-branch merging contains both of the FSDs usually related to instantaneous fragmentation: the scale-invariant FSD with the power exponent (2-1/D) and the double exponential FSD, which relates to Poisson process fragmentation. The FSDs commonly related to continuous fragmentation are the lognormal FSD, originating from uncorrelated breakup, and the power-law FSD, which can be modeled as a cascade of breakups. Various solutions to the generic rate equation of continuous fragmentation are briefly listed. Simulations of crushing experiments reveal that both cascade and uncorrelated fragmentation are possible, but also that a mechanism of maximizing packing density related to Apollonian packing may be relevant for slow compressive crushing.
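    To illustrate why uncorrelated breakup tends toward a lognormal FSD, here is a minimal Python sketch under an assumed model that is not taken from the review: every fragment is split in two at an independent, uniformly random fraction, once per generation. Each fragment's log-size is then a sum of independent terms, so by the central limit theorem its distribution approaches a normal one, i.e. the sizes become approximately lognormal. The function name `binary_cascade` is hypothetical.

```python
import math
import random


def binary_cascade(generations: int, seed: int = 0) -> list[float]:
    """Hypothetical uncorrelated-breakup model: starting from one fragment of
    unit mass, split every fragment in two at an independent, uniformly random
    fraction, once per generation. Mass is conserved at every split."""
    rng = random.Random(seed)
    fragments = [1.0]
    for _ in range(generations):
        next_gen = []
        for mass in fragments:
            u = rng.random()
            while u == 0.0:  # avoid creating a zero-mass fragment
                u = rng.random()
            next_gen.extend((mass * u, mass * (1.0 - u)))
        fragments = next_gen
    return fragments


if __name__ == "__main__":
    k = 12
    sizes = binary_cascade(k)
    logs = [math.log(s) for s in sizes]
    mean = sum(logs) / len(logs)
    var = sum((x - mean) ** 2 for x in logs) / len(logs)
    # For U ~ Uniform(0, 1), E[ln U] = -1 and Var[ln U] = 1, so each fragment's
    # log-size is a sum of k such terms: mean -k, variance k (approximately normal).
    print(f"{len(sizes)} fragments, mean log-size {mean:.2f}, variance {var:.2f}")
```

    A histogram of `logs` should look roughly bell-shaped. A power-law FSD would require a different, correlated breakup rule, which this sketch does not model.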

  10. MetLab: An In Silico Experimental Design, Simulation and Analysis Tool for Viral Metagenomics Studies

    PubMed Central

    Gourlé, Hadrien; Bongcam-Rudloff, Erik; Hayer, Juliette

    2016-01-01

    Metagenomics, the sequence characterization of all genomes within a sample, is widely used as a virus discovery tool as well as a tool to study the viral diversity of animals. Metagenomics can be considered to have three main steps: sample collection and preparation, sequencing, and finally bioinformatics. Bioinformatic analysis of metagenomic datasets is in itself a complex process involving few standardized methodologies, which hampers comparison of metagenomics studies between research groups. In this publication, the new bioinformatics framework MetLab is presented, aimed at providing scientists with an integrated tool for experimental design and analysis of viral metagenomes. MetLab supports the design of a metagenomics experiment by estimating the sequencing depth needed for complete coverage of a species. This is achieved by calculating the probability of coverage using an adaptation of Stevens’ theorem. It also provides scientists with several pipelines aimed at simplifying the analysis of viral metagenomes, including quality control, assembly and taxonomic binning. We also implement a tool for simulating metagenomics datasets from several sequencing platforms. The overall aim is to provide virologists with an easy-to-use tool for designing, simulating and analyzing viral metagenomes. The results presented here include a benchmark against other existing software, with emphasis on the detection of viruses as well as speed. The tool is packaged as comprehensive software, readily available for Linux and OSX users at https://github.com/norling/metlab. PMID:27479078
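    The abstract does not spell out MetLab's adaptation of Stevens’ theorem, so the Python sketch below shows only the classical result it builds on: the probability that n arcs, each covering a fraction a of a circle and placed independently and uniformly at random, jointly cover the whole circle. Treating a genome as a circle with a = read length / genome length is an illustrative assumption, and `stevens_coverage_probability` is a hypothetical name, not MetLab's API.

```python
from math import exp, lgamma, log1p


def stevens_coverage_probability(n_arcs: int, arc_fraction: float) -> float:
    """Probability that n_arcs random arcs, each spanning arc_fraction of a
    circle's circumference and placed independently and uniformly, cover the
    whole circle (Stevens' formula):

        P = sum over k = 0, 1, ... with k * a < 1 of
            (-1)**k * C(n, k) * (1 - k * a)**(n - 1)
    """
    if n_arcs < 1:
        raise ValueError("n_arcs must be at least 1")
    if not 0.0 < arc_fraction <= 1.0:
        raise ValueError("arc_fraction must be in (0, 1]")
    n, a = n_arcs, arc_fraction
    total = 0.0
    k = 0
    # Terms with k * a >= 1 vanish, so stop once 1 - k * a is no longer positive.
    while k <= n and 1.0 - k * a > 0.0:
        # log of C(n, k) * (1 - k*a)**(n - 1), kept in log space so huge
        # binomial coefficients never overflow a float
        log_term = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    + (n - 1) * log1p(-k * a))
        total += (-1) ** k * exp(log_term)
        k += 1
    return total


if __name__ == "__main__":
    # Hypothetical example: 150 bp reads on a 10 kb genome treated as circular.
    a = 150 / 10_000
    for n in (300, 500, 800):
        print(n, round(stevens_coverage_probability(n, a), 4))
```

    Scanning n upward until the probability passes a chosen threshold gives a rough depth estimate under these assumptions. The alternating sum loses precision when the probability is close to zero, so values in that regime should be treated with caution.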

  11. Safety analysis of a Russian phage cocktail: from metagenomic analysis to oral application in healthy human subjects.

    PubMed

    McCallin, Shawna; Alam Sarker, Shafiqul; Barretto, Caroline; Sultana, Shamima; Berger, Bernard; Huq, Sayeda; Krause, Lutz; Bibiloni, Rodrigo; Schmitt, Bertrand; Reuteler, Gloria; Brüssow, Harald

    2013-09-01

    Phage therapy has a long tradition in Eastern Europe, where preparations consist of complex phage cocktails whose compositions have not been described. We investigated the composition of a phage cocktail from the Russian pharmaceutical company Microgen targeting Escherichia coli/Proteus infections. Electron microscopy identified six phage types, with T7-like phages numerically dominating over T4-like phages. A metagenomic approach using taxonomical classification, reference mapping and de novo assembly identified 18 distinct phage types, including 7 genera of Podoviridae, 2 established and 2 proposed genera of Myoviridae, and 2 genera of Siphoviridae. De novo assembly yielded 7 contigs greater than 30 kb, including a 147-kb Myovirus genome and a 42-kb genome of a potentially new phage. Bioinformatic analysis did not reveal undesired genes, and a small human volunteer trial did not associate adverse effects with oral phage exposure.

  12. Metagenomic insights into the dynamics of microbial communities in food.

    PubMed

    Kergourlay, Gilles; Taminiau, Bernard; Daube, Georges; Champomier Vergès, Marie-Christine

    2015-11-20

    Metagenomics has proven to be a powerful tool for exploring a large diversity of natural environments such as air, soil, water, and plants, as well as various human microbiota (e.g. digestive tract, lungs, skin). DNA sequencing techniques are becoming increasingly popular and ever less expensive. Given that high-throughput DNA sequencing approaches have only recently started to be used to decipher food microbial ecosystems, there is a significant growth potential for such technologies in the field of food microbiology. The aim of this review is to present a survey of recent food investigations via metagenomics and to illustrate how this approach can be a valuable tool in the better characterization of foods and their transformation, storage and safety. Traditional foods in particular have been thoroughly explored by global approaches in order to provide information on multi-species and multi-organism communities.

  13. Unlocking the potential of metagenomics through replicated experimental design

    PubMed Central

    Knight, Rob; Jansson, Janet; Field, Dawn; Fierer, Noah; Desai, Narayan; Fuhrman, Jed A.; Hugenholtz, Phil; van der Lelie, Daniel; Meyer, Folker; Stevens, Rick; Bailey, Mark J.; Gordon, Jeffrey I.; Kowalchuk, George A.; Gilbert, Jack A.

    2015-01-01

    Metagenomics holds enormous promise for discovering novel enzymes and organisms that are biomarkers or causes of processes relevant to disease, industry and the environment. In the last two years we have seen a paradigm shift in metagenomics to the application of broad cross-sectional and longitudinal studies enabled by advances in DNA sequencing and high-performance computing. These technologies now make it possible to broadly assess microbial diversity and function, allowing systematic investigation of the largely unexplored frontier of microbial life. To achieve this aim, the global scientific community must collaborate and agree upon common objectives and data standards to enable comparative research across the Earth’s microbiome. Improvements in comparability of data will facilitate the study of biotechnologically relevant processes such as bioprospecting for new glycoside hydrolases or identifying novel energy sources. PMID:22678395

  14. Metagenomic approaches to understanding phylogenetic diversity in quorum sensing

    PubMed Central

    Kimura, Nobutada

    2014-01-01

    Quorum sensing, a form of cell–cell communication among bacteria, allows bacteria to synchronize their behaviors at the population level in order to control behaviors such as luminescence, biofilm formation, signal turnover, pigment production, antibiotic production, swarming, and virulence. A better understanding of quorum-sensing systems will provide us with greater insight into the complex interaction mechanisms used widely in the Bacteria and even the Archaea domains in the environment. Metagenomics, the use of culture-independent sequencing to study the genomic material of microorganisms, has the potential to provide direct information about the quorum-sensing systems in uncultured bacteria. This article provides an overview of the current knowledge of quorum sensing focused on phylogenetic diversity, and presents examples of studies that have used metagenomic techniques. Future technologies potentially related to quorum-sensing systems are also discussed. PMID:24429899

  15. The MG-RAST Metagenomics Database and Portal in 2015

    DOE PAGES

    Wilke, Andreas; Bischof, Jared; Gerlach, Wolfgang; ...

    2015-12-09

    MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. Currently, the system hosts over 200 000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. Lastly, to show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools.

  16. Gene and translation initiation site prediction in metagenomic sequences

    SciTech Connect

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John; Uberbacher, Edward C

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually shorter than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The method's novel contributions are enhanced translation initiation site identification, the ability to identify sequences that use alternate genetic codes, and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
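    MetaProdigal's models (coding statistics, translation initiation signals, confidence scores) are far richer than anything shown here. As a hedged illustration of just one point, why the genetic code matters for a gene call, the Python sketch below finds complete ATG-to-stop open reading frames (ORFs) on the three forward frames of a read. Under NCBI translation table 11 (bacteria), TGA is a stop codon; under table 4 (used by Mycoplasma), TGA encodes tryptophan, so the same read can yield a longer ORF. The function `find_orfs`, ATG-only starts, and the forward-strand-only scan are simplifying assumptions.

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})  # NCBI translation table 11
TABLE4_STOPS = frozenset({"TAA", "TAG"})           # NCBI table 4: TGA encodes Trp


def find_orfs(seq: str, stops: frozenset[str] = STANDARD_STOPS) -> list[tuple[int, int]]:
    """Return (start, end) of complete ATG-to-stop ORFs on the three forward frames.

    `end` is exclusive and includes the stop codon; ORFs that run off the end
    of the read without a stop codon are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                   # open an ORF at the first ATG
            elif start is not None and codon in stops:
                orfs.append((start, i + 3))  # close it at the first in-frame stop
                start = None
    return orfs
```

    For instance, `find_orfs("ATGAAATGAAAATAG")` stops at the in-frame TGA and returns `[(0, 9)]`, while passing `TABLE4_STOPS` reads through it and returns `[(0, 15)]`.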

  17. Metagenome of a Versatile Chemolithoautotroph from Expanding Oceanic Dead Zones

    SciTech Connect

    Walsh, David A.; Zaikova, Elena; Howes, Charles L.; Song, Young; Wright, Jody; Tringe, Susannah G.; Tortell, Philippe D.; Hallam, Steven J.

    2009-07-15

    Oxygen minimum zones (OMZs), also known as oceanic "dead zones", are widespread oceanographic features currently expanding due to global warming and coastal eutrophication. Although inhospitable to metazoan life, OMZs support a thriving but cryptic microbiota whose combined metabolic activity is intimately connected to nutrient and trace gas cycling within the global ocean. Here we report time-resolved metagenomic analyses of a ubiquitous and abundant but uncultivated OMZ microbe (SUP05) closely related to chemoautotrophic gill symbionts of deep-sea clams and mussels. The SUP05 metagenome harbors a versatile repertoire of genes mediating autotrophic carbon assimilation, sulfur-oxidation and nitrate respiration responsive to a wide range of water column redox states. Thus, SUP05 plays integral roles in shaping nutrient and energy flow within oxygen-deficient oceanic waters via carbon sequestration, sulfide detoxification and biological nitrogen loss with important implications for marine productivity and atmospheric greenhouse control.

  18. Windshield splatter analysis with the Galaxy metagenomic pipeline.

    PubMed

    Kosakovsky Pond, Sergei; Wadhawan, Samir; Chiaromonte, Francesca; Ananda, Guruprasad; Chung, Wen-Yu; Taylor, James; Nekrutenko, Anton

    2009-11-01

    How many species inhabit our immediate surroundings? A straightforward collection technique suitable for answering this question is known to anyone who has ever driven a car at highway speeds. The windshield of a moving vehicle is subjected to numerous insect strikes and can be used as a collection device for representative sampling. Unfortunately, the analysis of biological material collected in that manner, as with most metagenomic studies, proves to be rather demanding due to the large number of required tools and considerable computational infrastructure. In this study, we use organic matter collected by a moving vehicle to design and test a comprehensive pipeline for phylogenetic profiling of metagenomic samples that includes all steps from processing and quality control of data generated by next-generation sequencing technologies to statistical analyses and data visualization. To the best of our knowledge, this is also the first publication that features a live online supplement providing access to exact analyses and workflows used in the article.

  19. Metagenomic approaches to understanding phylogenetic diversity in quorum sensing.

    PubMed

    Kimura, Nobutada

    2014-04-01

    Quorum sensing, a form of cell-cell communication among bacteria, allows bacteria to synchronize their behaviors at the population level in order to control behaviors such as luminescence, biofilm formation, signal turnover, pigment production, antibiotic production, swarming, and virulence. A better understanding of quorum-sensing systems will provide us with greater insight into the complex interaction mechanisms used widely in the Bacteria and even the Archaea domains in the environment. Metagenomics, the use of culture-independent sequencing to study the genomic material of microorganisms, has the potential to provide direct information about the quorum-sensing systems in uncultured bacteria. This article provides an overview of the current knowledge of quorum sensing focused on phylogenetic diversity, and presents examples of studies that have used metagenomic techniques. Future technologies potentially related to quorum-sensing systems are also discussed.

  20. Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones.

    PubMed

    Walsh, David A; Zaikova, Elena; Howes, Charles G; Song, Young C; Wright, Jody J; Tringe, Susannah G; Tortell, Philippe D; Hallam, Steven J

    2009-10-23

    Oxygen minimum zones, also known as oceanic "dead zones," are widespread oceanographic features currently expanding because of global warming. Although inhospitable to metazoan life, they support a cryptic microbiota whose metabolic activities affect nutrient and trace gas cycling within the global ocean. Here, we report metagenomic analyses of a ubiquitous and abundant but uncultivated oxygen minimum zone microbe (SUP05) related to chemoautotrophic gill symbionts of deep-sea clams and mussels. The SUP05 metagenome harbors a versatile repertoire of genes mediating autotrophic carbon assimilation, sulfur oxidation, and nitrate respiration responsive to a wide range of water-column redox states. Our analysis provides a genomic foundation for understanding the ecological and biogeochemical role of pelagic SUP05 in oxygen-deficient oceanic waters and its potential sensitivity to environmental changes.

  1. MOCAT2: a metagenomic assembly, annotation and profiling framework

    PubMed Central

    Kultima, Jens Roat; Coelho, Luis Pedro; Forslund, Kristoffer; Huerta-Cepas, Jaime; Li, Simone S.; Driessen, Marja; Voigt, Anita Yvonne; Zeller, Georg; Sunagawa, Shinichi; Bork, Peer

    2016-01-01

    Summary: MOCAT2 is a software pipeline for metagenomic sequence assembly and gene prediction with novel features for taxonomic and functional abundance profiling. The automated generation and efficient annotation of non-redundant reference catalogs by propagating pre-computed assignments from 18 databases covering various functional categories allows for fast and comprehensive functional characterization of metagenomes. Availability and Implementation: MOCAT2 is implemented in Perl 5 and Python 2.7, designed for 64-bit UNIX systems and offers support for high-performance computer usage via LSF, PBS or SGE queuing systems; source code is freely available under the GPL3 license at http://mocat.embl.de. Contact: bork@embl.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153620

  2. Metagenomic Human Respiratory Air in a Hospital Environment.

    PubMed

    Lai, Yi Yu; Li, Yanming; Lang, Jidong; Tong, Xunliang; Zhang, Lina; Fang, Jianhuo; Xing, Jingli; Cai, Meng; Xu, Hongtao; Deng, Yan; Xiao, Fei; Tian, Geng

    2015-01-01

    Hospital-acquired infection (HAI), or nosocomial infection, is a frequent issue in hospital environments. We believe the conventional, regulated Petri dish method is insufficient to evaluate HAI. To address this problem, metagenomic sequencing was applied to screen airborne microbes in four rooms of Beijing Hospital. With the sampler's air intake set to match one person's respiratory volume, metagenomic sequencing identified very large numbers of species in rooms that had already passed the widely accepted Petri dish exposure standard, underscoring the urgent need for new technology. Meanwhile, comparative culture recovered only a small portion of the species and missed even cultivable pathogens, reminding us of the limitations of older technologies. To the best of our knowledge, the method demonstrated in this study could be broadly applied in hospital indoor environments for various monitoring activities as well as for HAI studies. It also has potential as a worldwide real-time modelling system for transmissible pathogens.

  3. A Statistical Framework for the Functional Analysis of Metagenomes

    SciTech Connect

    Sharon, Itai; Pati, Amrita; Markowitz, Victor; Pinter, Ron Y.

    2008-10-01

    Metagenomic studies consider the genetic makeup of microbial communities as a whole, rather than their individual member organisms. The functional and metabolic potential of microbial communities can be analyzed by comparing the relative abundance of gene families in their collective genomic sequences (metagenome) under different conditions. Such comparisons require accurate estimation of gene family frequencies. The authors present a statistical framework for assessing these frequencies based on the Lander-Waterman theory developed originally for Whole Genome Shotgun (WGS) sequencing projects. They also provide a novel method for assessing the reliability of the estimations, which can be used to remove seemingly unreliable measurements. They tested their method on a wide range of datasets, including simulated genomes and real WGS data from sequencing projects of whole genomes. Results suggest that their framework corrects inherent biases in accepted methods and provides a good approximation to the true statistics of gene families in WGS projects.
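    The classical Lander-Waterman quantities that this framework starts from can be sketched in a few lines of Python. The sketch covers only the textbook expectations for N random reads of length L on a genome of length G: mean coverage c = NL/G, expected uncovered fraction e^(-c), and expected number of contigs N e^(-c(1-θ)), where θ = T/L for a minimum detectable overlap T. It is not the authors' gene-family estimator, and the function name `lander_waterman` is hypothetical.

```python
import math


def lander_waterman(n_reads: int, read_len: int, genome_len: int,
                    min_overlap: int = 0) -> dict[str, float]:
    """Classical Lander-Waterman expectations for uniformly placed shotgun reads."""
    if read_len <= 0 or genome_len <= 0:
        raise ValueError("read_len and genome_len must be positive")
    c = n_reads * read_len / genome_len      # mean coverage (redundancy)
    sigma = 1.0 - min_overlap / read_len     # 1 - theta, with theta = T / L
    return {
        "coverage": c,
        # Poisson model: probability that a given base is hit by no read
        "fraction_uncovered": math.exp(-c),
        # expected number of islands (contigs)
        "expected_contigs": n_reads * math.exp(-c * sigma),
    }
```

    With 1,000 reads of 100 bp on a 100 kb genome, c = 1, so about e^(-1), roughly 37%, of bases are expected to remain uncovered under this model.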

  4. Biogeography and individuality shape function in the human skin metagenome.

    PubMed

    Oh, Julia; Byrd, Allyson L; Deming, Clay; Conlan, Sean; Kong, Heidi H; Segre, Julia A

    2014-10-02

    The varied topography of human skin offers a unique opportunity to study how the body's microenvironments influence the functional and taxonomic composition of microbial communities. Phylogenetic marker gene-based studies have identified many bacteria and fungi that colonize distinct skin niches. Here metagenomic analyses of diverse body sites in healthy humans demonstrate that local biogeography and strong individuality define the skin microbiome. We developed a relational analysis of bacterial, fungal and viral communities, which showed not only site specificity but also individual signatures. We further identified strain-level variation of dominant species as heterogeneous and multiphyletic. Reference-free analyses captured the uncharacterized metagenome through the development of a multi-kingdom gene catalogue, which was used to uncover genetic signatures of species lacking reference genomes. This work is foundational for human disease studies investigating inter-kingdom interactions, metabolic changes and strain tracking, and defines the dual influence of biogeography and individuality on microbial composition and function.

  5. The MG-RAST Metagenomics Database and Portal in 2015

    SciTech Connect

    Wilke, Andreas; Bischof, Jared; Gerlach, Wolfgang; Glass, Elizabeth; Harrison, Travis; Keegan, Kevin; Paczian, Tobias; Trimble, William L.; Bagchi, Saurabh; Grama, Ananth; Chaterji, Somali; Meyer, Folker

    2015-12-09

    MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. Currently, the system hosts over 200 000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. Lastly, to show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools.
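    If the 4-fold rise in submissions over 24 months is read as steady exponential growth (an assumption; the abstract reports only the overall ratio), the implied monthly growth factor r satisfies r^24 = 4:

```latex
r = 4^{1/24} = 2^{1/12} \approx 1.059
```

    That is, roughly 6% more submitted data each month, or a doubling about every 12 months.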

  6. Fragmentation of Fractal Random Structures

    NASA Astrophysics Data System (ADS)

    Elçi, Eren Metin; Weigel, Martin; Fytas, Nikolaos G.

    2015-03-01

    We analyze the fragmentation behavior of random clusters on the lattice under a process where bonds between neighboring sites are successively broken. Modeling such structures by configurations of a generalized Potts or random-cluster model allows us to discuss a wide range of systems with fractal properties including trees as well as dense clusters. We present exact results for the densities of fragmenting edges and the distribution of fragment sizes for critical clusters in two dimensions. Dynamical fragmentation with a size cutoff leads to broad distributions of fragment sizes. The resulting power laws are shown to encode characteristic fingerprints of the fragmented objects.

  7. Recovery of a medieval Brucella melitensis genome using shotgun metagenomics.

    PubMed

    Kay, Gemma L; Sergeant, Martin J; Giuffra, Valentina; Bandiera, Pasquale; Milanese, Marco; Bramanti, Barbara; Bianucci, Raffaella; Pallen, Mark J

    2014-07-15

    Shotgun metagenomics provides a powerful assumption-free approach to the recovery of pathogen genomes from contemporary and historical material. We sequenced the metagenome of a calcified nodule from the skeleton of a 14th-century middle-aged male excavated from the medieval Sardinian settlement of Geridu. We obtained 6.5-fold coverage of a Brucella melitensis genome. Sequence reads from this genome showed signatures typical of ancient or aged DNA. Despite the relatively low coverage, we were able to use information from single-nucleotide polymorphisms to place the medieval pathogen genome within a clade of B. melitensis strains that included the well-studied Ether strain and two other recent Italian isolates. We confirmed this placement using information from deletions and IS711 insertions. We conclude that metagenomics stands ready to document past and present infections, shedding light on the emergence, evolution, and spread of microbial pathogens. Importance: Infectious diseases have shaped human populations and societies throughout history. The recovery of pathogen DNA sequences from human remains provides an opportunity to identify and characterize the causes of individual and epidemic infections. By sequencing DNA extracted from medieval human remains through shotgun metagenomics, without target-specific capture or amplification, we have obtained a draft genome sequence of an ~700-year-old Brucella melitensis strain. Using a variety of bioinformatic approaches, we have shown that this historical strain is most closely related to recent strains isolated from Italy, confirming the continuity of this zoonotic infection, and even a specific lineage, in the Mediterranean region over the centuries.

  8. The Challenge and Potential of Metagenomics in the Clinic

    PubMed Central

    Mulcahy-O’Grady, Heidi; Workentine, Matthew L.

    2016-01-01

    The bacteria, fungi, and viruses that live on and in us have a tremendous impact on our day-to-day health and are often linked to many diseases, including autoimmune disorders and infections. Diagnosing and treating these disorders relies on accurate identification and characterization of the microbial community. Current sequencing technologies allow the sequencing of the entire nucleic acid complement of a sample, providing an accurate snapshot of the community members present in addition to the full genetic potential of that microbial community. There are a number of clinical applications that stand to benefit from these data sets, such as the rapid identification of pathogens present in a sample. Other applications include the identification of antibiotic-resistance genes, diagnosis and treatment of gastrointestinal disorders, and many other diseases associated with bacterial, viral, and fungal microbiomes. Metagenomics also allows the physician to probe more complex phenotypes such as microbial dysbiosis with intestinal disorders and disruptions of the skin microbiome that may be associated with skin disorders. Many of these disorders are not associated with a single pathogen but emerge as a result of complex ecological interactions within microbiota. Currently, we understand very little about these complex phenotypes, yet clearly they are important, and in some cases, as with fecal microbiota transplants in Clostridium difficile infections, treating the microbiome of the patient is effective. Here, we give an overview of metagenomics and discuss a number of areas where metagenomics is applicable in the clinic and the progress being made in these areas. These include (1) the identification of unknown pathogens and of pathogens that are particularly hard to culture, (2) the use of functional information and gene content to understand complex infections such as Clostridium difficile, and (3) predicting the antimicrobial resistance of the community using genetic determinants of

  9. Metagenomic Sequencing for Surveillance of Food- and Waterborne Viral Diseases

    PubMed Central

    Nieuwenhuijse, David F.; Koopmans, Marion P. G.

    2017-01-01

    A plethora of viruses can be transmitted by the food- and waterborne route. However, their recognition is challenging because of the variety of viruses, heterogeneity of symptoms, lack of awareness among clinicians, and limited surveillance efforts. Classical food- and waterborne viral disease outbreaks are mainly caused by caliciviruses, but the source of the virus is often not known and the foodborne mode of transmission is difficult to discriminate from human-to-human transmission. Atypical food- and waterborne viral disease can be caused by viruses such as hepatitis A and hepatitis E. In addition, a source of novel emerging viruses with a potential to spread via the food- and waterborne route is the repeated interaction of humans with wildlife. Wildlife-to-human adaptation may give rise to self-limiting outbreaks in some cases, but viruses that become fully adapted to the human host can be devastating. Metagenomic sequencing has been investigated as a promising solution for surveillance purposes, as it detects all viruses in a single protocol, delivers additional genomic information for outbreak tracing, and detects novel, unknown viruses. Nevertheless, several issues must be addressed to apply metagenomic sequencing in surveillance. First, sample preparation is difficult since the genomic material of viruses is generally overshadowed by host and bacterial genomes. Second, several data analysis issues hamper the efficient, robust, and automated processing of metagenomic data. Third, interpretation of metagenomic data is hard because of the lack of general knowledge of the virome in the food chain and the environment. Further developments in virus-specific nucleic acid extraction methods, bioinformatic data processing applications, and unifying data visualization tools are needed to gain insightful surveillance knowledge from suspect food samples. PMID:28261185

  10. Bioprospecting metagenomics of decaying wood: mining for new glycoside hydrolases

    SciTech Connect

    Li, L. L.; van der Lelie, D.; Taghavi, S.; McCorkle, S. M.; Zhang, Y.-B.; Blewitt, M. G.; Brunecky, R.; Adney, W. S.; Himmel, M. E.; Brumm, P.; Drinkwater, C.; Mead, D. A.; Tringe, S. G.

    2011-08-01

    To efficiently deconstruct recalcitrant plant biomass to fermentable sugars in industrial processes, biocatalysts of higher performance and lower cost are required. The genetic diversity found in the metagenomes of natural microbial biomass decay communities may harbor such enzymes. Our goal was to discover and characterize new glycoside hydrolases (GHases) from microbial biomass decay communities, especially those from unknown or never previously cultivated microorganisms. From the metagenome sequences of an anaerobic microbial community actively decaying poplar biomass, we identified approximately 4,000 GHase homologs. Based on homology to GHase families/activities of interest and the quality of the sequences, candidates were selected for full-length cloning and subsequent expression. As an alternative strategy, a metagenome expression library was constructed and screened for GHase activities. These combined efforts resulted in the cloning of four novel GHases that could be successfully expressed in Escherichia coli. Further characterization showed that two enzymes had significant activity on p-nitrophenyl-α-L-arabinofuranoside, one enzyme had significant activity against p-nitrophenyl-β-D-glucopyranoside, and one enzyme had significant activity against p-nitrophenyl-β-D-xylopyranoside. Enzymes were also tested in the presence of ionic liquids. Metagenomics provides a good resource for mining novel biomass-degrading enzymes and for screening of cellulolytic enzyme activities. The four GHases that were cloned may have potential application for deconstruction of biomass pretreated with ionic liquids, as they remain active in the presence of up to 20% ionic liquid (except for 1-ethyl-3-methylimidazolium diethyl phosphate). Alternatively, ionic liquids might be used to immobilize or stabilize these enzymes for minimal solvent processing of biomass.

  11. Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics

    PubMed Central

    Siegwald, Léa; Touzet, Hélène; Lemoine, Yves; Hot, David

    2017-01-01

    Targeted metagenomics, also known as metagenetics, is a high-throughput sequencing application focusing on a nucleotide target in a microbiome to describe its taxonomic content. A wide range of bioinformatics pipelines are available to analyze sequencing outputs, and the choice of an appropriate tool is crucial and non-trivial. No standard evaluation method exists for estimating the accuracy of a pipeline for targeted metagenomics analyses. This article proposes an evaluation protocol containing real and simulated targeted metagenomics datasets, and adequate metrics allowing us to study the impact of different variables on the biological interpretation of results. This protocol was used to compare six different bioinformatics pipelines in the basic user context: three common ones (mothur, QIIME and BMP) based on a clustering-first approach and three emerging ones (Kraken, CLARK and One Codex) using an assignment-first approach. Surprisingly, this study reveals that sequencing errors have a bigger impact on the results than the choice of amplified region. Moreover, increasing sequencing throughput increases richness overestimation, even more so for microbiota of high complexity. Finally, the choice of the reference database has a bigger impact on richness estimation for clustering-first pipelines, and on correct taxa identification for assignment-first pipelines. Using emerging assignment-first pipelines is a valid approach for targeted metagenomics analyses, with a quality of results comparable to popular clustering-first pipelines, even with an error-prone sequencing technology like Ion Torrent. However, those pipelines are highly sensitive to the quality of databases and their annotations, which makes clustering-first pipelines still the only reliable approach for studying microbiomes that are not well described. PMID:28052134
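    The assignment-first idea can be made concrete with a toy Python sketch: index the k-mers of reference sequences, then assign each read to the taxon with the most taxon-specific k-mer hits. This majority-vote rule is a simplification chosen for illustration; it is not how Kraken, CLARK or One Codex actually resolve shared k-mers (Kraken, for example, maps k-mers to lowest common ancestors in a taxonomy). The reference sequences and function names are hypothetical. Because reads are matched directly against the references, gaps or misannotations in the database translate straight into missed or wrong assignments.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_kmer_index(references: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map every k-mer in each reference sequence to the set of taxa containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(taxon)
    return index


def classify_read(read: str, index: dict[str, set[str]], k: int) -> Optional[str]:
    """Assign a read to the taxon with the most taxon-specific k-mer hits.

    k-mers shared by several taxa are ignored; a read with no specific hits,
    or with a tie between taxa, is left unassigned (None).
    """
    votes: Counter[str] = Counter()
    for i in range(len(read) - k + 1):
        taxa = index.get(read[i:i + k])
        if taxa is not None and len(taxa) == 1:
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    ranked = votes.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between taxa
    return ranked[0][0]


if __name__ == "__main__":
    refs = {"taxon_A": "ACGTACGTAA", "taxon_B": "TTTTGGGGCC"}  # toy references
    idx = build_kmer_index(refs, k=4)
    for read in ("CGTACG", "GGGGAAAA", "CCCCCC"):
        print(read, classify_read(read, idx, k=4))
```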

  12. Metagenome Sequencing of the Hadza Hunter-Gatherer Gut Microbiota.

    PubMed

    Rampelli, Simone; Schnorr, Stephanie L; Consolandi, Clarissa; Turroni, Silvia; Severgnini, Marco; Peano, Clelia; Brigidi, Patrizia; Crittenden, Alyssa N; Henry, Amanda G; Candela, Marco

    2015-06-29

    Through human microbiome sequencing, we can better understand how host evolutionary and ontogenetic history is reflected in the microbial function. However, there has been no information on the gut metagenome configuration in hunter-gatherer populations, posing a gap in our knowledge of gut microbiota (GM)-host mutualism arising from a lifestyle that describes over 90% of human evolutionary history. Here, we present the first metagenomic analysis of GM from Hadza hunter-gatherers of Tanzania, showing a unique enrichment in metabolic pathways that aligns with the dietary and environmental factors characteristic of their foraging lifestyle. We found that the Hadza GM is adapted for broad-spectrum carbohydrate metabolism, reflecting the complex polysaccharides in their diet. Furthermore, the Hadza GM is equipped for branched-chain amino acid degradation and aromatic amino acid biosynthesis. Resistome functionality demonstrates the existence of antibiotic resistance genes in a population with little antibiotic exposure, indicating the ubiquitous presence of environmentally derived resistances. Our results demonstrate how the functional specificity of the GM correlates with certain environment and lifestyle factors and how complexity from the exogenous environment can be balanced by endogenous homeostasis. The Hadza gut metagenome structure allows us to appreciate the co-adaptive functional role of the GM in complementing the human physiology, providing a better understanding of the versatility of human life and subsistence.

  13. Metagenomic abundance estimation and diagnostic testing on species level

    PubMed Central

    Lindner, Martin S.; Renard, Bernhard Y.

    2013-01-01

    One goal of sequencing-based metagenomic community analysis is the quantitative taxonomic assessment of microbial community compositions. In particular, relative quantification of taxa is of high relevance for metagenomic diagnostics or microbial community comparison. However, the majority of existing approaches quantify at low resolution (e.g. at phylum level), rely on the existence of special genes (e.g. 16S), or have severe problems discerning species with highly similar genome sequences. Yet problems such as metagenomic diagnostics require accurate quantification at the species level. We developed Genome Abundance Similarity Correction (GASiC), a method to estimate true genome abundances via read alignment by considering reference genome similarities in a non-negative LASSO approach. We demonstrate GASiC’s superior performance over existing methods on simulated benchmark data as well as on real data. In addition, we present applications to datasets of both bacterial DNA and viral RNA source. We further discuss our approach as an alternative to PCR-based DNA quantification. PMID:22941661
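
    The general idea of similarity correction can be illustrated with a small sketch. It assumes a hypothetical similarity matrix A, where A[i][j] is the expected fraction of reads from genome j that align to reference i, so that the read counts b observed per reference are modeled as A·x for the true abundances x. The abundances are then recovered by minimizing 0.5·||Ax - b||² + λ·Σx subject to x ≥ 0 (a non-negative LASSO), here solved with cyclic coordinate descent. This illustrates the technique only; it is not GASiC's implementation, and the function names, matrix values and solver choice are assumptions.

```python
from typing import List


def nonneg_lasso(A: List[List[float]], b: List[float],
                 lam: float = 0.0, n_sweeps: int = 500) -> List[float]:
    """Minimize 0.5 * ||A x - b||^2 + lam * sum(x) subject to x >= 0,
    using cyclic coordinate descent (one coordinate at a time).
    Hypothetical helper for illustration, not the GASiC code."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    r = list(b)  # residual b - A x; equals b because x starts at zero
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(n_sweeps):
        for j in range(n):
            if col_sq[j] == 0.0:
                continue  # genome j contributes to no reference; keep x[j] = 0
            # Correlation of column j with the residual that excludes x[j]'s contribution
            rho = sum(A[i][j] * r[i] for i in range(m)) + col_sq[j] * x[j]
            # Closed-form 1-D minimizer, shrunk by lam and clipped at zero
            new_xj = max(0.0, (rho - lam) / col_sq[j])
            delta = new_xj - x[j]
            if delta != 0.0:
                for i in range(m):
                    r[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


def relative(x: List[float]) -> List[float]:
    """Normalize estimated abundances so they sum to 1."""
    total = sum(x)
    return [v / total for v in x] if total > 0 else x


if __name__ == "__main__":
    # Hypothetical similarity matrix: 60% of the reads from either genome
    # also align to the other genome's reference.
    A = [[1.0, 0.6],
         [0.6, 1.0]]
    true_x = [0.7, 0.3]
    b = [sum(A[i][j] * true_x[j] for j in range(2)) for i in range(2)]
    print("naive:", relative(b))
    print("corrected:", relative(nonneg_lasso(A, b)))
```

    With λ = 0 and the two overlapping genomes above, the corrected estimate recovers the 0.7/0.3 mixture, whereas simply normalizing the raw per-reference counts would suggest 0.55/0.45. A positive λ pushes the abundances of weakly supported genomes to exactly zero.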

  14. MOCAT: A Metagenomics Assembly and Gene Prediction Toolkit

    PubMed Central

    Li, Junhua; Chen, Weineng; Chen, Hua; Mende, Daniel R.; Arumugam, Manimozhiyan; Pan, Qi; Liu, Binghang; Qin, Junjie; Wang, Jun; Bork, Peer

    2012-01-01

    MOCAT is a highly configurable, modular pipeline for fast, standardized processing of single or paired-end sequencing data generated by the Illumina platform. The pipeline uses state-of-the-art programs to quality control, map, and assemble reads from metagenomic samples sequenced at a depth of several billion base pairs, and predict protein-coding genes on assembled metagenomes. Mapping against reference databases allows for read extraction or removal, as well as abundance calculations. Relevant statistics for each processing step can be summarized into multi-sheet Excel documents and queryable SQL databases. MOCAT runs on UNIX machines and integrates seamlessly with the SGE and PBS queuing systems, commonly used to process large datasets. The open source code and modular architecture allow users to modify or exchange the programs that are utilized in the various processing steps. Individual processing steps and parameters were benchmarked and tested on artificial, real, and simulated metagenomes resulting in an improvement of selected quality metrics. MOCAT can be freely downloaded at http://www.bork.embl.de/mocat/. PMID:23082188

  15. Forest harvesting reduces the soil metagenomic potential for biomass decomposition.

    PubMed

    Cardenas, Erick; Kranabetter, J M; Hope, Graeme; Maas, Kendra R; Hallam, Steven; Mohn, William W

    2015-11-01

    Soil is the key resource that must be managed to ensure sustainable forest productivity. Soil microbial communities mediate numerous essential ecosystem functions, and recent studies show that forest harvesting alters soil community composition. From a long-term soil productivity study site in a temperate coniferous forest in British Columbia, 21 forest soil shotgun metagenomes were generated, totaling 187 Gb. A method to analyze unassembled metagenome reads from the complex community was optimized and validated. The subsequent metagenome analysis revealed that, 12 years after forest harvesting, there were 16% and 8% reductions in relative abundances of biomass decomposition genes in the organic and mineral soil layers, respectively. Organic and mineral soil layers differed markedly in genetic potential for biomass degradation, with the organic layer having greater potential and being more strongly affected by harvesting. Gene families were disproportionately affected, and we identified 41 gene families consistently affected by harvesting, including families involved in lignin, cellulose, hemicellulose and pectin degradation. The results strongly suggest that harvesting profoundly altered below-ground cycling of carbon and other nutrients at this site, with potentially important consequences for forest regeneration. Thus, it is important to determine whether these changes foreshadow long-term changes in forest productivity or resilience and whether these changes are broadly characteristic of harvested forests.

  16. Functional metagenomic selection of RubisCOs from uncultivated bacteria

    USGS Publications Warehouse

    Varaljay, Vanessa A; Satagopan, Sriram; North, Justin A.; Witteveen, Briana; Dourado, Manuella N.; Anantharaman, Karthik; Arbing, Mark A.; McCann, Shelley; Oremland, Ronald S.; Banfield, Jillian F.; Wrighton, Kelly C.; Tabita, F. Robert

    2016-01-01

    Ribulose 1,5-bisphosphate carboxylase/oxygenase (RubisCO) is a critical yet severely inefficient enzyme that catalyses the fixation of virtually all of the carbon found on Earth. Here, we report a functional metagenomic selection that recovers physiologically active RubisCO molecules directly from uncultivated and largely unknown members of natural microbial communities. Selection is based on CO2-dependent growth in a host strain capable of expressing environmental deoxyribonucleic acid (DNA), precluding the need for pure cultures or screening of recombinant clones for enzymatic activity. Seventeen functional RubisCO-encoded sequences were selected using DNA extracted from soil and river autotrophic enrichments, a photosynthetic biofilm and a subsurface groundwater aquifer. Notably, three related form II RubisCOs were recovered which share high sequence similarity with metagenomic scaffolds from uncultivated members of the Gallionellaceae family. One of the Gallionellaceae RubisCOs was purified and shown to possess CO2/O2 specificity typical of form II enzymes. X-ray crystallography determined that this enzyme is a hexamer, only the second form II multimer ever solved and the first RubisCO structure obtained from an uncultivated bacterium. Functional metagenomic selection leverages natural biological diversity and billions of years of evolution inherent in environmental communities, providing a new window into the discovery of CO2-fixing enzymes not previously characterized.

  17. Culture-independent discovery of natural products from soil metagenomes.

    PubMed

    Katz, Micah; Hover, Bradley M; Brady, Sean F

    2016-03-01

    Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules.

  18. New viruses in veterinary medicine, detected by metagenomic approaches.

    PubMed

    Belák, Sándor; Karlsson, Oskar E; Blomström, Anne-Lie; Berg, Mikael; Granberg, Fredrik

    2013-07-26

    In our world, which is faced today with exceptional environmental changes and dramatically intensifying globalisation, we are encountering challenges due to many new factors, including the emergence or re-emergence of novel, so far "unknown" infectious diseases. Although a broad arsenal of diagnostic methods is at our disposal, the majority of conventional diagnostic tests are highly virus-specific or are targeted entirely towards a limited group of infectious agents. This specificity complicates or even hinders the detection of new or unexpected pathogens, such as new, emerging or re-emerging viruses or novel viral variants. The recently developed approaches of viral metagenomics provide an effective novel way to screen samples and detect viruses without previous knowledge of the infectious agent, thereby enabling a better diagnosis and disease control, in line with the "One World, One Health" principles (www.oneworldonehealth.org). Using metagenomic approaches, we have recently identified a broad variety of new viruses, such as novel bocaviruses, Torque Teno viruses, astroviruses, rotaviruses and kobuviruses in porcine disease syndromes, new virus variants in honeybee populations, as well as a range of other infectious agents in further host species. These findings indicate that the metagenomic detection of viral pathogens is now becoming a powerful, cultivation-independent, and useful novel diagnostic tool in veterinary diagnostic virology.

  19. Metagenomic Analysis of a Complex Community Present in Pond Sediment

    PubMed Central

    Negi, Vivek; Lal, Rup

    2017-01-01

    The metagenomic profiling of complex communities is gaining immense interest across the scientific community. A complex community present in the pond sediment of a water body located close to a hexachlorocyclohexane (HCH) production site of the Indian Pesticide Limited (IPL) (Chinhat, Lucknow) was selected in an attempt to identify and analyze the unique microbial diversity and functional profile of the site. In this study, we supplement the metagenomic study of pond sediment with a variety of binning approaches along with an in-depth functional analysis. Our results improve the understanding of ecology, in terms of community dynamics. The findings are crucial with respect to the mechanisms such as those involving the lin group of genes that are known to be implicated in the HCH degradation pathway or the Type VI secretory system (T6SS) and its effector molecules. Metagenomic studies using the comparative genomics approach involving the isolates from adjacent HCH contaminated soils have contributed significantly towards improving our understanding of unexplored concepts, while simultaneously uncovering the novel mechanisms of microbial ecology. PMID:28348642

  20. Isolation and biochemical characterization of a glucose dehydrogenase from a hay infusion metagenome.

    PubMed

    Basner, Alexander; Antranikian, Garabed

    2014-01-01

    Glucose hydrolyzing enzymes are essential to determine blood glucose level. A high-throughput screening approach was established to identify NAD(P)-dependent glucose dehydrogenases for the application in test strips and the respective blood glucose meters. In the current report a glucose hydrolyzing enzyme, derived from a metagenomic library by expressing recombinant DNA fragments isolated from hay infusion, was characterized. The recombinant clone showing activity on glucose as substrate exhibited an open reading frame of 987 bp encoding a peptide of 328 amino acids. The isolated enzyme showed typical sequence motifs of short-chain dehydrogenases using NAD(P) as a co-factor and had a sequence similarity between 33 and 35% to characterized glucose dehydrogenases from different Bacillus species. The identified glucose dehydrogenase gene was expressed in E. coli, purified and subsequently characterized. The enzyme, belonging to the superfamily of short-chain dehydrogenases, shows a broad substrate range with a high affinity to glucose, xylose and glucose-6-phosphate. Due to its ability to be strongly associated with its cofactor NAD(P), the enzyme is able to directly transfer electrons from glucose oxidation to external electron acceptors by regenerating the cofactor while still associated with the protein.

  1. [Diversity of polyketide synthase genes (PKS) in metagenomic community of the freshwater sponge].

    PubMed

    Kaliuzhnaia, O V; Kulakova, N V; Itskovich, B V

    2012-01-01

    Metagenomic DNA of the microbial community associated with the Baikal sponge Lubomirskia baicalensis was screened for the presence of polyketide synthase (PKS) genes. PKS enzymatic systems take part in the synthesis of a great number of biologically active substances. Cloning and sequencing of amplified products of the ketosynthase (KS) domain of the PKS gene cluster revealed 15 fragments of PKS genes that differ from one another by 35-65% in amino acid sequence. BLASTX analysis showed that all these sequences belong to KS domains identified in various groups of microorganisms: alpha-, beta- and delta-Proteobacteria, Verrucomicrobia, Cyanobacteria and Chlorophyta. Some sequences were related to genes involved in the biosynthesis of curacin A (CurI, CurJ), stigmatellin (StiC, StiG), nostophycin (NpnB) and cryptophycins (CrpB). The homology of the detected sequences with those in the EMBL database ranges from 50 to 82%, confirming that the freshwater sponge community harbors genes for the synthesis of new, as yet unstudied polyketide substances with biotechnological potential.

  2. Cultivation-independent characterization of 'Candidatus Magnetobacterium bavaricum' via ultrastructural, geochemical, ecological and metagenomic methods.

    PubMed

    Jogler, C; Niebler, M; Lin, W; Kube, M; Wanner, G; Kolinko, S; Stief, P; Beck, A J; De Beer, D; Petersen, N; Pan, Y; Amann, R; Reinhardt, R; Schüler, D

    2010-09-01

    'Candidatus Magnetobacterium bavaricum' is unusual among magnetotactic bacteria (MTB) in terms of cell size (8-10 µm long, 1.5-2 µm in diameter), cell architecture, magnetotactic behaviour and its distinct phylogenetic position in the deep-branching Nitrospira phylum. In the present study, improved magnetic enrichment techniques permitted high-resolution scanning electron microscopy and energy dispersive X-ray analysis, which revealed the intracellular organization of the magnetosome chains. Sulfur globule accumulation in the cytoplasm points towards a sulfur-oxidizing metabolism of 'Candidatus M. bavaricum'. Detailed analysis of 'Candidatus M. bavaricum' microhabitats revealed more complex distribution patterns than previously reported, with cells predominantly found at low oxygen concentrations. No correlation to other geochemical parameters could be observed. In addition, the analysis of a metagenomic fosmid library revealed a 34 kb genomic fragment, which contains 33 genes, among them the complete rRNA gene operon of 'Candidatus M. bavaricum' as well as a gene encoding a putative type IV RubisCO large subunit.

  3. Isolation and Biochemical Characterization of a Glucose Dehydrogenase from a Hay Infusion Metagenome

    PubMed Central

    Basner, Alexander; Antranikian, Garabed

    2014-01-01

    Glucose hydrolyzing enzymes are essential to determine blood glucose level. A high-throughput screening approach was established to identify NAD(P)-dependent glucose dehydrogenases for the application in test strips and the respective blood glucose meters. In the current report a glucose hydrolyzing enzyme, derived from a metagenomic library by expressing recombinant DNA fragments isolated from hay infusion, was characterized. The recombinant clone showing activity on glucose as substrate exhibited an open reading frame of 987 bp encoding a peptide of 328 amino acids. The isolated enzyme showed typical sequence motifs of short-chain dehydrogenases using NAD(P) as a co-factor and had a sequence similarity between 33 and 35% to characterized glucose dehydrogenases from different Bacillus species. The identified glucose dehydrogenase gene was expressed in E. coli, purified and subsequently characterized. The enzyme, belonging to the superfamily of short-chain dehydrogenases, shows a broad substrate range with a high affinity to glucose, xylose and glucose-6-phosphate. Due to its ability to be strongly associated with its cofactor NAD(P), the enzyme is able to directly transfer electrons from glucose oxidation to external electron acceptors by regenerating the cofactor while still associated with the protein. PMID:24454935

  4. Diversity of putative archaeal RNA viruses in metagenomic datasets of a Yellowstone acidic hot spring.

    PubMed

    Wang, Hongming; Yu, Yongxin; Liu, Taigang; Pan, Yingjie; Yan, Shuling; Wang, Yongjie

    2015-01-01

    Two genomic fragments (5,662 and 1,269 nt in size, GenBank accession no. JQ756122 and JQ756123, respectively) of novel, positive-strand RNA viruses that infect archaea were first discovered in an acidic hot spring in Yellowstone National Park (Bolduc et al., 2012). To investigate the diversity of these newly identified putative archaeal RNA viruses, global metagenomic datasets were searched for sequences that were significantly similar to those of the viruses. A total of 3,757 associated reads were retrieved solely from the Yellowstone datasets and were used to assemble the genomes of the putative archaeal RNA viruses. Nine contigs with lengths ranging from 417 to 5,866 nt were obtained, 4 of which were longer than 2,200 nt; one contig was 204 nt longer than JQ756122, representing the longest genomic sequence of the putative archaeal RNA viruses. These contigs revealed more than 50% sequence similarity to JQ756122 or JQ756123 and may be partial or nearly complete genomes of novel genogroups or genotypes of the putative archaeal RNA viruses. Sequence and phylogenetic analyses indicated that the archaeal RNA viruses are genetically diverse, with at least 3 related viral lineages in the Yellowstone acidic hot spring environment.

  5. Gram negative shuttle BAC vector for heterologous expression of metagenomic libraries.

    PubMed

    Kakirde, Kavita S; Wild, Jadwiga; Godiska, Ronald; Mead, David A; Wiggins, Andrew G; Goodman, Robert M; Szybalski, Waclaw; Liles, Mark R

    2011-04-15

    Bacterial artificial chromosome (BAC) vectors enable stable cloning of large DNA fragments from single genomes or microbial assemblages. A novel shuttle BAC vector was constructed that permits replication of BAC clones in diverse Gram-negative species. The "Gram-negative shuttle BAC" vector (pGNS-BAC) uses the F replicon for stable single-copy replication in E. coli and the broad-host-range RK2 mini-replicon for high-copy replication in diverse Gram-negative bacteria. As with other BAC vectors containing the oriV origin, this vector is capable of an arabinose-inducible increase in plasmid copy number. Resistance to both gentamicin and chloramphenicol is encoded on pGNS-BAC, permitting selection for the plasmid in diverse bacterial species. The oriT from an IncP plasmid was cloned into pGNS-BAC to enable conjugal transfer, thereby allowing both electroporation and conjugation of pGNS-BAC DNA into bacterial hosts. A soil metagenomic library was constructed in pGNS-BAC-1 (the first version of the vector, lacking gentamicin resistance and oriT), and recombinant clones were demonstrated to replicate in diverse Gram-negative hosts, including Escherichia coli, Pseudomonas spp., Salmonella enterica, Serratia marcescens, Vibrio vulnificus and Enterobacter nimipressuralis. This shuttle BAC vector can be utilized to clone genomic DNA from diverse sources, and then transfer it into diverse Gram-negative bacterial species to facilitate heterologous expression of recombinant pathways.

  6. IMG/M: A data management and analysis system for metagenomes

    SciTech Connect

    Markowitz, Victor M.; Ivanova, Natalia N.; Szeto, Ernest; Palaniappan, Krishna; Chu, Ken; Dalevi, Daniel; Chen, I-Min A.; Grechkin, Yuri; Dubchak, Inna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Hugenholtz, Phil; Kyrpides, Nikos C.

    2007-08-01

    IMG/M is a data management and analysis system for microbial community genomes (metagenomes) hosted at the Joint Genome Institute (JGI). IMG/M consists of metagenome data integrated with isolate microbial genomes from the Integrated Microbial Genomes (IMG) system. IMG/M provides IMG's comparative data analysis tools extended to handle metagenome data, together with metagenome-specific analysis tools. IMG/M is available at http://img.jgi.doe.gov/m. Studies of the collective genomes (also known as metagenomes) of environmental microbial communities (also known as microbiomes) are expected to lead to advances in environmental cleanup, agriculture, industrial processes, alternative energy production, and human health (1). Metagenomes of specific microbiome samples are sequenced by organizations worldwide, such as the Department of Energy's (DOE) Joint Genome Institute (JGI), the Venter Institute, and Washington University in St. Louis, using different sequencing strategies, technology platforms, and annotation procedures. According to the Genomes OnLine Database, about 28 metagenome studies have been published to date, with over 60 other projects ongoing and more in the process of being launched (2). The DOE JGI is one of the major contributors of metagenome sequence data, currently sequencing more than 50% of the reported metagenome projects worldwide. Due to the higher complexity, inherent incompleteness, and lower quality of metagenome sequence data, traditional assembly, gene prediction, and annotation methods do not perform on these datasets as well as they do on isolate microbial genome sequences (3, 4). In spite of these limitations, metagenome data are amenable to a variety of analyses, as illustrated by several recent studies (5-10). Metagenome data analysis is usually set up in the context of reference isolate genomes and considers the questions of composition and functional or metabolic potential of

  7. Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C)

    PubMed Central

    DeMaere, Matthew Z.

    2016-01-01

    Background: Chromosome conformation capture, coupled with high-throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occurring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods: We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. Results: When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion: Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios; however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development. PMID:27843713
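
    Because soft clustering lets one contig belong to several genome bins, ordinary B-cubed precision and recall are usually generalized to overlapping assignments. The sketch below implements the generic extended B-cubed measure (multiplicity precision and recall averaged over items); it is an assumption that this resembles the adaptation used in the study, whose exact modifications are not described here. The function name and the toy contigs and labels are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple


def extended_bcubed(clusters: Dict[Hashable, Set[Hashable]],
                    classes: Dict[Hashable, Set[Hashable]]) -> Tuple[float, float, float]:
    """Extended B-cubed precision, recall and F1 for overlapping clusterings.

    clusters[e]: set of cluster ids item e was assigned to (soft clustering).
    classes[e]:  set of true labels (e.g. source genomes) of item e.
    Every item in `classes` must appear in `clusters`, with >= 1 entry in each.
    """
    items = list(classes)
    if any(not clusters[e] or not classes[e] for e in items):
        raise ValueError("every item needs >= 1 cluster and >= 1 class")
    precision = recall = 0.0
    for e in items:
        p_scores, r_scores = [], []
        for e2 in items:  # includes e itself, so neither list is empty
            shared_c = len(clusters[e] & clusters[e2])
            shared_l = len(classes[e] & classes[e2])
            overlap = min(shared_c, shared_l)
            if shared_c:  # multiplicity precision: pairs that share a cluster
                p_scores.append(overlap / shared_c)
            if shared_l:  # multiplicity recall: pairs that share a class
                r_scores.append(overlap / shared_l)
        precision += sum(p_scores) / len(p_scores)
        recall += sum(r_scores) / len(r_scores)
    precision /= len(items)
    recall /= len(items)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical contigs from two genomes; "x" is a region shared by both.
    truth = {"a": {"g1"}, "b": {"g1"}, "x": {"g1", "g2"}, "c": {"g2"}}
    soft = {"a": {"k1"}, "b": {"k1"}, "x": {"k1", "k2"}, "c": {"k2"}}
    lumped = {e: {"k1"} for e in truth}  # everything in a single bin
    print("soft:", extended_bcubed(soft, truth))
    print("lumped:", extended_bcubed(lumped, truth))
```

    Assigning the shared contig to both bins scores perfectly, while lumping all contigs into one bin keeps recall at 1 but lowers precision, which is the trade-off such a measure is meant to expose.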

  8. Metagenome of the Mediterranean deep chlorophyll maximum studied by direct and fosmid library 454 pyrosequencing.

    PubMed

    Ghai, Rohit; Martin-Cuadrado, Ana-Belén; Molto, Aitor Gonzaga; Heredia, Inmaculada García; Cabrera, Raúl; Martin, Javier; Verdú, Miguel; Deschamps, Philippe; Moreira, David; López-García, Purificación; Mira, Alex; Rodriguez-Valera, Francisco

    2010-09-01

    The deep chlorophyll maximum (DCM) is a zone of maximal photosynthetic activity, generally located toward the base of the photic zone in lakes and oceans. In tropical waters, this is a permanent feature, but in the Mediterranean and other temperate waters, the DCM is a seasonal phenomenon. The metagenome from a single sample of a mature Mediterranean DCM community has been 454 pyrosequenced both directly and after cloning in fosmids. This study is the first to be carried out at this sequencing depth (ca. 600 Mb combining direct and fosmid sequencing) at any DCM. Our results indicate a microbial community massively dominated by the high-light-adapted Prochlorococcus marinus subsp. pastoris, Synechococcus sp., and the heterotroph Candidatus Pelagibacter. The sequences retrieved were remarkably similar to the existing genome of P. marinus subsp. pastoris with a nucleotide identity over 98%. We also found a large number of cyanophages that could prey on this microbe, although sequence conservation was much lower. The high abundance of phage sequences in the cellular size fraction indicated a remarkably high proportion of cells suffering phage lytic attack. In addition, several fosmids clearly belonging to Group II Euryarchaeota were retrieved and recruited many fragments from the total direct DNA sequencing, suggesting that this group might be quite abundant in this habitat. The comparison between direct and fosmid sequencing revealed a bias in the fosmid libraries against low-GC DNA and specifically against the two most dominant members of the community, Candidatus Pelagibacter and P. marinus subsp. pastoris, thus unexpectedly providing a feasible method to obtain large genomic fragments from other less prevalent members of this community.

  9. FAMeS: Fidelity of Analysis of Metagenomic Samples

    DOE Data Explorer

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods currently used to process metagenomic sequences, simulated datasets of varying complexity were constructed by combining sequencing reads randomly selected from 113 isolate genomes. These datasets were designed to model real metagenomes in terms of complexity and phylogenetic composition. Assembly, gene prediction and binning, employing methods commonly used for the analysis of metagenomic datasets at the DOE JGI, were performed. This site provides access to the simulated datasets, and aims to facilitate standardized benchmarking of tools for metagenomic analysis. FAMeS now hosts data coming from a comprehensive study of methodologies used to create OTUs from 16S rRNA targeted studies of microbial communities. Studies of phylogenetic markers at the molecular level have revealed a vast biodiversity of microorganisms living in the sea, land, and even within the human body. Microbial diversity studies of uncharacterized environments typically seek to estimate the richness and diversity of endemic microflora using a 16S rRNA gene sequencing approach. When most of the species in an environment are unknown and cannot be classified through a database search, researchers cluster 16S sequences into operational taxonomic units (OTUs) or phylotypes, thereby providing an estimate of population structure. Using real 16S sequence data, we have performed a critical analysis of OTU clustering methodologies to assess the potential variability in OTU quality. FAMeS provides the sequence data, taxonomic information, multiple sequence alignments, and distance matrices used and described in the core paper, as well as compiled results of more than 700 unique OTU methods. [The above was copied from the FAMeS home page at http://fames.jgi-psf.org/] The core paper behind FAMeS is: Konstantinos Mavromatis, Natalia Ivanova, Kerrie Barry, Harris Shapiro, Eugene Goltsman, Alice C Mc

  10. Metagenomics of the subsurface Brazos-Trinity Basin (IODP site 1320): comparison with other sediment and pyrosequenced metagenomes.

    PubMed

    Biddle, Jennifer F; White, James Robert; Teske, Andreas P; House, Christopher H

    2011-06-01

    The Brazos-Trinity Basin on the slope of the Gulf of Mexico passive margin was drilled during Integrated Ocean Drilling Program Expedition 308. The buried anaerobic sediments of this basin are largely organic-poor and have few microbial inhabitants compared with the organic-rich sediments with high cell counts from the Peru Margin that were drilled during Ocean Drilling Program Leg 201. Nucleic acids were extracted from Brazos-Trinity Basin sediments and were subjected to whole-genome amplification and pyrosequencing. A comparison of the Brazos-Trinity Basin metagenome, consisting of 105 Mbp, and the existing Peru Margin metagenome revealed trends linking gene content, phylogenetic content, geological location and geochemical regime. The major microbial groups (Proteobacteria, Firmicutes, Euryarchaeota and Chloroflexi) occur consistently throughout all samples, yet their shifting abundances allow for discrimination between samples. The abundances of some cluster of orthologous groups (COG) categories are correlated with geochemical factors, such as the level of ammonia. Here we describe the sediment metagenome from the oligotrophic Brazos-Trinity Basin (Site 1320) and show similarities and differences with the dataset from the Pacific Peru Margin (Site 1229) and other pyrosequenced datasets. The microbial community found at Integrated Ocean Drilling Program Site 1320 likely represents the subsurface microbial inhabitants of turbiditic slopes that lack substantial upwelling.

  11. Characterization of the Gut Microbiome Using 16S or Shotgun Metagenomics

    PubMed Central

    Jovel, Juan; Patterson, Jordan; Wang, Weiwei; Hotte, Naomi; O'Keefe, Sandra; Mitchel, Troy; Perry, Troy; Kao, Dina; Mason, Andrew L.; Madsen, Karen L.; Wong, Gane K.-S.

    2016-01-01

    The advent of next generation sequencing (NGS) has enabled investigations of the gut microbiome with unprecedented resolution and throughput. This has stimulated the development of sophisticated bioinformatics tools to analyze the massive amounts of data generated. Researchers therefore need a clear understanding of the key concepts required for the design, execution and interpretation of NGS experiments on microbiomes. We conducted a literature review and used our own data to determine which approaches work best. The two main approaches for analyzing the microbiome, 16S ribosomal RNA (rRNA) gene amplicons and shotgun metagenomics, are illustrated with analyses of libraries designed to highlight their strengths and weaknesses. Several methods for taxonomic classification of bacterial sequences are discussed. We present simulations to assess the number of sequences that are required to perform reliable appraisals of bacterial community structure. To the extent that fluctuations in the diversity of gut bacterial populations correlate with health and disease, we emphasize various techniques for the analysis of bacterial communities within samples (α-diversity) and between samples (β-diversity). Finally, we demonstrate techniques to infer the metabolic capabilities of a bacterial community from these 16S and shotgun data. PMID:27148170

  12. A Possible Novel Prosthetic Joint Infection Pathogen, Mycoplasma salivarium, Identified by Metagenomic Shotgun Sequencing.

    PubMed

    Thoendel, Matthew; Jeraldo, Patricio; Greenwood-Quaintance, Kerryl E; Chia, Nicholas; Abdel, Matthew P; Steckelberg, James M; Osmon, Douglas R; Patel, Robin

    2017-04-01

    Defining the microbial etiology of culture-negative prosthetic joint infection (PJI) can be challenging. Metagenomic shotgun sequencing is a new tool to identify organisms undetected by conventional methods. We present a case where metagenomics was used to identify Mycoplasma salivarium as a novel PJI pathogen in a hypogammaglobulinemic individual.

  13. BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics

    SciTech Connect

    Wu, Cathy H.; Hirschman, Lynette

    2016-10-29

    The objective of this project was to host BioCreative workshops to define and develop text mining tasks to meet the needs of the Genome Sciences community, focusing on metadata information extraction in metagenomics. Following the successful introduction of metagenomics at the BioCreative IV workshop, members of the metagenomics community and BioCreative communities continued discussion to identify candidate topics for a BioCreative metagenomics track for BioCreative V. Of particular interest was the capture of environmental and isolation source information from text. The outcome was to form a “community of interest” around work on the interactive EXTRACT system, which supported interactive tagging of environmental and species data. This experiment is included in the BioCreative V virtual issue of Database. In addition, there was broad participation by members of the metagenomics community in the panels held at BioCreative V, leading to valuable exchanges between the text mining developers and members of the metagenomics research community. These exchanges are reflected in a number of the overview and perspective pieces also being captured in the BioCreative V virtual issue. Overall, this conversation has exposed the metagenomics researchers to the possibilities of text mining, and educated the text mining developers about the specific needs of the metagenomics community.

  14. The great screen anomaly--a new frontier in product discovery through functional metagenomics.

    PubMed

    Ekkers, David Matthias; Cretoiu, Mariana Silvia; Kielak, Anna Maria; Elsas, Jan Dirk van

    2012-02-01

    Functional metagenomics, the study of the collective genome of a microbial community by expressing it in a foreign host, is an emerging field in biotechnology. Over the past years, the prospect of discovering novel products through metagenomics has grown rapidly, and metagenomics has been heralded as a promising strategy for mining resources for the biotechnological and pharmaceutical industries. However, despite innovative work in the field of functional genomics in recent years, function-based metagenomics studies still fall short of yielding significant numbers of new products that are valuable for biotechnological processes. A new set of strategies for fostering gene expression, going beyond those of traditional work, is therefore required. These new strategies should address a major issue: how to successfully express a set of unknown genes of unknown origin in a foreign host in high throughput. This article is an opinion review of functional metagenomic screening of natural microbial communities, with a focus on the optimization of new product discovery. It first summarizes current major bottlenecks in functional metagenomics and then provides an overview of the general metagenomic assessment strategies, with a focus on the challenges that are met in the screening for, and selection of, target genes in metagenomic libraries. To identify possible screening limitations, strategies to achieve optimal gene expression are reviewed, examining the molecular events all the way from the transcription level through to the secretion of the target gene product.

  15. Metagenomes obtained by 'deep sequencing' - what do they tell about the enhanced biological phosphorus removal communities?

    PubMed

    Albertsen, Mads; Saunders, Aaron M; Nielsen, Kåre L; Nielsen, Per H

    2013-01-01

    Metagenomics enables studies of the genomic potential of complex microbial communities by sequencing bulk genomic DNA directly from the environment. Knowledge of the genetic potential of a community can be used to formulate and test ecological hypotheses about stability and performance. In this study deep metagenomics and fluorescence in situ hybridization (FISH) were used to study a full-scale wastewater treatment plant with enhanced biological phosphorus removal (EBPR), and the results were compared to an existing EBPR metagenome. EBPR is a widely used process that relies on a complex community of microorganisms to function properly. Insight into community and species level stability and dynamics is valuable for knowledge-driven optimization of the EBPR process. The metagenomes of the EBPR communities were distinct compared to metagenomes of communities from a wide range of other environments, which could be attributed to selection pressures of the EBPR process. The metabolic potential of one of the key microorganisms in the EBPR process, Accumulibacter, was investigated in more detail in the two plants, revealing a potential importance of phage predation on the dynamics of Accumulibacter populations. The results demonstrate that metagenomics can be used as a powerful tool for system wide characterization of the EBPR community as well as for a deeper understanding of the function of specific community members. Furthermore, we discuss and illustrate some of the general pitfalls in metagenomics and stress the need for additional DNA-extraction-independent information in metagenome studies.

  16. Metagenome and metatranscriptome data for Rifle CMT-03 laboratory microcosm experiment completed in April 2014

    DOE Data Explorer

    Jewell, Talia [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Karaoz, Ulas [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Bill, Markus [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Chakraborty, Romy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Brodie, Eoin L [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Williams, Kenneth Hurst [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Beller, Harry R [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2014-04-01

    Sediment samples were collected during installation of monitoring borehole CMT-03. Microcosms were constructed and inoculated under anaerobic conditions with these sediments and anaerobic Rifle artificial groundwater. Microcosm metagenomes and metatranscriptomes were sampled every 5 days for a period of 20 days. The dataset gives gene-level annotations, binning, metagenomic and metatranscriptomic coverages for these microcosms.

  17. New Scalings in Nuclear Fragmentation

    SciTech Connect

    Bonnet, E.; Bougault, R.; Galichet, E.; Gagnon-Moisan, F.; Guinet, D.; Lautesse, P.; Marini, P.; Parlog, M.

    2010-10-01

    Fragment partitions of fragmenting hot nuclei produced in central and semiperipheral collisions have been compared in the excitation energy region 4-10 MeV per nucleon where radial collective expansion takes place. It is shown that, for a given total excitation energy per nucleon, the amount of radial collective energy fixes the mean fragment multiplicity. It is also shown that, at a given total excitation energy per nucleon, the different properties of fragment partitions are completely determined by the reduced fragment multiplicity (i.e., normalized to the source size). Freeze-out volumes seem to play a role in the scalings observed.

  18. Metagenomic Insights into the Uncultured Diversity and Physiology of Microbes in Four Hypersaline Soda Lake Brines

    PubMed Central

    Vavourakis, Charlotte D.; Ghai, Rohit; Rodriguez-Valera, Francisco; Sorokin, Dimitry Y.; Tringe, Susannah G.; Hugenholtz, Philip; Muyzer, Gerard

    2016-01-01

    Soda lakes are salt lakes with a naturally alkaline pH due to evaporative concentration of sodium carbonates in the absence of major divalent cations. Hypersaline soda brines harbor microbial communities with a high species- and strain-level archaeal diversity and a large proportion of still uncultured poly-extremophiles compared to neutral brines of similar salinities. We present the first “metagenomic snapshots” of microbial communities thriving in the brines of four shallow soda lakes from the Kulunda Steppe (Altai, Russia) covering a salinity range from 170 to 400 g/L. Both amplicon sequencing of 16S rRNA fragments and direct metagenomic sequencing showed that the top-level taxa abundance was linked to the ambient salinity: Bacteroidetes, Alpha-, and Gamma-proteobacteria were dominant below a salinity of 250 g/L, Euryarchaeota at higher salinities. Within these taxa, amplicon sequences related to Halorubrum, Natrinema, Gracilimonas, purple non-sulfur bacteria (Rhizobiales, Rhodobacter, and Rhodobaca) and chemolithotrophic sulfur oxidizers (Thioalkalivibrio) were highly abundant. Twenty-four draft population genomes from novel members and ecotypes within the Nanohaloarchaea, Halobacteria, and Bacteroidetes were reconstructed to explore their metabolic features, environmental abundance and strategies for osmotic adaptation. The Halobacteria- and Bacteroidetes-related draft genomes belong to putative aerobic heterotrophs, likely with the capacity to ferment sugars in the absence of oxygen. Members from both taxonomic groups are likely involved in primary organic carbon degradation, since some of the reconstructed genomes encode the ability to hydrolyze recalcitrant substrates, such as cellulose and chitin. Putative sodium-pumping rhodopsins were found in both a Flavobacteriaceae- and a Chitinophagaceae-related draft genome. The predicted proteomes of both the latter and a Rhodothermaceae-related draft genome were indicative of a “salt-in” strategy of

  19. Metagenomic Insights into the Uncultured Diversity and Physiology of Microbes in Four Hypersaline Soda Lake Brines.

    PubMed

    Vavourakis, Charlotte D; Ghai, Rohit; Rodriguez-Valera, Francisco; Sorokin, Dimitry Y; Tringe, Susannah G; Hugenholtz, Philip; Muyzer, Gerard

    2016-01-01

    Soda lakes are salt lakes with a naturally alkaline pH due to evaporative concentration of sodium carbonates in the absence of major divalent cations. Hypersaline soda brines harbor microbial communities with a high species- and strain-level archaeal diversity and a large proportion of still uncultured poly-extremophiles compared to neutral brines of similar salinities. We present the first "metagenomic snapshots" of microbial communities thriving in the brines of four shallow soda lakes from the Kulunda Steppe (Altai, Russia) covering a salinity range from 170 to 400 g/L. Both amplicon sequencing of 16S rRNA fragments and direct metagenomic sequencing showed that the top-level taxa abundance was linked to the ambient salinity: Bacteroidetes, Alpha-, and Gamma-proteobacteria were dominant below a salinity of 250 g/L, Euryarchaeota at higher salinities. Within these taxa, amplicon sequences related to Halorubrum, Natrinema, Gracilimonas, purple non-sulfur bacteria (Rhizobiales, Rhodobacter, and Rhodobaca) and chemolithotrophic sulfur oxidizers (Thioalkalivibrio) were highly abundant. Twenty-four draft population genomes from novel members and ecotypes within the Nanohaloarchaea, Halobacteria, and Bacteroidetes were reconstructed to explore their metabolic features, environmental abundance and strategies for osmotic adaptation. The Halobacteria- and Bacteroidetes-related draft genomes belong to putative aerobic heterotrophs, likely with the capacity to ferment sugars in the absence of oxygen. Members from both taxonomic groups are likely involved in primary organic carbon degradation, since some of the reconstructed genomes encode the ability to hydrolyze recalcitrant substrates, such as cellulose and chitin. Putative sodium-pumping rhodopsins were found in both a Flavobacteriaceae- and a Chitinophagaceae-related draft genome. The predicted proteomes of both the latter and a Rhodothermaceae-related draft genome were indicative of a "salt-in" strategy of osmotic

  20. Fragmentation of random trees

    NASA Astrophysics Data System (ADS)

    Kalay, Z.; Ben-Naim, E.

    2015-01-01

    We study fragmentation of a random recursive tree into a forest by repeated removal of nodes. The initial tree consists of N nodes and it is generated by sequential addition of nodes with each new node attaching to a randomly-selected existing node. As nodes are removed from the tree, one at a time, the tree dissolves into an ensemble of separate trees, namely, a forest. We study statistical properties of trees and nodes in this heterogeneous forest, and find that the fraction of remaining nodes m characterizes the system in the limit N → ∞. We obtain analytically the size density φ_s of trees of size s. The size density has a power-law tail φ_s ~ s^(−α) with exponent α = 1 + 1/m. Therefore, the tail becomes steeper as further nodes are removed, and the fragmentation process is unusual in that exponent α increases continuously with time. We also extend our analysis to the case where nodes are added as well as removed, and obtain the asymptotic size density for growing trees.

  1. Fragmentation of random trees

    NASA Astrophysics Data System (ADS)

    Kalay, Ziya; Ben-Naim, Eli

    2015-03-01

    We investigate the fragmentation of a random recursive tree by repeated removal of nodes, resulting in a forest of disjoint trees. The initial tree is generated by sequentially attaching new nodes to randomly chosen existing nodes until the tree contains N nodes. As nodes are removed, one at a time, the tree dissolves into an ensemble of separate trees, namely a forest. We study the statistical properties of trees and nodes in this heterogeneous forest. In the limit N → ∞, we find that the system is characterized by a single parameter: the fraction of remaining nodes m. We obtain analytically the size density ϕ_s of trees of size s, which has a power-law tail ϕ_s ~ s^(−α), with exponent α = 1 + 1/m. Therefore, the tail becomes steeper as further nodes are removed, producing an unusual scaling exponent that increases continuously with time. Furthermore, we investigate the fragment size distribution in a growing tree, where nodes are added as well as removed, and find that the distribution for this case is much narrower.
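The fragmentation process described above can be simulated directly, which gives an empirical check on the size density. This is a minimal sketch, not the authors' code, and it makes three assumptions. Each new node attaches to a uniformly chosen earlier node. Removals are uniform at random. Surviving trees are found with a union-find over surviving edges. Because uniform one-at-a-time removal leaves a uniformly random set of survivors, the sketch samples that set directly. The function names are hypothetical.

```python
import random
from collections import Counter


def random_recursive_tree(n: int, rng: random.Random) -> list[int]:
    """Return a parent array; node k >= 1 attaches to a uniformly chosen earlier node."""
    parent = [-1] * n  # node 0 is the root
    for k in range(1, n):
        parent[k] = rng.randrange(k)
    return parent


def fragment(n: int, m: float, seed: int = 0) -> list[int]:
    """Grow a random recursive tree of n nodes, delete nodes at random until a
    fraction m survives, and return the sizes of the trees in the resulting forest."""
    rng = random.Random(seed)
    parent = random_recursive_tree(n, rng)
    # Uniform removals one at a time leave a uniformly random survivor set.
    survivors = set(rng.sample(range(n), round(m * n)))
    root = {v: v for v in survivors}  # union-find over surviving nodes

    def find(v: int) -> int:
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    # A tree edge survives only if both of its endpoints survive.
    for v in survivors:
        if parent[v] in survivors:
            root[find(v)] = find(parent[v])
    return list(Counter(find(v) for v in survivors).values())
```

For example, `Counter(fragment(100_000, 0.5))` tabulates how many trees have each size s. If the analytical result applies, the tail of that histogram should decay roughly as s^(−3) for m = 0.5, since α = 1 + 1/m. Finite-size effects are not examined here.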

  2. MetaMLST: multi-locus strain-level bacterial typing from metagenomic samples

    PubMed Central

    Zolfo, Moreno; Tett, Adrian; Jousson, Olivier; Donati, Claudio; Segata, Nicola

    2017-01-01

    Metagenomic characterization of microbial communities has the potential to become a tool to identify pathogens in human samples. However, software tools able to extract strain-level typing information from metagenomic data are needed. Low-throughput molecular typing schemes such as Multilocus Sequence Typing (MLST) are still widely used and provide a wealth of strain-level information that is currently not exploited by metagenomic methods. We introduce MetaMLST, a software tool that reconstructs the MLST loci of microorganisms present in microbial communities from metagenomic data. Tested on synthetic and spiked-in real metagenomes, the pipeline was able to reconstruct the MLST sequences with >98.5% accuracy at coverages as low as 1×. On real samples, the pipeline showed higher sensitivity than assembly-based approaches and it proved successful in identifying strains in epidemic outbreaks as well as in intestinal, skin and gastrointestinal microbiome samples. PMID:27651451

  3. Heterologous viral expression systems in fosmid vectors increase the functional analysis potential of metagenomic libraries

    PubMed Central

    Terrón-González, L.; Medina, C.; Limón-Mortés, M. C.; Santero, E.

    2013-01-01

    The extraordinary potential of metagenomic functional analyses to identify activities of interest present in uncultured microorganisms has been limited by reduced gene expression in surrogate hosts. We have developed vectors and specialized E. coli strains as improved metagenomic DNA heterologous expression systems, taking advantage of viral components that prevent transcription termination at metagenomic terminators. One of the systems uses the phage T7 RNA-polymerase to drive metagenomic gene expression, while the other approach uses the lambda phage transcription anti-termination protein N to limit transcription termination. A metagenomic library was constructed and functionally screened to identify genes conferring carbenicillin resistance to E. coli. The use of these enhanced expression systems resulted in a 6-fold increase in the frequency of carbenicillin-resistant clones. Subcloning and sequence analysis showed that, besides β-lactamases, efflux pumps are not only able to contribute to carbenicillin resistance but may in fact be sufficient by themselves to confer carbenicillin resistance. PMID:23346364

  4. Current opportunities and challenges in microbial metagenome analysis—a bioinformatic perspective

    PubMed Central

    Teeling, Hanno

    2012-01-01

    Metagenomics has become an indispensable tool for studying the diversity and metabolic potential of environmental microbes, whose bulk is as yet non-cultivable. Continual progress in next-generation sequencing allows for generating increasingly large metagenomes and studying multiple metagenomes over time or space. Recently, a new type of holistic ecosystem study has emerged that seeks to combine metagenomics with biodiversity, meta-expression and contextual data. Such ‘ecosystems biology’ approaches bear the potential to not only advance our understanding of environmental microbes to a new level but also impose challenges due to increasing data complexities, in particular with respect to bioinformatic post-processing. This mini review aims to address selected opportunities and challenges of modern metagenomics from a bioinformatics perspective and hopefully will serve as a useful resource for microbial ecologists and bioinformaticians alike. PMID:22966151

  5. Introduction to Metagenomics at DOE JGI: Program Overview and Program Informatics (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Tringe, Susannah [DOE JGI]

    2016-07-12

    Susannah Tringe of the DOE Joint Genome Institute talks about the Program Overview and Program Informatics at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011

  6. MG-Digger: An Automated Pipeline to Search for Giant Virus-Related Sequences in Metagenomes.

    PubMed

    Verneau, Jonathan; Levasseur, Anthony; Raoult, Didier; La Scola, Bernard; Colson, Philippe

    2016-01-01

    The number of metagenomic studies conducted each year is growing dramatically. Storage and analysis of such big data is difficult and time-consuming. Interestingly, analysis shows that environmental and human metagenomes include a significant amount of non-annotated sequences, representing a 'dark matter.' We established a bioinformatics pipeline that automatically detects metagenome reads matching query sequences from a given set and applied this tool to the detection of sequences matching large and giant DNA viral members of the proposed order Megavirales or virophages. A total of 1,045 environmental and human metagenomes (≈ 1 Terabase) were collected, processed, and stored on our bioinformatics server. In addition, nucleotide and protein sequences from 93 Megavirales representatives, including 19 giant viruses of amoeba, and 5 virophages, were collected. The pipeline was generated by scripts written in Python language and entitled MG-Digger. Metagenomes previously found to contain megavirus-like sequences were tested as controls. MG-Digger was able to annotate hundreds of metagenome sequences as best matching those of giant viruses. These sequences were most often found to be similar to phycodnavirus or mimivirus sequences, but included reads related to recently available pandoraviruses, Pithovirus sibericum, and faustoviruses. Compared to other tools, MG-Digger combined stand-alone use on Linux or Windows operating systems through a user-friendly interface, implementation of ready-to-use customized metagenome databases and query sequence databases, adjustable parameters for BLAST searches, and creation of output files containing selected reads with best match identification. Compared to Metavir 2, a reference tool in viral metagenome analysis, MG-Digger detected 8% more true-positive Megavirales-related reads in a control metagenome. The present work shows that massive, automated and recurrent analyses of metagenomes are effective in improving knowledge about the

  7. MG-Digger: An Automated Pipeline to Search for Giant Virus-Related Sequences in Metagenomes

    PubMed Central

    Verneau, Jonathan; Levasseur, Anthony; Raoult, Didier; La Scola, Bernard; Colson, Philippe

    2016-01-01

    The number of metagenomic studies conducted each year is growing dramatically. Storage and analysis of such big data is difficult and time-consuming. Interestingly, analysis shows that environmental and human metagenomes include a significant amount of non-annotated sequences, representing a ‘dark matter.’ We established a bioinformatics pipeline that automatically detects metagenome reads matching query sequences from a given set and applied this tool to the detection of sequences matching large and giant DNA viral members of the proposed order Megavirales or virophages. A total of 1,045 environmental and human metagenomes (≈ 1 Terabase) were collected, processed, and stored on our bioinformatics server. In addition, nucleotide and protein sequences from 93 Megavirales representatives, including 19 giant viruses of amoeba, and 5 virophages, were collected. The pipeline was generated by scripts written in Python language and entitled MG-Digger. Metagenomes previously found to contain megavirus-like sequences were tested as controls. MG-Digger was able to annotate hundreds of metagenome sequences as best matching those of giant viruses. These sequences were most often found to be similar to phycodnavirus or mimivirus sequences, but included reads related to recently available pandoraviruses, Pithovirus sibericum, and faustoviruses. Compared to other tools, MG-Digger combined stand-alone use on Linux or Windows operating systems through a user-friendly interface, implementation of ready-to-use customized metagenome databases and query sequence databases, adjustable parameters for BLAST searches, and creation of output files containing selected reads with best match identification. Compared to Metavir 2, a reference tool in viral metagenome analysis, MG-Digger detected 8% more true-positive Megavirales-related reads in a control metagenome. The present work shows that massive, automated and recurrent analyses of metagenomes are effective in improving knowledge about

  8. Managing microbial communities for sequentially reconstruct genomes from complex metagenomes

    NASA Astrophysics Data System (ADS)

    Delmont, Tom O.; Vogel, Timothy M.; Simonet, Pascal

    2013-04-01

    Global understanding of environmental microbial communities is currently limited by the bottleneck of genome reconstruction. Soil is a typical example: most of its cells are currently uncultured and its metagenomic datasets unassembled. In this study, the microbial community composition of a natural grassland soil was managed under several controlled selective pressures to test a "multi-evenness" strategy for sequentially reconstructing genomes from a complex metagenome. Several genomes that were poorly represented in the natural community became dominant (an enrichment reaching 10^5 in some cases) and were successfully reconstructed under various "harsh" test conditions. These genomes belong to several genera including (but not restricted to) Leifsonia, Rhodanobacter, Bacillus, Ktedonobacter, Xanthomonas, Streptomyces and Burkholderia. So far, between 10 and 78% of the generated metagenomic datasets were reconstructed, providing access to more than 88,000 genes of known or unknown function and to their genetic context. Adaptive genes directly related to the selective pressures were found, mostly on large plasmids. Functions of potential industrial interest (e.g., novel polyketide synthase modules in Streptomyces) were also discovered. Furthermore, a major phage infection snapshot (>1500X coverage for the most represented phage) was observed in the Streptomyces population (three distinct genomes reconstructed) of one particular enrichment (mercury, 0.02 g/kg) during the fourth month of incubation. This "divide and conquer" strategy could be applied to other environments, and combined with auxiliary sequencing approaches such as single-cell sequencing, to detect, connect and mine taxa and functions of interest while creating an extensive set of reference genomes from across the planet. The next limit may turn out to be our imagination in defining novel selective pressures that sequentially make dominant the 10^30 cells of the biosphere.

  9. Diel Metagenomics and Metatranscriptomics of Elkhorn Slough Hypersaline Microbial Mat

    NASA Astrophysics Data System (ADS)

    Lee, J.; Detweiler, A. M.; Everroad, R. C.; Bebout, L. E.; Weber, P. K.; Pett-Ridge, J.; Bebout, B.

    2014-12-01

    To understand the variation in gene expression associated with the daytime oxygenic phototrophic and nighttime fermentation regimes seen in hypersaline microbial mats, a contiguous mat piece was sampled at regular intervals over a 24-hour diel period. Additionally, to understand the impact of sulfate reduction on biohydrogen consumption, molybdate was added to a parallel experiment in the same run. Four metagenome and 12 metatranscriptome Illumina HiSeq lanes were completed across the day/night and control/molybdate experiments. Preliminary comparative examination of noon and midnight metatranscriptomic samples, mapped to reference genomes using bowtie2, has revealed several notable results about the dominant mat-building cyanobacterium Microcoleus chthonoplastes PCC 7420. M. chthonoplastes PCC 7420 shows expression in several pathways for nitrogen scavenging, including nitrogen fixation. Reads mapped to M. chthonoplastes PCC 7420 also show expression of two starch storage and utilization pathways, one via a starch-trehalose-maltose-glucose pathway and another via a UDP-glucose-cellulose-β-1,4-glucan-glucose pathway. The overall trend of gene expression was primarily light-driven up-regulation followed by down-regulation in the dark, while much of the remaining expression profile appears to be constitutive. Co-assembly of quality-controlled reads from the four metagenomes was performed using Ray Meta with progressively smaller k-mer sizes. Bins were identified and filtered using principal component analysis of coverages from all libraries and a %GC filter, followed by reassembly of the remaining co-assembly reads and binned reads. Although taxa had relatively similar abundance profiles in each metagenome, this binning approach resolved distinct bins not only for dominant taxa but also for sulfate-reducing bacteria, which are of particular interest for understanding molybdate inhibition. Bins generated from this iterative assembly process will be used for downstream
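The binning step above combines principal component analysis of coverages with a %GC filter. The sketch below illustrates only the GC part: it computes each contig's GC fraction and keeps contigs whose GC falls inside a window. It is an assumption-laden illustration, not the study's procedure. The abstract does not give the GC thresholds, and the function names, dictionary input and fixed [low, high] window are all hypothetical. Ambiguous bases such as N are ignored when computing the fraction.

```python
def gc_fraction(seq: str) -> float:
    """GC fraction over unambiguous bases (A, C, G, T); returns 0.0 if there are none."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / acgt


def gc_filter(contigs: dict[str, str], low: float, high: float) -> dict[str, str]:
    """Keep contigs whose GC fraction lies in the closed window [low, high]."""
    return {name: s for name, s in contigs.items() if low <= gc_fraction(s) <= high}
```

In a real binning workflow a GC window like this would typically be applied together with coverage-based separation rather than on its own, since unrelated taxa can share similar GC content.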

  10. Functional Metagenomics of the Bronchial Microbiome in COPD.

    PubMed

    Millares, Laura; Pérez-Brocal, Vicente; Ferrari, Rafaela; Gallego, Miguel; Pomares, Xavier; García-Núñez, Marian; Montón, Concepción; Capilla, Silvia; Monsó, Eduard; Moya, Andrés

    2015-01-01

    The course of chronic obstructive pulmonary disease (COPD) is frequently aggravated by exacerbations, and changes in the composition and activity of the microbiome may be implicated in their appearance. The aim of this study was to analyse the composition and the gene content of the microbial community in bronchial secretions of COPD patients in both stability and exacerbation. Taxonomic data were obtained by 16S rRNA gene amplification and pyrosequencing, and metabolic information through shotgun metagenomics, using the Metagenomics RAST server (MG-RAST), and the PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) programme, which predicts metagenomes from 16S data. Eight severe COPD patients provided good quality sputum samples, and no significant differences in the relative abundance of any phyla and genera were found between stability and exacerbation. Bacterial biodiversity (Chao1 and Shannon indexes) did not show statistical differences and beta-diversity analysis (Bray-Curtis dissimilarity index) showed a similar microbial composition in the two clinical situations. Four functional categories showed statistically significant differences with MG-RAST at KEGG level 2: in exacerbation, Cell growth and Death and Transport and Catabolism decreased in abundance [1.6 (0.2-2.3) vs 3.6 (3.3-6.9), p = 0.012; and 1.8 (0-3.3) vs 3.6 (1.8-5.1), p = 0.025 respectively], while Cancer and Carbohydrate Metabolism increased [0.8 (0-1.5) vs 0 (0-0.5), p = 0.043; and 7 (6.4-9) vs 5.9 (6.3-6.1), p = 0.012 respectively]. In conclusion, the bronchial microbiome as a whole is not significantly modified when exacerbation symptoms appear in severe COPD patients, but its functional metabolic capabilities show significant changes in several pathways.

  11. Functional Metagenomics of the Bronchial Microbiome in COPD

    PubMed Central

    Millares, Laura; Pérez-Brocal, Vicente; Ferrari, Rafaela; Gallego, Miguel; Pomares, Xavier; García-Núñez, Marian; Montón, Concepción; Capilla, Silvia

    2015-01-01

    The course of chronic obstructive pulmonary disease (COPD) is frequently aggravated by exacerbations, and changes in the composition and activity of the microbiome may be implicated in their appearance. The aim of this study was to analyse the composition and the gene content of the microbial community in bronchial secretions of COPD patients in both stability and exacerbation. Taxonomic data were obtained by 16S rRNA gene amplification and pyrosequencing, and metabolic information through shotgun metagenomics, using the Metagenomics RAST server (MG-RAST), and the PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States) programme, which predicts metagenomes from 16S data. Eight severe COPD patients provided good quality sputum samples, and no significant differences in the relative abundance of any phyla and genera were found between stability and exacerbation. Bacterial biodiversity (Chao1 and Shannon indexes) did not show statistical differences and beta-diversity analysis (Bray-Curtis dissimilarity index) showed a similar microbial composition in the two clinical situations. Four functional categories showed statistically significant differences with MG-RAST at KEGG level 2: in exacerbation, Cell growth and Death and Transport and Catabolism decreased in abundance [1.6 (0.2–2.3) vs 3.6 (3.3–6.9), p = 0.012; and 1.8 (0–3.3) vs 3.6 (1.8–5.1), p = 0.025 respectively], while Cancer and Carbohydrate Metabolism increased [0.8 (0–1.5) vs 0 (0–0.5), p = 0.043; and 7 (6.4–9) vs 5.9 (6.3–6.1), p = 0.012 respectively]. In conclusion, the bronchial microbiome as a whole is not significantly modified when exacerbation symptoms appear in severe COPD patients, but its functional metabolic capabilities show significant changes in several pathways. PMID:26632844
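The COPD study reports Chao1 and Shannon indexes for within-sample diversity and Bray-Curtis dissimilarity between samples. The sketch below implements the textbook definitions of these three measures on per-taxon read counts. It is not the MG-RAST or PICRUSt pipeline used in the study. It assumes the natural logarithm for Shannon (some tools default to base 2) and the bias-corrected form of Chao1. The function names and toy inputs are illustrative.

```python
import math


def shannon(counts: list[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts (natural log)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts: list[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1*(F1 - 1) / (2*(F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton taxa."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(x: list[int], y: list[int]) -> float:
    """Bray-Curtis dissimilarity sum|x_i - y_i| / sum(x_i + y_i):
    0 means identical profiles, 1 means no shared taxa."""
    if len(x) != len(y):
        raise ValueError("abundance vectors must cover the same taxa")
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))
```

For example, two samples with counts `[2, 4]` and `[4, 2]` over the same two taxa have a Bray-Curtis dissimilarity of 4/12, about 0.33. Both samples have the same Shannon index, which shows how α- and β-diversity capture different aspects of community composition.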

  12. Novel thermostable amine transferases from hot spring metagenomes.

    PubMed

    Ferrandi, Erica Elisa; Previdi, Alessandra; Bassanini, Ivan; Riva, Sergio; Peng, Xu; Monti, Daniela

    2017-03-29

    Hot spring metagenomes, prepared from samples collected at temperatures ranging from 55 to 95 °C, were submitted to an in silico screening aimed at the identification of novel amine transaminases (ATAs), valuable biocatalysts for the preparation of optically pure amines. Three novel (S)-selective ATAs, namely Is3-TA, It6-TA, and B3-TA, were discovered in the metagenomes of samples collected from hot springs in Iceland and in Italy, cloned from the corresponding metagenomic DNAs, and overexpressed in recombinant form in E. coli. Functional characterization of the novel ATAs demonstrated that they all possess a thermophilic character and are capable of performing amine transfer reactions using a broad range of donor and acceptor substrates, suggesting good potential for practical synthetic applications. In particular, the enzyme B3-TA proved to be exceptionally thermostable, retaining 85% of its activity after 5 days of incubation at 80 °C and more than 40% after 2 weeks under the same conditions. These results, which were in agreement with an estimated apparent melting temperature of around 88 °C, make B3-TA, to the best of our knowledge, the most thermostable natural ATA described to date. This biocatalyst also showed good tolerance toward different water-miscible and water-immiscible organic solvents. A detailed inspection of the homology-based structural model of B3-TA showed that the overall active-site architecture of mesophilic (S)-selective ATAs was mainly conserved in this hyperthermophilic homolog. Additionally, a subfamily of B3-TA-like transaminases, mostly uncharacterized and all from thermophilic microorganisms, was identified and analyzed in terms of phylogenetic relationships and sequence conservation.

  13. Remote homology and the functions of metagenomic dark matter

    PubMed Central

    Lobb, Briallen; Kurtz, Daniel A.; Moreno-Hagelsieb, Gabriel; Doxey, Andrew C.

    2015-01-01

    Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (p < 0.001). Compared to the functional profiles predicted through standard homology searches, ORFans show biologically intriguing differences. Many ORFan-enriched functions are virus-related and tend to reflect biological processes associated with extreme sequence diversity. Each environment also possesses a large number of unique ORFan families and functions, including some known to play important community roles, such as gut microbial polysaccharide digestion. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate through the identification of hundreds of novel ORFan metalloproteases that all possess a signature catalytic motif despite a general lack of similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space. All remote homology predictions are available at http

  14. Connecting the protein structure universe by using sparse recurring fragments.

    PubMed

    Friedberg, Iddo; Godzik, Adam

    2005-08-01

    The quest to order and classify protein structures has led to various classification schemes, focusing mostly on hierarchical relationships between structural domains. At the coarsest classification level, such schemes typically identify hundreds of types of fundamental units called folds. As a result, we picture protein structure space as a collection of isolated fold islands. It is obvious, however, that many protein folds share structural and functional commonalities. Locating those commonalities is important for our understanding of protein structure, function, and evolution. Here, we present an alternative view of the protein fold space, based on an interfold similarity measure that is related to the frequency of fragments shared between folds. In this view, protein structures form a complicated, cross-connected network with a very interesting topology. We show that interfold similarity based on sequence/structure fragments correlates well with the similarity of functions between protein populations in different folds.

  15. Phylogeny, culturing, and metagenomics of the human gut microbiota.

    PubMed

    Walker, Alan W; Duncan, Sylvia H; Louis, Petra; Flint, Harry J

    2014-05-01

    The human intestinal tract is colonised by a complex community of microbes, which can have major impacts on host health. Recent research on the gut microbiota has largely been driven by the advent of modern sequence-based techniques, such as metagenomics. Although these are powerful and valuable tools, they have limitations. Traditional culturing and phylogeny can mitigate some of these limitations, either by expanding reference databases or by assigning functionality to specific microbial lineages. As such, culture and phylogeny will continue to have crucially important roles in human microbiota research, and will be required for the development of novel therapeutics.

  16. Accessing the Hidden Majority of Marine Natural Products Through Metagenomics

    PubMed Central

    Donia, Mohamed S.; Ruffner, Duane E.; Cao, Sheng

    2012-01-01

    Tiny marine animals represent an untapped reservoir for undiscovered, bioactive natural products. However, their small size and extreme chemical variability preclude traditional chemical approaches to discovering new bioactive compounds. Here, we use a metagenomic method to directly discover and rapidly access cyanobactin class natural products from these variable samples, providing proof-of-concept for genome based discovery and supply of marine natural products. We also address practical optimization of complex, multistep ribosomal peptide pathways in heterologous hosts, which is still very challenging. The resulting methods and concepts will be applicable to ribosomal peptide and other biosynthetic pathways. PMID:21542088

  17. Binary stars - Formation by fragmentation

    NASA Technical Reports Server (NTRS)

    Boss, Alan P.

    1988-01-01

    Theories of binary star formation by capture, separate nuclei, fission, and fragmentation are compared, assessing the success of theoretical attempts to explain the observed properties of main-sequence binary stars. The theory of formation by fragmentation is examined, and the prospects for checking the theory against observations of binary pre-main-sequence stars are discussed. It is concluded that formation by fragmentation is successful at explaining many of the key properties of main-sequence binary stars.

  18. Impact of metagenomic DNA extraction procedures on the identifiable endophytic bacterial diversity in Sorghum bicolor (L. Moench).

    PubMed

    Maropola, Mapula Kgomotso Annah; Ramond, Jean-Baptiste; Trindade, Marla

    2015-05-01

    Culture-independent studies rely on the quantity and quality of the extracted environmental metagenomic DNA (mDNA). To fully access the plant tissue microbiome, the extracted plant mDNA should allow optimal PCR applications, and its genetic content must be representative of the total microbial diversity. In this study, we evaluated the endophytic bacterial diversity retrieved using different mDNA extraction procedures. Metagenomic DNA from sorghum (Sorghum bicolor L. Moench) stem and root tissues was extracted using two classical DNA extraction protocols (CTAB- and SDS-based) and five commercial kits. The mDNA yields and quality, as well as the reproducibility, were compared. 16S rRNA gene terminal restriction fragment length polymorphism (t-RFLP) was used to assess the impact on the observed endophytic bacterial community structures. Generally, the classical protocols obtained high mDNA yields from sorghum tissues; however, they were less reproducible than the commercial kits. Commercial kits retrieved higher-quality mDNA, but with lower endophytic bacterial diversities compared to classical protocols. The SDS-based protocol enabled access to the highest sorghum endophytic diversities. Therefore, "SDS-extracted" sorghum root and stem microbiome diversities were analysed via 454 pyrosequencing, which revealed that the two tissues harbour significantly different endophytic communities. Nevertheless, both communities are dominated by agriculturally important genera such as Microbacterium, Agrobacterium, Sphingobacterium, Herbaspirillum, Erwinia, Pseudomonas and Stenotrophomonas, which have previously been shown to play a role in plant growth promotion. This study shows that DNA extraction protocols introduce biases in culture-independent studies of environmental microbial communities by influencing the mDNA quality, which impacts the microbial diversity analyses and evaluation. Using the broad-spectrum SDS-based DNA extraction protocol allows the recovery of the most

  19. Cloning and functional characterization of endo-β-1,4-glucanase gene from metagenomic library of vermicompost.

    PubMed

    Yasir, Muhammad; Khan, Haji; Azam, Syed Sikander; Telke, Amar; Kim, Seon Won; Chung, Young Ryun

    2013-06-01

    In the vermicomposting of paper mill sludge, the activity of earthworms is very dependent on dietary polysaccharides, including cellulose, as energy sources. Most of these polymers are degraded by the host microbiota, which is considered a potentially important source of cellulolytic enzymes. In the present study, a metagenomic library was constructed from vermicompost (VC) prepared with paper mill sludge and dairy sludge (fresh sludge, FS) and functionally screened for cellulolytic activities. Eighteen cellulase-expressing clones were isolated from fosmid libraries of about 89,000 clones. A short-fragment library was constructed from the most active positive clone (cMGL504), and one open reading frame (ORF) of 1,092 bp encoding an endo-β-1,4-glucanase was identified, which showed 88% similarity with the Cellvibrio mixtus cellulase A gene. The endo-β-1,4-glucanase cmgl504 gene was overexpressed in Escherichia coli. The purified recombinant cmgl504 cellulase displayed activity over a broad range of temperature (25-55°C) and pH (5.5-8.5). The enzyme degraded carboxymethyl cellulose (CMC) with an activity of 15.4 U, while having low activity against avicel. No detectable activity was found for xylan and laminarin. The enzyme activity was stimulated by potassium chloride. The deduced protein and three-dimensional structure of the metagenome-derived cellulase cmgl504 possessed all the features, including general architecture, signature motifs, and an N-terminal signal peptide followed by the catalytic domain, of cellulases belonging to glycosyl hydrolase family 5 (GHF5). The cellulases cloned in this work may play important roles in the degradation of cellulose in the vermicomposting process and could be exploited for industrial applications in the future.

  20. THE WESTERN LAKE SUPERIOR COMPARATIVE WATERSHED FRAMEWORK: A FIELD TEST OF GEOGRAPHICALLY-DEPENDENT VS. THRESHOLD-BASED GEOGRAPHICALLY-INDEPENDENT CLASSIFICATION

    EPA Science Inventory

    Stratified random selection of watersheds allowed us to compare geographically-independent classification schemes based on watershed storage (wetland + lake area/watershed area) and forest fragmentation with a geographically-based classification scheme within the Northern Lakes a...

  1. Remote Sensing Information Classification

    NASA Technical Reports Server (NTRS)

    Rickman, Douglas L.

    2008-01-01

    This viewgraph presentation reviews the classification of remote sensing data in relation to epidemiology. Classification is a way to reduce the dimensionality and precision of data to something a human can understand: it changes SCALAR data into NOMINAL data.

  2. Classification and knowledge

    NASA Technical Reports Server (NTRS)

    Kurtz, Michael J.

    1989-01-01

    Automated procedures to classify objects are discussed. The classification problem is reviewed, and the relation of epistemology and classification is considered. The classification of stellar spectra and of resolved images of galaxies is addressed.

  3. Fragment-based prediction of skin sensitization using recursive partitioning.

    PubMed

    Lu, Jing; Zheng, Mingyue; Wang, Yong; Shen, Qiancheng; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2011-09-01

    Skin sensitization is an important toxic endpoint in the risk assessment of chemicals. In this paper, structure-activity relationship analysis was performed on the skin sensitization potential of 357 compounds with local lymph node assay data. Structural fragments were extracted by GASTON (GrAph/Sequence/Tree extractiON) from the training set. Eight fragments with accuracy significantly higher than 0.73 (p < 0.1) were retained to make up an indicator descriptor fragment. The fragment descriptor and eight other physicochemical descriptors closely related to the endpoint were calculated to construct the recursive partitioning tree (RP tree) for classification. The balanced accuracies of the training set, test set I, and test set II in the leave-one-out model were 0.846, 0.800, and 0.809, respectively. The results highlight that the fragment-based RP tree is a preferable method for identifying skin sensitizers. Moreover, the selected fragments provide useful structural information for exploring sensitization mechanisms, and the RP tree creates a graphic tree to identify the most important properties associated with skin sensitization. They can provide some guidance for designing drugs with lower sensitization levels.

  4. Fragment-based prediction of skin sensitization using recursive partitioning

    NASA Astrophysics Data System (ADS)

    Lu, Jing; Zheng, Mingyue; Wang, Yong; Shen, Qiancheng; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2011-09-01

    Skin sensitization is an important toxic endpoint in the risk assessment of chemicals. In this paper, structure-activity relationship analysis was performed on the skin sensitization potential of 357 compounds with local lymph node assay data. Structural fragments were extracted by GASTON (GrAph/Sequence/Tree extractiON) from the training set. Eight fragments with accuracy significantly higher than 0.73 (p < 0.1) were retained to make up an indicator descriptor fragment. The fragment descriptor and eight other physicochemical descriptors closely related to the endpoint were calculated to construct the recursive partitioning tree (RP tree) for classification. The balanced accuracies of the training set, test set I, and test set II in the leave-one-out model were 0.846, 0.800, and 0.809, respectively. The results highlight that the fragment-based RP tree is a preferable method for identifying skin sensitizers. Moreover, the selected fragments provide useful structural information for exploring sensitization mechanisms, and the RP tree creates a graphic tree to identify the most important properties associated with skin sensitization. They can provide some guidance for designing drugs with lower sensitization levels.

  5. Species–fragmented area relationship

    PubMed Central

    Hanski, Ilkka; Zurita, Gustavo A.; Bellocq, M. Isabel; Rybicki, Joel

    2013-01-01

    The species–area relationship (SAR) gives a quantitative description of the increasing number of species in a community with increasing area of habitat. In conservation, SARs have been used to predict the number of extinctions when the area of habitat is reduced. Such predictions are most needed for landscapes rather than for individual habitat fragments, but SAR-based predictions of extinctions for landscapes with highly fragmented habitat are likely to be biased because the SAR assumes contiguous habitat. In reality, habitat loss is typically accompanied by habitat fragmentation. To quantify the effect of fragmentation in addition to the effect of habitat loss on the number of species, we extend the power-law SAR to the species–fragmented area relationship. This model unites single-species metapopulation theory with the multispecies SAR for communities. We demonstrate with a realistic simulation model and with empirical data for forest-inhabiting subtropical birds that the species–fragmented area relationship predicts the number of species in fragmented landscapes far better than the SAR does. The results demonstrate that for communities of species that are not well adapted to live in fragmented landscapes, the conventional SAR underestimates the number of extinctions for landscapes in which little habitat remains and that habitat is highly fragmented. PMID:23858440

  6. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    PubMed Central

    Howe, Adina; Chain, Patrick S. G.

    2015-01-01

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Compute Cloud and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow. PMID:26217314

  7. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    SciTech Connect

    Howe, Adina; Chain, Patrick S. G.

    2015-07-09

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Compute Cloud and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.

  8. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    DOE PAGES

    Howe, Adina; Chain, Patrick S. G.

    2015-07-09

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Compute Cloud and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.

  9. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

    PubMed

    Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

    2012-12-07

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads from various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using the coverage depth resulting from the alignment of the reads to the catalogue of reference sequences, which is built into the pipeline and contains prevalent microbial genomes and genes of the human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module, which contains methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org), allowing their comparative analysis together with user samples. MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL, and Python, with all major browsers supported.

  10. FY08 LDRD Final Report Probabilistic Inference of Metabolic Pathways from Metagenomic Sequence Data

    SciTech Connect

    D'haeseleer, P

    2009-03-01

    Metagenomic 'shotgun' sequencing of environmental microbial communities has the potential to revolutionize microbial ecology, allowing a cultivation-independent, yet sequence-based analysis of the metabolic capabilities and functions present in an environmental sample. Although its intensive sequencing requirements are a good match for the continuously increasing bandwidth at sequencing centers, the complexity, seemingly inexhaustible novelty, and 'scrambled' nature of metagenomic data is also proving a tremendous challenge for analysis. In fact, many metagenomics projects do not go much further than providing a list of novel gene variants and over- or under-represented functional gene categories. In this project, we proposed to develop a set of novel metagenomic sequence analysis tools, including a binning method to group sequences by species, inference of phenotypes and metabolic pathways from these reconstructed species, and extraction of coarse-grained flux models. We proposed to closely collaborate with the DOE Joint Genome Institute to align these tools with their metagenomics analysis needs and the developing IMG/M metagenomics pipeline. Results would be cross-validated with simulated metagenomic data using a testing platform developed at the JGI.

  11. Effect of the strain Bacillus amyloliquefaciens FZB42 on the microbial community in the rhizosphere of lettuce under field conditions analyzed by whole metagenome sequencing

    PubMed Central

    Kröber, Magdalena; Wibberg, Daniel; Grosch, Rita; Eikmeyer, Felix; Verwaaijen, Bart; Chowdhury, Soumitra P.; Hartmann, Anton; Pühler, Alfred; Schlüter, Andreas

    2014-01-01

    Application of the plant-associated bacterium Bacillus amyloliquefaciens FZB42 on lettuce (Lactuca sativa) confirmed its capability to promote plant growth and health by reducing disease severity (DS) caused by the phytopathogenic fungus Rhizoctonia solani. This strain is therefore commercially applied as an eco-friendly plant protective agent. It is able to produce cyclic lipopeptides (CLP) and polyketides featuring antifungal and antibacterial properties. Production of these secondary metabolites raised the question of a possible impact of strain FZB42 on the composition of microbial rhizosphere communities after its application. Rating of DS and lettuce growth during a field trial confirmed the positive impact of strain FZB42 on the health of the host plant. To verify B. amyloliquefaciens as an environmentally compatible plant protective agent, its effect on the indigenous rhizosphere community was analyzed by metagenome sequencing. Rhizosphere microbial communities of lettuce treated with B. amyloliquefaciens FZB42 and of non-treated plants were profiled by high-throughput metagenome sequencing of whole-community DNA. Fragment recruitments of metagenome sequence reads on the genome sequence of B. amyloliquefaciens FZB42 proved the presence of the strain in the rhizosphere over 5 weeks of the field trial. Comparison of taxonomic community profiles revealed only marginal changes after application of strain FZB42. The orders Burkholderiales, Actinomycetales and Rhizobiales were most abundant in all samples. A general, plant-age-dependent shift in the composition of the microbial communities was observed that was independent of the application of strain FZB42. In addition to the taxonomic profiling, functional analysis of annotated sequences revealed no major differences between samples with regard to application of the inoculant strain. PMID:24904564

  12. Expression and characterization of a novel metagenome-derived cellulase Exo2b and its application to improve cellulase activity in Trichoderma reesei.

    PubMed

    Geng, Alei; Zou, Gen; Yan, Xing; Wang, Qianfu; Zhang, Jun; Liu, Fanghua; Zhu, Baoli; Zhou, Zhihua

    2012-11-01

    A metagenomic fosmid library containing 1 × 10⁵ clones was constructed from a biogas digester fed with pig ordure and rice straw. In total, 121 clones with 4-methylumbelliferyl-cellobiosidase activity were screened from the metagenomic library. A novel GH5 cellulase gene, exo2b, was identified from a sequenced clone, EXO02C10, and expressed in Escherichia coli BL21. The corresponding recombinant Exo2b protein showed high specific activity toward both carboxymethylcellulose (CMC; 260 U/mg protein) and β-D-glucan from barley (849 U/mg), with an optimal pH and temperature of 7.5 and 58 °C, respectively. Exo2b showed stable activity over a wide pH range from 5.5 to 9.0 and was highly thermostable at 60 °C in the presence of 60 mM cysteine. Residual activity was maintained at nearly 100% when Exo2b was incubated at 60 °C for 15 h. A thin-layer chromatography analysis of the hydrolysis products confirmed that Exo2b was an endo-β-1,4-glucanase and that it could also produce oligosaccharides smaller than cellotetraose. The fragment encoding the Exo2b catalytic domain was then fused with the cbh1 gene from Trichoderma reesei, and the fused gene was successfully expressed in T. reesei Rut-C30. Compared to those of the parent strain, the filter paper activity and CMCase activity of the secreted proteins of a selected transformant, A1, increased by 24% and 18%, respectively. In addition, the glucose concentration from the hydrolysis of pretreated corn stover by the A1 secreted proteins increased by 19.8%. The present study demonstrated the potential application of metagenome-originated cellulase genes to modify cellulase-producing fungi.

  13. Comparative metagenome of a stream impacted by the urbanization phenomenon.

    PubMed

    Medeiros, Julliane Dutra; Cantão, Maurício Egídio; Cesar, Dionéia Evangelista; Nicolás, Marisa Fabiana; Diniz, Cláudio Galuppo; Silva, Vânia Lúcia; Vasconcelos, Ana Tereza Ribeiro de; Coelho, Cíntia Marques

    Rivers and streams are important reservoirs of freshwater for human consumption. These ecosystems are threatened by increasing urbanization, because raw sewage discharged into them alters their nutrient content and may affect the composition of their microbial community. In the present study, we investigate the taxonomic and functional profile of the microbial community in an urban lotic environment. Samples of running water were collected at two points in the São Pedro stream: an upstream, preserved and non-urbanized area, and a polluted, urbanized area with discharged sewage. The metagenomic DNA was sequenced by pyrosequencing. Differences were observed in the community composition at the two sites. In the non-urbanized area, genera of ubiquitous microbes that act in the maintenance of environments were overrepresented. In contrast, the urbanized metagenome was rich in genera pathogenic to humans. The functional profile indicated that the microbes act on the metabolism of methane, nitrogen and sulfur, especially in the urbanized area. It was also found that virulence/defense (antibiotic resistance and metal resistance) and stress response-related genes were disseminated in the urbanized environment. The structure of the microbial community was altered by uncontrolled anthropic interference, highlighting the selective pressure imposed by high loads of urban sewage discharged into freshwater environments.

  14. Detection of Novel Integrons in the Metagenome of Human Saliva

    PubMed Central

    Antepowicz, Agata; Mullany, Peter; Roberts, Adam P.

    2016-01-01

    Integrons are genetic elements capable of capturing and expressing open reading frames (ORFs) embedded within gene cassettes. They are involved in the dissemination of antibiotic resistance genes (ARGs) in clinically important pathogens. Although ARGs are common in the oral cavity, the association of integrons and antibiotic resistance has not been reported there. In this work, a PCR-based approach was used to investigate the presence of integrons and associated gene cassettes in human oral metagenomic DNA obtained from both the UK and Bangladesh. We identified a diverse array of gene cassettes containing ORFs predicted to confer antimicrobial resistance and other adaptive traits. The predicted proteins include a putative streptogramin A O-acetyltransferase, a bleomycin-binding protein, a cof-like hydrolase, and competence- and motility-related proteins. This is the first study detecting integron gene cassettes directly from oral metagenomic DNA samples. The predicted proteins are likely to carry out a multitude of functions; however, the function of the majority remains unknown. PMID:27304457

  15. Metagenomics reveals flavour metabolic network of cereal vinegar microbiota.

    PubMed

    Wu, Lin-Huan; Lu, Zhen-Ming; Zhang, Xiao-Juan; Wang, Zong-Min; Yu, Yong-Jian; Shi, Jin-Song; Xu, Zheng-Hong

    2017-04-01

    A multispecies microbial community, formed through centuries of repeated batch acetic acid fermentation (AAF), is crucial for the flavour quality of traditional vinegar produced from cereals. However, how this community metabolically generates the essential flavours is poorly understood. Here we used a metagenomic approach to clarify the in situ metabolic network of the key microbes responsible for flavour synthesis in a typical cereal vinegar, Zhenjiang aromatic vinegar, produced by solid-state fermentation. First, we identified 3 organic acids, 7 amino acids, and 20 volatiles as dominant vinegar metabolites. Second, we revealed the taxonomic and functional composition of the microbiota by metagenomic shotgun sequencing. A total of 86 201 predicted protein-coding genes from 35 phyla (951 genera) were involved in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of Metabolism (42.3%), Genetic Information Processing (28.3%), and Environmental Information Processing (10.1%). Furthermore, a metabolic network for substrate breakdown and dominant flavour formation in the vinegar microbiota was constructed, and differences in microbial distribution across metabolic pathways were charted. This study helps elucidate the different metabolic roles of microbes during flavour formation in vinegar microbiota.

  16. Genomics and Metagenomics of Extreme Acidophiles in Biomining Environments

    NASA Astrophysics Data System (ADS)

    Holmes, D. S.

    2015-12-01

    Over 160 draft or complete genomes of extreme acidophiles (pH < 3) have been published, many of which are from bioleaching and other biomining environments, or are closely related to such microorganisms. In addition, there are over 20 metagenomic studies of such environments. This provides a rich source of latent data that can be exploited for understanding the biology of biomining environments and for advancing biotechnological applications. Genomic and metagenomic data are already yielding valuable insights into cellular processes, including carbon and nitrogen management, heavy metal and acid resistance, iron and sulfur oxido-reduction, linking biogeochemical processes to organismal physiology. The data also allow the construction of useful models of the ecophysiology of biomining environments and provide insight into the gene and genome evolution of extreme acidophiles. Additionally, since most of these acidophiles are also chemoautolithotrophs that use minerals as energy sources or electron sinks, their genomes can be plundered for clues about the evolution of cellular metabolism and bioenergetic pathways during the Archaean abiotic/biotic transition on early Earth. Acknowledgements: Fondecyt 1130683.

  17. Tracking Strains in the Microbiome: Insights from Metagenomics and Models

    PubMed Central

    Brito, Ilana L.; Alm, Eric J.

    2016-01-01

    Transmission usually refers to the movement of pathogenic organisms. Yet, commensal microbes that inhabit the human body also move between individuals and environments. Surprisingly little is known about the transmission of these endogenous microbes, despite growing recognition of their importance for human health. The health impacts arising from the transmission of commensal bacteria range widely, from the prevention of autoimmune disorders to the spread of antibiotic resistance genes. Despite this importance, there are outstanding basic questions: what is the fraction of the microbiome that is transmissible? What are the primary mechanisms of transmission? Which organisms are the most highly transmissible? Higher-resolution genomic data are required to accurately link microbial sources (such as environmental reservoirs or other individuals) with sinks (such as a single person's microbiome). New computational advances enable strain-level resolution of organisms from shotgun metagenomic data, allowing the transmission of strains to be followed over time and after discrete exposure events. Here, we highlight the latest techniques that reveal strain-level resolution from raw metagenomic reads and new studies that are tracking strains across people and environments. We also propose how models of pathogenic transmission may be applied to study the movement of commensals between microbial communities. PMID:27242733

  18. Metagenomic insights into important microbes from the Dead Zone

    NASA Astrophysics Data System (ADS)

    Thrash, C.; Baker, B.; Seitz, K.; Temperton, B.; Gillies, L.; Rabalais, N. N.; Mason, O. U.

    2015-12-01

    Coastal regions of eutrophication-driven oxygen depletion are widespread and increasing in number. Also known as dead zones, these regions take their name from the deleterious effects of hypoxia (dissolved oxygen less than 2 mg/L) on shrimp, demersal fish, and other animal life. Dead zones result from nutrient-enhanced primary production, concomitant oxygen consumption by chemoorganotrophic aerobic microorganisms, and strong stratification that prevents ventilation of bottom water. One of the largest dead zones in the world occurs seasonally in the northern Gulf of Mexico (nGOM), where hypoxic waters can cover up to 22,000 square kilometers. Although this dead zone shares many features with better-known marine oxygen minimum zones, it remains understudied with regard to the microbial assemblages involved in biogeochemical cycling. We performed metagenomic and metatranscriptomic sequencing on six samples from the 2013 nGOM dead zone, from both hypoxic and oxic bottom waters. Assembly and binning led to the recovery of over fifty partial to nearly complete metagenome-assembled genomes from key microbial taxa previously determined to be numerically abundant from 16S rRNA data, such as Thaumarchaeota, Marine Group II Euryarchaeota, SAR406, SAR324, Synechococcus spp., and Planctomycetes. These results provide information about the roles of these taxa in the nGOM dead zone, and opportunities for comparing this region of low oxygen to others around the globe.

  19. Metagenomic Analysis of Human Diarrhea: Viral Detection and Discovery

    PubMed Central

    Tarr, Phillip I.; Klein, Eileen J.; Kirkwood, Carl D.; Wang, David

    2008-01-01

    Worldwide, approximately 1.8 million children die from diarrhea annually, and millions more suffer multiple episodes of nonfatal diarrhea. In up to 40% of cases, no etiologic agent can be identified. The advent of metagenomic sequencing has enabled systematic and unbiased characterization of microbial populations; thus, metagenomic approaches have the potential to define the spectrum of viruses, including novel viruses, present in stool during episodes of acute diarrhea. The detection of novel or unexpected viruses would then enable investigations to assess whether these agents play a causal role in human diarrhea. In this study, we characterized the eukaryotic viral communities present in diarrhea specimens from 12 children by employing a strategy of “micro-mass sequencing” that entails minimal starting sample quantity (<100 mg stool), minimal sample purification, and limited sequencing (384 reads per sample). Using this methodology, we detected known enteric viruses as well as multiple sequences from putatively novel viruses with only limited sequence similarity to viruses in GenBank. PMID:18398449

  20. Tracking Strains in the Microbiome: Insights from Metagenomics and Models.

    PubMed

    Brito, Ilana L; Alm, Eric J

    2016-01-01

    Transmission usually refers to the movement of pathogenic organisms. Yet, commensal microbes that inhabit the human body also move between individuals and environments. Surprisingly little is known about the transmission of these endogenous microbes, despite growing recognition of their importance for human health. The health impacts arising from the transmission of commensal bacteria range widely, from the prevention of autoimmune disorders to the spread of antibiotic resistance genes. Despite this importance, there are outstanding basic questions: what is the fraction of the microbiome that is transmissible? What are the primary mechanisms of transmission? Which organisms are the most highly transmissible? Higher-resolution genomic data are required to accurately link microbial sources (such as environmental reservoirs or other individuals) with sinks (such as a single person's microbiome). New computational advances enable strain-level resolution of organisms from shotgun metagenomic data, allowing the transmission of strains to be followed over time and after discrete exposure events. Here, we highlight the latest techniques that reveal strain-level resolution from raw metagenomic reads and new studies that are tracking strains across people and environments. We also propose how models of pathogenic transmission may be applied to study the movement of commensals between microbial communities.

  1. Metagenomic characterization of viral communities in Goseong Bay, Korea

    NASA Astrophysics Data System (ADS)

    Hwang, Jinik; Park, So Yun; Park, Mirye; Lee, Sukchan; Jo, Yeonhwa; Cho, Won Kyong; Lee, Taek-Kyun

    2016-12-01

    In this study, seawater samples were collected from Goseong Bay, Korea in March 2014, and viral populations were examined by metagenomic assembly. Enrichment of marine viral particles using FeCl3 followed by next-generation sequencing produced numerous sequences. De novo assembly and BLAST searches showed that most of the obtained contigs were unknown sequences and only 0.74% of sequences were associated with known viruses. In total, 138 viruses were identified, comprising bacteriophages (87%) and viruses infecting algae and other hosts (13%). The 138 identified viruses were divided into 11 orders, 14 families, 34 genera, and 133 species. The dominant viruses were Pelagibacter phage HTVC010P and Roseobacter phage SIO1. Viruses infecting algae, including Ostreococcus species, accounted for 9.4% of the total identified viruses. In addition, we identified pathogenic herpesviruses infecting fish and giant viruses infecting parasitic Acanthamoeba species. This study comprehensively characterizes the viral populations of Goseong Bay using metagenomics. The information on the marine viral community in Goseong Bay, Korea will be useful for comparative analysis of other marine viral communities.

  2. (Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans

    PubMed Central

    Sharma, Anukriti; Gilbert, Jack A.; Lal, Rup

    2016-01-01

    Despite having serious clinical manifestations, Cellulosimicrobium cellulans remains under-reported, with only three genome sequences available at the time of writing. Genome sequences of C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM were used to determine the distribution of pathogenicity islands (PAIs) across C. cellulans, which revealed 49 potential marker genes with known association to human infections, e.g., the Fic and VbhA toxin-antitoxin system. Oligonucleotide composition-based analysis of orthologous proteins (n = 791) across the three genomes revealed a significant negative correlation (P < 0.05) between the frequency of optimal codons (Fopt) and gene G+C content, highlighting the G+C-biased gene conversion (gBGC) effect across Cellulosimicrobium strains. Bayesian molecular-clock analysis performed on three virulent PAI proteins (Fic; D-alanyl-D-alanine-carboxypeptidase; transposase) dated the divergence event to 300 million years ago from the most recent common ancestor. Synteny-based annotation of hypothetical proteins highlighted gene transfers from non-pathogenic bacteria as a key factor in the evolution of PAIs. Additionally, deciphering the metagenomic islands using strain MM’s genome with environmental data from the site of isolation (hot-spring biofilm) revealed (an)aerobic respiration as a population segregation factor across the in situ cohorts. Using reference genomes and metagenomic data, our results highlight the emergence and evolution of PAIs in the genus Cellulosimicrobium. PMID:27151933

  3. Genomic and metagenomic technologies to explore the antibiotic resistance mobilome.

    PubMed

    Martínez, José L; Coque, Teresa M; Lanza, Val F; de la Cruz, Fernando; Baquero, Fernando

    2017-01-01

    Antibiotic resistance is a major problem for human health that requires global approaches to establish a deep understanding of the processes of acquisition, stabilization, and spread of resistance among human bacterial pathogens. Since natural (nonclinical) ecosystems are reservoirs of resistance genes, a health-integrated study of the epidemiology of antibiotic resistance requires the exploration of such ecosystems with the aim of determining the role they may play in the selection, evolution, and spread of antibiotic resistance genes, involving the so-called resistance mobilome. High-throughput sequencing techniques offer an unprecedented opportunity to describe the genetic composition of a given microbiome without the need to culture the organisms it contains. However, bioinformatic methods for analyzing this bulk of data, mainly with respect to binning each resistance gene with the organism hosting it, are still in their infancy. Here, we discuss how current genomic methodologies can serve to analyze the resistance mobilome and its linkage with different bacterial genomes and metagenomes. In addition, we describe the drawbacks of current methodologies for analyzing the resistance mobilome, mainly in cases of complex microbiotas, and discuss the possibility of implementing novel tools to improve our current metagenomic toolbox.

  4. Functional metagenomic profiling of intestinal microbiome in extreme ageing.

    PubMed

    Rampelli, Simone; Candela, Marco; Turroni, Silvia; Biagi, Elena; Collino, Sebastiano; Franceschi, Claudio; O'Toole, Paul W; Brigidi, Patrizia

    2013-12-01

    Age-related alterations in human gut microbiota composition have been thoroughly described, but a detailed functional description of the intestinal bacterial coding capacity is still missing. In order to elucidate the contribution of the gut metagenome to the complex mosaic of human longevity, we applied shotgun sequencing to total fecal bacterial DNA in a selection of samples belonging to a well-characterized human ageing cohort. The age-related trajectory of the human gut microbiome was characterized by loss of genes for short-chain fatty acid production and an overall decrease in the saccharolytic potential, while proteolytic functions were more abundant than in the intestinal metagenome of younger adults. This altered functional profile was associated with a marked enrichment in "pathobionts", i.e. opportunistic pro-inflammatory bacteria generally present in the adult gut ecosystem in low numbers. Finally, as a signature for long life, we identified 116 microbial genes that significantly correlated with ageing. Collectively, our data emphasize the relationship between intestinal bacteria and human metabolism, by detailing the modifications in the gut microbiota as a consequence and/or promoter of the physiological changes occurring in the human host upon ageing.

  5. (Meta)genomic insights into the pathogenome of Cellulosimicrobium cellulans

    DOE PAGES

    Sharma, Anukriti; Gilbert, Jack A.; Lal, Rup

    2016-05-06

    Despite having serious clinical manifestations, Cellulosimicrobium cellulans remains under-reported, with only three genome sequences available at the time of writing. Genome sequences of C. cellulans LMG16121, C. cellulans J36 and Cellulosimicrobium sp. strain MM were used to determine the distribution of pathogenicity islands (PAIs) across C. cellulans, which revealed 49 potential marker genes with known association to human infections, e.g., the Fic and VbhA toxin-antitoxin system. Oligonucleotide composition-based analysis of orthologous proteins (n = 791) across the three genomes revealed a significant negative correlation (P < 0.05) between the frequency of optimal codons (Fopt) and gene G+C content, highlighting the G+C-biased gene conversion (gBGC) effect across Cellulosimicrobium strains. Bayesian molecular-clock analysis performed on three virulent PAI proteins (Fic; D-alanyl-D-alanine-carboxypeptidase; transposase) dated the divergence event to 300 million years ago from the most recent common ancestor. Synteny-based annotation of hypothetical proteins highlighted gene transfers from non-pathogenic bacteria as a key factor in the evolution of PAIs. Additionally, deciphering the metagenomic islands using strain MM's genome with environmental data from the site of isolation (hot-spring biofilm) revealed (an)aerobic respiration as a population segregation factor across the in situ cohorts. Furthermore, using reference genomes and metagenomic data, our results highlight the emergence and evolution of PAIs in the genus Cellulosimicrobium.

  6. Evidence of methanesulfonate utilizers in the Sargasso Sea metagenome.

    PubMed

    Leitão, Elsa; Moradas-Ferreira, Pedro; De Marco, Paolo

    2009-09-01

    Methanesulfonate (MSA) is one of the products of the photo-oxidation of dimethylsulfide in the atmosphere. The genes responsible for the import of MSA into the cell (msmEFGH) and for its oxidation to formaldehyde (msmABCD) have been previously sequenced from the soil bacterium Methylosulfonomonas methylovora str. M2, while genes for an MSA monooxygenase have been sequenced from the marine bacterium Marinosulfonomonas methylotropha str. TR3. We performed a sequence-based screening of the Sargasso Sea metagenome for homologues of the MSA monooxygenase (MSAMO) and MSA import genes. Our search retrieved one scaffold bearing genes with high identity to the msmABCD cluster plus two scaffolds bearing genes highly identical to the msmEFGH operon. We increased the available data by sequencing two metagenome plasmids, which revealed more msm genes. In these three cases, synteny with the original msm operons was revealed. We also retrieved several singletons showing high identity to shorter segments of the msm clusters or to individual msm genes. Furthermore, a characteristic 26-aa internal spacer of the MsmA Rieske-type motif was conserved. Our findings support a significant role for MSA degraders in the marine sulfur cycle and suggest that they may be prominent members of the methylotrophic community in surface ocean waters.

  7. Biogeography and individuality shape function in the human skin metagenome

    PubMed Central

    Oh, Julia; Byrd, Allyson L.; Deming, Clay; Conlan, Sean; Kong, Heidi H.; Segre, Julia A.

    2014-01-01

    The varied topography of human skin offers a unique opportunity to study how the body’s microenvironments influence the functional and taxonomic composition of microbial communities. Phylogenetic marker gene-based studies have identified many bacteria and fungi that colonize distinct skin niches. Here, metagenomic analyses of diverse body sites in healthy humans demonstrate that local biogeography and strong individuality define the skin microbiome. We developed a relational analysis of bacterial, fungal, and viral communities, which showed not only site-specificity but also individual signatures. We further identified strain-level variation of dominant species as heterogeneous and multiphyletic. Reference-free analyses captured the uncharacterized metagenome through the development of a multi-kingdom gene catalog, which was used to uncover genetic signatures of species lacking reference genomes. This work is foundational for human disease studies investigating inter-kingdom interactions, metabolic changes, and strain tracking and defines the dual influence of biogeography and individuality on microbial composition and function. PMID:25279917

  8. Thermodynamical string fragmentation

    NASA Astrophysics Data System (ADS)

    Fischer, Nadine; Sjöstrand, Torbjörn

    2017-01-01

    The observation of heavy-ion-like behaviour in pp collisions at the LHC suggests that more physics mechanisms are at play than traditionally assumed. The introduction of, e.g., quark-gluon plasma or colour rope formation can describe several of the observations, but as yet there is no established paradigm. In this article we study a few possible modifications to the Pythia event generator, which describes a wealth of data but fails for a number of recent observations. Firstly, we present a new model for generating the transverse momentum of hadrons during the string fragmentation process, inspired by thermodynamics, where heavier hadrons are naturally suppressed in rate but obtain a higher average transverse momentum. Secondly, close-packing of strings is taken into account by making the temperature or string tension environment-dependent. Thirdly, a simple model for hadron rescattering is added. The effect of these modifications is studied, individually and taken together, and compared with data mainly from the LHC. While some improvements can be noted, it turns out to be nontrivial to obtain effects as big as required, and further work is called for.

  9. A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data.

    PubMed

    Roumpeka, Despoina D; Wallace, R John; Escalettes, Frank; Fotheringham, Ian; Watson, Mick

    2017-01-01

    The microbiome can be defined as the community of microorganisms that live in a particular environment. Metagenomics is the practice of sequencing DNA from the genomes of all organisms present in a particular sample, and has become a common method for the study of microbiome population structure and function. Increasingly, researchers are finding novel genes encoded within metagenomes, many of which may be of interest to the biotechnology and pharmaceutical industries. However, such "bioprospecting" requires a suite of sophisticated bioinformatics tools to make sense of the data. This review summarizes the most commonly used bioinformatics tools for the assembly and annotation of metagenomic sequence data with the aim of discovering novel genes.

  10. Preparation of fosmid libraries and functional metagenomic analysis of microbial community DNA.

    PubMed

    Martínez, Asunción; Osburne, Marcia S

    2013-01-01

    One of the most important challenges in contemporary microbial ecology is to assign a functional role to the large number of novel genes, lacking similarity to genes of known function, that have been discovered through large-scale sequencing of natural microbial communities. Functional screening of metagenomic libraries, that is, screening environmental DNA clones for the ability to confer an activity of interest to a heterologous bacterial host, is a promising approach for bridging the gap between metagenomic DNA sequencing and functional characterization. Here, we describe methods for isolating environmental DNA and constructing metagenomic fosmid libraries, as well as methods for designing and implementing successful functional screens of such libraries.

  11. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline

    PubMed Central

    2013-01-01

    We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS. PMID:23320958

  12. Fragment screening and HIV therapeutics.

    PubMed

    Bauman, Joseph D; Patel, Disha; Arnold, Eddy

    2012-01-01

    Fragment screening has proven to be a powerful alternative to traditional methods for drug discovery. Biophysical methods, such as X-ray crystallography, NMR spectroscopy, and surface plasmon resonance, are used to screen a diverse library of small molecule compounds. Although compounds identified via this approach have relatively weak affinity, they provide a good platform for lead development and are highly efficient binders with respect to their size. Fragment screening has been utilized for a wide range of targets, including HIV-1 proteins. Here, we review the fragment screening studies targeting HIV-1 proteins using X-ray crystallography or surface plasmon resonance. These studies have successfully detected binding of novel fragments to either previously established or new sites on HIV-1 protease and reverse transcriptase. In addition, fragment screening against HIV-1 reverse transcriptase has been used as a tool to better understand the complex nature of ligand binding to a flexible target.

  13. Fragmentation functions in nuclear media

    NASA Astrophysics Data System (ADS)

    Sassot, Rodolfo; Stratmann, Marco; Zurita, Pia

    2010-03-01

    We perform a detailed phenomenological analysis of how well hadronization in nuclear environments can be described in terms of effective fragmentation functions. The medium modified fragmentation functions are assumed to factorize from the partonic scattering cross sections and evolve in the hard scale in the same way as the standard or vacuum fragmentation functions. Based on precise data on semi-inclusive deep-inelastic scattering off nuclei and hadron production in deuteron-gold collisions, we extract sets of effective fragmentation functions for pions and kaons at next-to-leading order accuracy. The obtained sets provide a rather accurate description of the kinematical dependence of the analyzed cross sections and are found to differ significantly from standard fragmentation functions both in shape and magnitude. Our results support the notion of factorization and universality in the studied nuclear environments, at least in an effective way and within the precision of the available data.

  14. Parallel-META 2.0: enhanced metagenomic data analysis with functional annotation, high performance computing and advanced visualization.

    PubMed

    Su, Xiaoquan; Pan, Weihua; Song, Baoxing; Xu, Jian; Ning, Kang

    2014-01-01

    The metagenomic method directly sequences and analyses genome information from microbial communities. The main computational tasks for metagenomic analyses include taxonomical and functional structure analysis for all genomes in a microbial community (also referred to as a metagenomic sample). With the advancement of Next Generation Sequencing (NGS) techniques, the number of metagenomic samples and the data size for each sample are increasing rapidly. Current metagenomic analysis is both data- and computation-intensive, especially when there are many species in a metagenomic sample and each has a large number of sequences. As such, metagenomic analyses require extensive computational power, and increasing analytical requirements further compound the challenges of computational analysis. In this work, we propose Parallel-META 2.0, a metagenomic analysis software package, to meet these needs for efficient and fast analyses of the taxonomical and functional structures of microbial communities. Parallel-META 2.0 is an extended and improved version of Parallel-META 1.0, which enhances the taxonomical analysis using multiple databases, improves computation efficiency by optimized parallel computing, and supports interactive visualization of results in multiple views. Furthermore, it enables functional analysis for metagenomic samples, including short-read assembly, gene prediction and functional annotation. Therefore, it can provide accurate taxonomical and functional analyses of metagenomic samples in a high-throughput manner and on a large scale.

  15. Clinical and legal significance of fragmentation of bullets in relation to size of wounds: retrospective analysis

    PubMed Central

    Coupland, Robin

    1999-01-01

    Objective: To examine the relation between fragmentation of bullets and size of wounds clinically and in the context of the Hague Declaration of 1899.
    Design: Retrospective analysis of prospectively collected data on hospital admissions.
    Setting: Hospitals of the International Committee of the Red Cross.
    Subjects: 5215 people wounded by bullets in armed conflicts (5933 wounds).
    Main outcome measures: Grade of wound computed from the Red Cross wound classification and presence of bullet fragments on radiography.
    Results: Of the 347 wounds with fragmentation of bullets, 251 (72%) were large wounds (grade 2 or 3), that is, those with a clinically detectable cavity. Of the 5586 wounds without fragmentation of bullets, 2915 (52.1%) were large wounds. Only 7.9% (251/3166) of large wounds were associated with fragmentation of bullets.
    Conclusions: Fragmentation of bullets is associated with large wounds, but most large wounds do not contain bullet fragments. In addition, bullet fragments may occur in wounds that are not defined as large. Fragmentation of bullets is neither a necessary nor sufficient cause of large wounds, and surgeons should not diagnose extensive tissue damage because of the presence of fragments on radiography. Such findings also do not necessarily represent the use of bullets which contravene the law of war. Future legislation should take into account not only the construction of bullets but also their potential to transfer energy to the human body.
    Key messages: The use of certain bullets has been prohibited in war. Wounds from bullets are caused by transfer of kinetic energy from the bullet to the tissues. The relation between size of wound and fragmentation of bullets can be examined using the Red Cross wound classification system. Fragments of bullets seen on radiographs of wounds sustained in wars do not necessarily represent large wounds or the use of illegal bullets. Existing legislation on the construction of bullets should be supplemented by legislation on

  16. Metagenomic study of the oral microbiota by Illumina high-throughput sequencing

    PubMed Central

    Lazarevic, Vladimir; Whiteson, Katrine; Huse, Susan; Hernandez, David; Farinelli, Laurent; Østerås, Magne; Schrenzel, Jacques; François, Patrice

    2013-01-01

    To date, metagenomic studies have relied on the analysis of reads obtained using 454 pyrosequencing in place of conventional Sanger sequencing. After extensively scanning the 16S ribosomal RNA (rRNA) gene, we identified the V5 hypervariable region as a short region providing reliable identification of bacterial sequences available in public databases such as the Human Oral Microbiome Database. We amplified samples from the oral cavity of three healthy individuals using primers covering an ~82-base segment of the V5 loop, and sequenced them using the Illumina technology in a single orientation. We identified 135 genera or higher taxonomic ranks from the resulting 1,373,824 sequences. While the abundances of the most common phyla (Firmicutes, Proteobacteria, Actinobacteria, Fusobacteria and TM7) are largely comparable to previous studies, Bacteroidetes were less abundant. Potential sources for this difference include classification bias in this region of the 16S rRNA gene, human sample variation, sample preparation and primer bias. Using an Illumina sequencing approach, we achieved a much greater depth of coverage than previous oral microbiota studies, allowing us to identify several taxa not yet discovered in these types of samples, and to estimate that at least 30,000 additional reads would be required to identify just one additional phylotype. The evolution of high-throughput sequencing technologies, and their subsequent improvements in read length, enable the utilization of different platforms for studying communities of complex flora. Access to large amounts of data is already leading to a better representation of sample diversity at a reasonable cost. PMID:19796657

  17. The potential use of bacterial community succession in forensics as described by high throughput metagenomic sequencing.

    PubMed

    Pechal, Jennifer L; Crippen, Tawni L; Benbow, M Eric; Tarone, Aaron M; Dowd, Scot; Tomberlin, Jeffery K

    2014-01-01

    Decomposition studies of vertebrate remains primarily focus on data that can be seen with the naked eye, such as arthropod or vertebrate scavenger activity, with little regard for what might be occurring in the microbial community. Here, we discuss the necrobiome, or community of organisms associated with the decomposition of remains, specifically the "epinecrotic" bacterial community succession throughout the decomposition of vertebrate carrion. Pyrosequencing was used to (1) detect and identify bacterial community abundance patterns that described discrete time points of the decomposition process and (2) identify bacterial taxa important for estimating physiological time, a time-temperature metric that is often commensurate with minimum post-mortem interval estimates, via thermal summation models. There were significant differences in bacterial community structure, in both taxon richness and relative abundance patterns, through the decomposition process at both the phylum and family levels of taxonomic classification. Overall phylum and family taxon richness showed a significant negative linear relationship with the progress of decomposition. Additionally, we developed a statistical model using high-throughput sequencing data of epinecrotic bacterial communities on vertebrate remains that explained 94.4% of the time since placement of remains in the field, which was within 2-3 h of death. These bacterial taxa are potentially useful for estimating the minimum post-mortem interval. Lastly, we provide a new framework and standard operating procedure for this novel approach, which suggests that high-throughput metagenomic sequencing has remarkable potential as a new forensic tool. Documenting and identifying differences in bacterial communities is key to advancing knowledge of the carrion necrobiome and its applicability in forensic science.

  18. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods.

    PubMed

    Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerrie; Shapiro, Harris; Goltsman, Eugene; McHardy, Alice C; Rigoutsos, Isidore; Salamov, Asaf; Korzeniewski, Frank; Land, Miriam; Lapidus, Alla; Grigoriev, Igor; Richardson, Paul; Hugenholtz, Philip; Kyrpides, Nikos C

    2007-06-01

    Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
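    The core idea of such a benchmark, mixing reads from known genomes so that every processing result can be checked against its true source, can be sketched in a few lines of Python. This is a toy under stated assumptions: reads are error-free fixed-length substrings, and `genomes`, `abundances`, `simulate_metagenome` and `binning_accuracy` are hypothetical names, not the authors' pipeline (their data sets were built from sequencing reads randomly selected from 113 isolate genomes).

```python
import random


def simulate_metagenome(genomes, abundances, n_reads, read_len=100, seed=0):
    """Sample error-free reads from isolate genomes in proportion to relative abundance.

    genomes: dict name -> sequence; abundances: dict name -> relative weight (hypothetical).
    Returns a list of (source_genome, read) pairs, so the true origin of every read is kept.
    """
    rng = random.Random(seed)  # seeded so the simulated data set is reproducible
    names = [name for name, seq in genomes.items() if len(seq) >= read_len]
    weights = [abundances.get(name, 0.0) for name in names]
    if not names or sum(weights) <= 0:
        raise ValueError("need at least one genome >= read_len with positive abundance")
    reads = []
    for _ in range(n_reads):
        source = rng.choices(names, weights=weights, k=1)[0]  # pick a genome by abundance
        seq = genomes[source]
        start = rng.randrange(len(seq) - read_len + 1)  # uniform start position
        reads.append((source, seq[start:start + read_len]))
    return reads


def binning_accuracy(reads, predicted_sources):
    """Fraction of reads whose predicted genome matches the genome they were sampled from."""
    if not reads:
        raise ValueError("no reads to score")
    correct = sum(1 for (source, _), pred in zip(reads, predicted_sources) if source == pred)
    return correct / len(reads)
```

    Because each read keeps its source label, the output of any binning step (for example, a composition-based method) can be scored directly with `binning_accuracy`.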

  19. Metagenomic data of fungal internal transcribed spacer from serofluid dish, a traditional Chinese fermented food

    PubMed Central

    Chen, Peng; Zhao, Yang; Wu, Zhengrong; Liu, Ronghui; Xu, Ruixiang; Yan, Lei; Li, Hongyu

    2015-01-01

    Serofluid dish (or Jiangshui, in Chinese), a traditional food in Chinese culture for thousands of years, is made by fermenting vegetables. In this work, the microbial community of fermented serofluid dish was investigated using a culture-independent method. The metagenomic data in this article comprise the sequences of fungal internal transcribed spacer (ITS) regions of rRNA genes from 12 different serofluid dish samples. The metagenome comprised an average of 50,865 raw reads totaling an average of 8,958,220 bp, with a G + C content of 45.62%. This is the first report of metagenomic data on fungal ITS from serofluid dish that uses the Illumina platform to profile the fungal communities of this little-known fermented food from Gansu Province, China. The metagenomic data of fungal internal transcribed spacer can be accessed at NCBI, SRA database accession no. SRP067411. PMID:26981389
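    The summary statistics quoted here (read count, total bases, G + C content) are straightforward to compute from raw reads. The Python sketch below is an illustration, not the authors' code; `read_stats` is a hypothetical helper, and it assumes that ambiguous bases such as N are excluded from the G + C denominator, which the abstract does not specify.

```python
def read_stats(reads):
    """Return (number of reads, total bases, G+C percentage) for a collection of reads.

    Assumption: G+C is computed over unambiguous A/C/G/T bases only; N and other
    IUPAC codes still count toward total bases but not toward the G+C fraction.
    """
    n_reads = 0
    total_bp = 0
    gc = 0
    at = 0
    for read in reads:
        n_reads += 1
        total_bp += len(read)
        for base in read.upper():
            if base in "GC":
                gc += 1
            elif base in "AT":
                at += 1
    if gc + at == 0:
        raise ValueError("no unambiguous A/C/G/T bases found")
    return n_reads, total_bp, 100.0 * gc / (gc + at)
```

    The mean read length is `total_bp / n_reads`; applied to the averages reported above (8,958,220 bp over 50,865 reads), this gives roughly 176 bp per read, assuming both averages refer to the same set of samples.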

  20. Metagenomic data of fungal internal transcribed spacer from serofluid dish, a traditional Chinese fermented food.

    PubMed

    Chen, Peng; Zhao, Yang; Wu, Zhengrong; Liu, Ronghui; Xu, Ruixiang; Yan, Lei; Li, Hongyu

    2016-03-01

    Serofluid dish (or Jiangshui, in Chinese), a traditional food in Chinese culture for thousands of years, is made by fermenting vegetables. In this work, the microbial community of fermented serofluid dish was investigated using a culture-independent method. The metagenomic data in this article comprise the sequences of fungal internal transcribed spacer (ITS) regions of rRNA genes from 12 different serofluid dish samples. The metagenome comprised an average of 50,865 raw reads totaling an average of 8,958,220 bp, with a G + C content of 45.62%. This is the first report of metagenomic data on fungal ITS from serofluid dish that uses the Illumina platform to profile the fungal communities of this little-known fermented food from Gansu Province, China. The metagenomic data of fungal internal transcribed spacer can be accessed at NCBI, SRA database accession no. SRP067411.

  1. Screening for novel enzymes from metagenome and SIGEX, as a way to improve it

    PubMed Central

    Yun, Jiae; Ryu, Sangryeol

    2005-01-01

    Metagenomics has been successfully applied to isolate novel biocatalysts from the uncultured microbiota in the environment. Two types of screening have been used to identify clones carrying desired traits from metagenomic libraries: function-based screening and sequence-based screening. Both function- and sequence-based screening have individual advantages and disadvantages, and both have been applied successfully to discover biocatalysts from metagenomes. However, both strategies are laborious and tedious because of the low frequency of screening hits. A recent paper introduced a high-throughput screening strategy termed substrate-induced gene-expression screening (SIGEX). SIGEX combines substrate induction with fluorescence-activated cell sorting (FACS) to select clones harboring catabolic genes that are induced by various substrates. This method was applied successfully to isolate aromatic hydrocarbon-induced genes from a metagenomic library. Although SIGEX has many limitations, it is expected to provide economic advantages, especially to industry. PMID:15790425

  2. Signature Peptide-Enabled Metagenomics (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    McMahon, Ben [LANL]

    2016-07-12

    Ben McMahon of Los Alamos National Laboratory (LANL) presents "Signature Peptide-Enabled Metagenomics" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  3. Experimental Design and Bioinformatics Analysis for the Application of Metagenomics in Environmental Sciences and Biotechnology.

    PubMed

    Ju, Feng; Zhang, Tong

    2015-11-03

    Recent advances in DNA sequencing technologies have prompted the widespread application of metagenomics for the investigation of novel bioresources (e.g., industrial enzymes and bioactive molecules) and unknown biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. This review discusses rigorous experimental design and sample preparation in the context of applying metagenomics in environmental sciences and biotechnology. Moreover, this review summarizes the principles, methodologies, and state-of-the-art bioinformatics procedures, tools and database resources for metagenomics applications and discusses two popular strategies (analysis of unassembled reads versus assembled contigs/draft genomes) for gaining quantitative or qualitative insights into microbial community structure and functions. Overall, this review aims to facilitate more extensive application of metagenomics in the investigation of uncultured microorganisms, novel enzymes, microbe-environment interactions, and biohazards in biotechnological applications where microbial communities are engineered for bioenergy production, wastewater treatment, and bioremediation.

  4. Driven fragmentation of granular gases.

    PubMed

    Cruz Hidalgo, Raúl; Pagonabarraga, Ignacio

    2008-06-01

    The dynamics of homogeneously heated granular gases that fragment due to particle collisions are analyzed. We introduce a kinetic model that accounts for correlations induced at grain collisions and analyze both the kinetics and the relevant distribution functions these systems develop. The work combines analytical and numerical studies based on direct simulation Monte Carlo calculations. A broad family of fragmentation probabilities is considered, and its implications for the system kinetics are discussed. We show that these driven materials generically evolve asymptotically into a dynamical scaling regime. If the fragmentation probability tends to a constant, the grain number diverges at a finite time, leading to a shattering singularity. If the fragmentation probability vanishes, the number of grains grows monotonically as a power law. We consider different homogeneous thermostats and show that the kinetics of these systems depends weakly on both the grain inelasticity and the driving. We observe that fragmentation plays a relevant role in the shape of the velocity distribution of the particles. When the fragmentation is driven by local stochastic events, the long velocity tail is essentially exponential, independently of the heating frequency and the breaking rule. However, for a Lowe-Andersen thermostat, numerical evidence strongly supports the conjecture that the scaled velocity distribution follows a generalized exponential behavior, f(c) ≈ exp(-c^n) with n ≈ 1.2, regardless of the fragmentation mechanism.

  5. Amplification of thermostable lipase genes fragment from thermogenic phase of domestic waste composting process

    NASA Astrophysics Data System (ADS)

    Nurhasanah; Nurbaiti, Santi; Madayanti, Fida; Akhmaloka

    2015-09-01

    Lipases are lipolytic enzymes that catalyze the hydrolysis of the fatty acid ester bonds of triglycerides to produce free fatty acids and glycerol. The enzyme is widely used in various fields of the biotechnological industry. Hence, lipases with unique properties (e.g., thermostability) are still being sought by various methods. One strategy is to use a metagenomic approach to amplify the gene directly from an environmental sample. This research focused on amplification of lipase gene fragments directly from the thermogenic phase of domestic waste composting in aerated trenches. We used domestic waste compost from the waste treatment facility at SABUGA, ITB, as the sample. Total chromosomal DNA was extracted directly from several stages of the thermogenic phase of the compost. The DNA was then used directly as a template for amplification of thermostable lipase gene fragments using a pair of internal primers, Flip-1a and Rlip-1a, with a GC clamp attached to the reverse primer. The results showed that the primers amplified the gene from four stages of the thermogenic phase, yielding a lipase gene fragment of approximately 570 base pairs (bp). The amplified fragments were then used for Denaturing Gradient Gel Electrophoresis (DGGE) analysis to determine the diversity of thermostable lipase gene fragments.

  6. Computational and Experimental Determination of Fragmentation for Naturally Fragmenting Warheads

    DTIC Science & Technology

    1981-05-01

    List of tables (excerpt): I, Chemical analysis of Armco iron and HF-I steel (p. 3); 2, Summary of tensile-pull measurements for transverse-direction ... Report documentation page (excerpt): report number NSWC TR 80-238; title: Computational and Experimental Determination of Fragmentation for Naturally Fragmenting Warheads; type of report: Final.

  7. Metagenomic Approaches to Natural Products from Free-Living and Symbiotic Organisms

    PubMed Central

    Brady, Sean F.; Simmons, Luke; Kim, Jeff H.; Schmidt, Eric W.

    2010-01-01

    Bacterial cultivation has been a mainstay of natural products discovery for the past 80 years. However, the majority of bacteria are recalcitrant to culture, providing an untapped source for new natural products. Metagenomic analysis provides an alternative method to directly access the uncultivated genome for natural products research and for the discovery of novel, bioactive substances. Applications of metagenomics to diverse habitats, such as soils and the interior of animals, are described. PMID:19844642

  8. Metagenomics for the discovery of novel biosurfactants of environmental interest from marine ecosystems.

    PubMed

    Jackson, Stephen A; Borchert, Erik; O'Gara, Fergal; Dobson, Alan D W

    2015-06-01

    Research focused on the search for new biosurfactants aims to replace chemical surfactants, which while being cost-effective are ecologically undesirable. Metagenomics can lead to discovery of novel biosurfactants, tackling issues of low production yields. Recent successes include the heterologous production of biosurfactants. The dearth of biosurfactants discovered to date through metagenomics is puzzling given that good screening systems and heterologous host systems are available.

  9. The Comprehensive AOCMF Classification System: Classification and Documentation within AOCOIAC Software

    PubMed Central

    Audigé, Laurent; Cornelius, Carl-Peter; Kunz, Christoph; Buitrago-Téllez, Carlos H.; Prein, Joachim

    2014-01-01

    The AOCMF Classification Group developed a hierarchical three-level craniomaxillofacial (CMF) fracture classification system. The fundamental level 1 distinguishes four major anatomical units: the mandible (code 91), midface (code 92), skull base (code 93) and cranial vault (code 94); level 2 relates to the location of the fractures within defined topographical regions of each unit; level 3 relates to fracture morphology in these regions with regard to fragmentation, displacement, and bone defects, as well as the involvement of specific anatomical structures. The resulting CMF classification system has been implemented in the AO comprehensive injury automatic classifier (AOCOIAC) software, allowing for fracture classification as well as clinical documentation of individual cases, including a selected sample of diagnostic images. This tutorial highlights the main features of the software. In addition, a series of illustrative case examples is made available electronically for viewing and editing. PMID:25489395

  10. Depleted Uranium Test Range Fragment Reclamation

    DTIC Science & Technology

    1982-07-01

    ... fragment drying was necessary in order to obtain adequate vacuum levels in the VIR furnaces. e. Vacuum Induction Remelting Fragments and Casting ... Contents (excerpt): Acid Pickle and Water Rinse (p. 2); d. Drying the Fragments (p. 2); e. Vacuum Induction Remelting Fragments and Casting ... feasibility of reclaiming test range fragments by vacuum induction remelting (VIR). The technical direction of Phase II was highly dependent upon the

  11. Metagenomic analyses of drinking water receiving different disinfection treatments.

    PubMed

    Gomez-Alvarez, Vicente; Revetta, Randy P; Santo Domingo, Jorge W

    2012-09-01

    A metagenome-based approach was used to assess the taxonomic affiliation and functional potential of microbial populations in free-chlorine-treated (CHL) and monochloramine-treated (CHM) drinking water (DW). In all, 362,640 (averaging 544 bp) and 155,593 (averaging 554 bp) pyrosequencing reads were analyzed for the CHL and CHM samples, respectively. Most annotated proteins were of bacterial origin, although eukaryotic, archaeal, and viral proteins were also identified. Differences in community structure and function were noted. Most notably, Legionella-like genes were more abundant in the CHL samples, while mycobacterial genes were more abundant in the CHM samples. Genes associated with multiple disinfectant mechanisms were identified in both communities. Moreover, sequences linked to virulence factors, such as antibiotic resistance mechanisms, were observed in both microbial communities. This study provides new insights into the genetic network and potential biological processes associated with the molecular microbial ecology of DW microbial communities.

  12. Metagenomic Analysis of the Ferret Fecal Viral Flora

    PubMed Central

    Smits, Saskia L.; Raj, V. Stalin; Oduber, Minoushka D.; Schapendonk, Claudia M. E.; Bodewes, Rogier; Provacia, Lisette; Stittelaar, Koert J.; Osterhaus, Albert D. M. E.; Haagmans, Bart L.

    2013-01-01

    Ferrets are widely used as a small animal model for a number of viral infections, including influenza A virus and SARS coronavirus. To further analyze the microbiological status of ferrets, their fecal viral flora was studied using a metagenomics approach. Novel viruses from the families Picorna-, Papilloma-, and Anelloviridae as well as known viruses from the families Astro-, Corona-, Parvo-, and Hepeviridae were identified in different ferret cohorts. Ferret kobu- and hepatitis E virus were mainly present in human household ferrets, whereas coronaviruses were found both in household as well as farm ferrets. Our studies illuminate the viral diversity found in ferrets and provide tools to prescreen for newly identified viruses that potentially could influence disease outcome of experimental virus infections in ferrets. PMID:23977082

  13. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA.

    PubMed

    Poinar, Hendrik N; Schwarz, Carsten; Qi, Ji; Shapiro, Beth; Macphee, Ross D E; Buigues, Bernard; Tikhonov, Alexei; Huson, Daniel H; Tomsho, Lynn P; Auch, Alexander; Rampp, Markus; Miller, Webb; Schuster, Stephan C

    2006-01-20

    We sequenced 28 million base pairs of DNA in a metagenomics approach, using a woolly mammoth (Mammuthus primigenius) sample from Siberia. As a result of exceptional sample preservation and the use of a recently developed emulsion polymerase chain reaction and pyrosequencing technique, 13 million base pairs (45.4%) of the sequencing reads were identified as mammoth DNA. Sequence identity between our data and African elephant (Loxodonta africana) was 98.55%, consistent with a paleontologically based divergence date of 5 to 6 million years. The sample includes a surprisingly small diversity of environmental DNAs. The high percentage of endogenous DNA recoverable from this single mammoth would allow for completion of its genome, unleashing the field of paleogenomics.

  14. Metagenomics, Metatranscriptomics, and Metabolomics Approaches for Microbiome Analysis

    PubMed Central

    Aguiar-Pulido, Vanessa; Huang, Wenrui; Suarez-Ulloa, Victoria; Cickovski, Trevor; Mathee, Kalai; Narasimhan, Giri

    2016-01-01

    Microbiomes are ubiquitous and are found in the ocean, the soil, and in/on other living organisms. Changes in a microbiome can affect the health of the environmental niche in which it resides. In order to learn more about these communities, different approaches based on data from multiple omics have been pursued. Metagenomics produces a taxonomic profile of the sample, metatranscriptomics helps us to obtain a functional profile, and metabolomics completes the picture by determining which byproducts are being released into the environment. Although each approach provides valuable information separately, we show that, when combined, they paint a more comprehensive picture. We conclude with a review of network-based approaches as applied to integrative studies, which we believe hold the key to an in-depth understanding of microbiomes. PMID:27199545

  15. Monitoring microbial diversity of bioreactors using metagenomic approaches.

    PubMed

    Ellis, Joshua T; Sims, Ronald C; Miller, Charles D

    2012-01-01

    With the rapid development of molecular techniques, particularly 'omics' technologies, the field of microbial ecology is growing rapidly. The applications of next generation sequencing have allowed researchers to produce massive amounts of genetic data on individual microbes, providing information about microbial communities and their interactions through in situ and in vitro measurements. The ability to identify novel microbes, functions, and enzymes, along with developing an understanding of microbial interactions and functions, is necessary for efficient production of useful and high value products in bioreactors. The ability to optimize bioreactors fully and understand microbial interactions and functions within these systems will establish highly efficient industrial processes for the production of bioproducts. This chapter will provide an overview of bioreactors and metagenomic technologies to help the reader understand microbial communities, interactions, and functions in bioreactors.

  16. Dynamics of Sequence-Discrete Bacterial Populations Inferred Using Metagenomes

    SciTech Connect

    Stevens, Sarah; Bendall, Matthew; Kang, Dongwan; Froula, Jeff; Egan, Rob; Chan, Leong-Keat; Tringe, Susannah; McMahon, Katherine; Malmstrom, Rex

    2014-03-14

    From a multi-year metagenomic time series of two dissimilar Wisconsin lakes, we have assembled dozens of genomes using a novel approach that bins contigs into distinct genomes based on sequence composition (e.g., k-mer frequencies) and on contig coverage patterns across time points. Next, we investigated how these genomes, which represent sequence-discrete bacterial populations, evolved over time, and used the time series to infer population dynamics. For example, we explored changes in single nucleotide polymorphism (SNP) frequencies as well as patterns of gene gain and loss in multiple populations. Interestingly, SNP diversity was purged at nearly every genome position in some populations during the course of this study, suggesting that these populations may have experienced genome-wide selective sweeps. These represent the first direct, time-resolved observations of periodic selection in natural populations, a key process predicted by the ecotype model of bacterial diversification.
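    The binning idea described above, grouping contigs by sequence composition and by how their coverage changes across time points, can be illustrated with a small Python sketch. It is not the authors' method: it assumes tetranucleotide (k = 4) frequencies for composition, a log-scaled, sum-normalized coverage profile for the time series, equal weighting of the two feature sets, and plain k-means with a deterministic farthest-point start; all function names are hypothetical.

```python
import math
from itertools import product

# All 256 tetranucleotides in a fixed order, so every contig maps to the same vector layout.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetra_profile(seq):
    """Relative tetranucleotide frequencies; windows containing non-ACGT characters are skipped."""
    counts = [0.0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        j = KMER_INDEX.get(seq[i:i + 4])
        if j is not None:
            counts[j] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def coverage_pattern(coverages):
    """Log-scaled coverage across time points, normalized to sum to 1."""
    logs = [math.log1p(c) for c in coverages]
    total = sum(logs) or 1.0
    return [x / total for x in logs]


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bin_contigs(contigs, coverages, k, iters=100):
    """Cluster contigs into k bins with composition + coverage features and plain k-means.

    contigs: list of sequences; coverages: one list of per-time-point coverages per contig.
    Returns one integer bin label per contig.
    """
    points = [tetra_profile(s) + coverage_pattern(c) for s, c in zip(contigs, coverages)]
    # Deterministic farthest-point initialization instead of random seeding.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

    Normalizing the log coverage emphasizes how a contig's depth changes across time points rather than its absolute level (only approximately, since the log transform is not scale-invariant). In practice the relative weight of composition and coverage features, and the choice of k, would need tuning.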

  17. Metagenomic Analysis of Microbial Symbionts in a Gutless Worm

    SciTech Connect

    Woyke, Tanja; Teeling, Hanno; Ivanova, Natalia N.; Hunteman, Marcel; Richter, Michael; Gloeckner, Frank Oliver; Boffelli, Dario; Barry, Kerrie W.; Shapiro, Harris J.; Anderson, Iain J.; Szeto, Ernest; Kyrpides, Nikos C.; Mussmann, Marc; Amann, Rudolf; Bergin, Claudia; Ruehland, Caroline; Rubin, Edward M.; Dubilier, Nicole

    2006-05-01

    Symbioses between bacteria and eukaryotes are ubiquitous, yet our understanding of the interactions driving these associations is hampered by our inability to cultivate most host-associated microbes. Here we use a metagenomic approach to describe four co-occurring symbionts from the marine oligochaete Olavius algarvensis, a worm lacking a mouth, gut and nephridia. Shotgun sequencing and metabolic pathway reconstruction revealed that the symbionts are sulphur-oxidizing and sulphate-reducing bacteria, all of which are capable of carbon fixation, thus providing the host with multiple sources of nutrition. Molecular evidence for the uptake and recycling of worm waste products by the symbionts suggests how the worm could eliminate its excretory system, an adaptation unique among annelid worms. We propose a model that describes how the versatile metabolism within this symbiotic consortium provides the host with an optimal energy supply as it shuttles between the upper oxic and lower anoxic coastal sediments that it inhabits.

  18. Metagenomic recovery of phage genomes of uncultured freshwater actinobacteria.

    PubMed

    Ghai, Rohit; Mehrshad, Maliheh; Megumi Mizuno, Carolina; Rodriguez-Valera, Francisco

    2017-01-01

    Low-GC Actinobacteria are among the most abundant and widespread microbes in freshwaters and have largely resisted all cultivation efforts. Consequently, their phages have remained totally unknown. In this work, we have used deep metagenomic sequencing to assemble eight complete genomes of the first tailed phages that infect freshwater Actinobacteria. Their genomes encode the actinobacterial-specific transcription factor whiB, frequently found in mycobacteriophages and also in phages infecting marine pelagic Actinobacteria. Its presence suggests a common and widespread strategy of modulation of host transcriptional machinery upon infection via this transcriptional switch. We present evidence that some whiB-carrying phages infect the acI lineage of Actinobacteria. At least one of them encodes the ADP-ribosylating component of the widespread bacterial AB toxins family (for example, clostridial toxin). We posit that the presence of this toxin reflects a 'trojan horse' strategy, providing protection at the population level to the abundant host microbes against eukaryotic predators.

  19. Metagenomic analysis of the ferret fecal viral flora.

    PubMed

    Smits, Saskia L; Raj, V Stalin; Oduber, Minoushka D; Schapendonk, Claudia M E; Bodewes, Rogier; Provacia, Lisette; Stittelaar, Koert J; Osterhaus, Albert D M E; Haagmans, Bart L

    2013-01-01

    Ferrets are widely used as a small animal model for a number of viral infections, including influenza A virus and SARS coronavirus. To further analyze the microbiological status of ferrets, their fecal viral flora was studied using a metagenomics approach. Novel viruses from the families Picorna-, Papilloma-, and Anelloviridae as well as known viruses from the families Astro-, Corona-, Parvo-, and Hepeviridae were identified in different ferret cohorts. Ferret kobu- and hepatitis E virus were mainly present in human household ferrets, whereas coronaviruses were found both in household as well as farm ferrets. Our studies illuminate the viral diversity found in ferrets and provide tools to prescreen for newly identified viruses that potentially could influence disease outcome of experimental virus infections in ferrets.

  20. Novel metal resistance genes from microorganisms: a functional metagenomic approach.

    PubMed

    González-Pastor, José E; Mirete, Salvador

    2010-01-01

    Most of the known metal resistance mechanisms are based on studies of cultured microorganisms, and the abundant uncultured fraction could be an important source of genes responsible for uncharacterized resistance mechanisms. A functional metagenomic approach was selected to recover metal resistance genes from the rhizosphere microbial community of an acid mine drainage (AMD)-adapted plant, Erica andevalensis, from Rio Tinto, Spain. A total of 13 nickel-resistant clones were isolated and analyzed; they encoded either hypothetical or conserved hypothetical proteins of uncertain function, or well-characterized proteins not previously reported to be related to nickel resistance. The resistant clones were classified into two groups according to their nickel accumulation properties: those preventing and those favoring metal accumulation. Two clones, one encoding putative ABC transporter components and the other a serine O-acetyltransferase, were identified as representatives of the first and second groups, respectively.

  1. Metagenome mining reveals polytheonamides as posttranslationally modified ribosomal peptides.

    PubMed

    Freeman, Michael F; Gurgui, Cristian; Helf, Maximilian J; Morinaka, Brandon I; Uria, Agustinus R; Oldham, Neil J; Sahl, Hans-Georg; Matsunaga, Shigeki; Piel, Jörn

    2012-10-19

    It is held as a paradigm that ribosomally synthesized peptides and proteins contain only L-amino acids. We demonstrate a ribosomal origin of the marine sponge-derived polytheonamides, exceptionally potent, giant natural-product toxins. Isolation of the biosynthetic genes from the sponge metagenome revealed a bacterial gene architecture. Only six candidate enzymes were identified for 48 posttranslational modifications, including 18 epimerizations and 17 methylations of nonactivated carbon centers. Three enzymes were functionally validated, which showed that a radical S-adenosylmethionine enzyme is responsible for the unidirectional epimerization of multiple and different amino acids. Collectively, these complex alterations create toxins that function as unimolecular minimalistic ion channels with near-femtomolar activity. This study broadens the biosynthetic scope of ribosomal systems and creates new opportunities for peptide and protein bioengineering.

  2. Dental caries pathogenicity: a genomic and metagenomic perspective

    PubMed Central

    Peterson, Scott N.; Snesrud, Erik; Schork, Nicholas J.; Bretz, Walter A.

    2013-01-01

    In this review we address the subject of dental caries pathogenicity from a genomic and metagenomic perspective. The application of genomic technologies is certain to yield novel insights into the relationship between the bacterial flora, dental health and disease. Three primary attributes of bacterial species are thought to have a direct impact on caries development: adherence to tooth surfaces (biofilm formation), acid production and acid tolerance. The specific aetiological agents of dental caries have proven elusive, supporting the notion that caries aetiology is perhaps complex and multi-faceted. The recently introduced Human Microbiome Project (HMP), which endeavours to characterise the micro-organisms living in and on the human body, is likely to shed new light on these questions, improve our understanding of polymicrobial disease and of microbial ecology in the oral cavity, and provide new avenues for the development of therapeutics and molecular diagnostics. PMID:21726221

  3. Symbiosis insights through metagenomic analysis of a microbial consortium

    SciTech Connect

    Woyke, Tanja; Teeling, Hanno; Ivanova, Natalia N.; Hunteman, Marcel; Richter, Michael; Gloeckner, Frank Oliver; Boffelli, Dario; Barry, Kerrie W.; Shapiro, Harris J.; Anderson, Iain J.; Szeto, Ernest; Kyrpides, Nikos C.; Mussmann, Marc; Amann, Rudolf; Bergin, Claudia; Ruehland, Caroline; Rubin, Edward M.; Dubilier, Nicole

    2006-09-01

    Symbioses between bacteria and eukaryotes are ubiquitous, yet our understanding of the interactions driving these associations is hampered by our inability to cultivate most host-associated microbes. Here, we used a metagenomic approach to describe four co-occurring symbionts from the marine oligochaete Olavius algarvensis, a worm lacking a mouth, gut, and nephridia. Shotgun sequencing and metabolic pathway reconstruction revealed that the symbionts are sulfur-oxidizing and sulfate-reducing bacteria, all of which are capable of carbon fixation, providing the host with multiple sources of nutrition. Molecular evidence for the uptake and recycling of worm waste products by the symbionts suggests how the worm could eliminate its excretory system, an adaptation unique among annelid worms. We propose a model which describes how the versatile metabolism within this symbiotic consortium provides the host with an optimal energy supply as it shuttles between the upper oxic and lower anoxic coastal sediments which it inhabits.

  4. An integrated metagenome and -proteome analysis of the microbial community residing in a biogas production plant.

    PubMed

    Ortseifen, Vera; Stolze, Yvonne; Maus, Irena; Sczyrba, Alexander; Bremges, Andreas; Albaum, Stefan P; Jaenicke, Sebastian; Fracowiak, Jochen; Pühler, Alfred; Schlüter, Andreas

    2016-08-10

    To study the metaproteome of a biogas-producing microbial community, fermentation samples were taken from an agricultural biogas plant for microbial cell and protein extraction and corresponding metagenome analyses. Based on metagenome sequence data, taxonomic community profiling was performed to elucidate the composition of bacterial and archaeal sub-communities. The community's cytosolic metaproteome was analyzed by 2D-PAGE. Metaproteome databases for protein identification were compiled based on the assembled metagenome sequence dataset for the biogas plant analyzed and non-corresponding biogas metagenomes. Protein identification results revealed that the corresponding biogas protein database facilitated the highest identification rate, followed by other biogas-specific databases, whereas common public databases yielded insufficient identification rates. Proteins of the biogas microbiome identified as highly abundant were assigned to the pathways involved in methanogenesis, transport and carbon metabolism. Moreover, the integrated metagenome/-proteome approach enabled the examination of genetic-context information for genes encoding identified proteins by studying neighboring genes on the corresponding contig. For example, this approach led to the identification of a Methanoculleus sp. contig encoding 16 methanogenesis-related gene products, three of which were also detected as abundant proteins within the community's metaproteome. Thus, metagenome contigs provide additional information on the genetic environment of identified abundant proteins.

  5. Grid-Assembly: An oligonucleotide composition-based partitioning strategy to aid metagenomic sequence assembly.

    PubMed

    Ghosh, Tarini Shankar; Mehra, Varun; Mande, Sharmila S

    2015-06-01

    The metagenomics approach involves the extraction, sequencing and characterization of the genomic content of the entire community of microbes present in a given environment. Compared with single-genome data, accurate assembly of metagenomic sequences is a challenging task. Given the huge volume and diverse taxonomic origin of metagenomic sequences, directly applying single-genome assembly methods to metagenomes is likely not only to greatly increase computational infrastructure requirements but also to produce chimeric contigs. One strategy to address this challenge is to partition metagenomic sequence datasets into clusters and to assemble the sequences in each cluster separately using any single-genome assembly method. The current study presents such an approach, which first uses tetranucleotide usage patterns to represent sequences as points in a three-dimensional (3D) space. The 3D space is subsequently partitioned into "Grids". Sequences within overlapping grids are then progressively assembled using any available assembler. We demonstrate the applicability of the current Grid-Assembly method using various categories of assemblers as well as different simulated metagenomic datasets. Validation results indicate that the Grid-Assembly approach helps to improve the overall quality of assembly, in terms of the purity and volume of the assembled contigs.
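    The partitioning idea above (tetranucleotide profile, then a 3D point, then a grid cell) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Grid-Assembly implementation: the abstract does not say how the 256-dimensional tetranucleotide profile is reduced to 3D, so a seeded Gaussian random projection is swapped in as a stand-in, and the helper names (`tetra_profile`, `partition_into_grids`) and the `cell_size` parameter are hypothetical. Overlapping grids and the downstream assembly step are not modelled.

```python
import itertools
import math
import random
from collections import defaultdict

# All 256 tetranucleotides in a fixed order, so every sequence maps onto the same basis.
TETRAMERS = ["".join(p) for p in itertools.product("ACGT", repeat=4)]
INDEX = {k: i for i, k in enumerate(TETRAMERS)}


def tetra_profile(seq: str) -> list[float]:
    """Relative frequency of each tetranucleotide; windows containing non-ACGT bases are skipped."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(TETRAMERS)


def random_projection(dim_in: int, dim_out: int = 3, seed: int = 0) -> list[list[float]]:
    """Hypothetical stand-in for the paper's 3D embedding: a seeded Gaussian random projection."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]


def to_3d(profile: list[float], proj: list[list[float]]) -> tuple[float, ...]:
    """Project one 256-D profile onto the three projection rows."""
    return tuple(sum(w * x for w, x in zip(row, profile)) for row in proj)


def partition_into_grids(seqs: dict[str, str], cell_size: float, seed: int = 0) -> dict:
    """Group sequence IDs by the (non-overlapping) grid cell that their 3D point falls into."""
    proj = random_projection(len(TETRAMERS), 3, seed)
    grids = defaultdict(list)
    for name, seq in seqs.items():
        point = to_3d(tetra_profile(seq), proj)
        cell = tuple(math.floor(c / cell_size) for c in point)
        grids[cell].append(name)
    return dict(grids)
```

    Reads with identical composition always land in the same cell, so each cell could then be handed to a single-genome assembler on its own. The cell size trades cluster purity against fragmenting one genome across several cells, which is presumably why the published method assembles across overlapping grids.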

  6. MetaSort untangles metagenome assembly by reducing microbial community complexity

    PubMed Central

    Ji, Peifeng; Zhang, Yanming; Wang, Jinfeng; Zhao, Fangqing

    2017-01-01

    Most current approaches to analyse metagenomic data rely on reference genomes. Novel microbial communities extend far beyond the coverage of reference databases and de novo metagenome assembly from complex microbial communities remains a great challenge. Here we present a novel experimental and bioinformatic framework, metaSort, for effective construction of bacterial genomes from metagenomic samples. MetaSort provides a sorted mini-metagenome approach based on flow cytometry and single-cell sequencing methodologies, and employs new computational algorithms to efficiently recover high-quality genomes from the sorted mini-metagenome by complementing it with the original metagenome. Through extensive evaluations, we demonstrated that metaSort has excellent and unbiased performance in genome recovery and assembly. Furthermore, we applied metaSort to an unexplored microflora colonizing the surface of marine kelp and successfully recovered 75 high-quality genomes at one time. This approach will greatly improve access to microbial genomes from complex or novel communities. PMID:28112173

  7. Novel blue light-sensitive proteins from a metagenomic approach.

    PubMed

    Pathak, G P; Ehrenreich, A; Losi, A; Streit, W R; Gärtner, W

    2009-09-01

    A microarray-based approach was used to screen a soil metagenome for the presence of blue light (BL) photoreceptor-encoding genes. The microarray carried 149 different 54-mer oligonucleotides, derived from consensus sequences of light, oxygen and voltage (LOV) domain BL photoreceptor genes. Calibration of the microarrays allowed the detection of as little as 50 ng of genomic DNA against a background of 2-5 µg of genomic DNA. A positive cosmid clone could still be identified at an amount of 0.25 ng against a background of 10 µg of labelled DNA clones. The array could readily identify targets carrying 4% sequence mismatch. Using the LOV microarray, up to 1200 library clones at concentrations of c. 20 ng each, with an insert size of c. 40 kb, could be screened in a single batch. After calibration and reliability controls, the microarray was probed with cosmid-cloned DNA from the thermophilic fraction of a soil sample. From this approach, a novel gene was isolated that encodes a protein consisting of several Per-Arnt-Sim domains, a LOV domain associated with a histidine kinase, and a response regulator domain. The novel gene showed highest similarity to a known sequence from Kineococcus radiotolerans SRS30216 (58% identity for the LOV domain only) and to a gene from Methylibium petroleiphilum PM1 (57% identity). The gene, designated ht-met1 (Hamburg Thermophile Metagenome 1), was isolated and fully sequenced (3615 bp). ht-met1 is followed by a second open reading frame encoding a Fe-chelatase, an arrangement quite frequent for BL photoreceptors. The LOV domain region of ht-met1 was subcloned and expressed, yielding a fully functional, flavin-containing LOV domain. Irradiation generated the typical LOV photochemistry, with the transient formation of a flavin-protein photoadduct. The dark recovery lifetime was τ(REC) = 120 s (20 °C), among the fastest determined so far for bacterial LOV domains.
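    To make the reported lifetime concrete: the abstract gives τ(REC) but not the kinetic model. Assuming, as is common for LOV domains, that the photoadduct decays back to the dark state monoexponentially (first order), the lifetime fixes both the fraction left after one lifetime and the time to near-complete recovery. This is a worked example under that assumption, not the authors' fit; A(t) denotes the remaining photoadduct population.

```latex
\begin{aligned}
A(t) &= A_0 \, e^{-t/\tau_{\mathrm{REC}}}, \qquad \tau_{\mathrm{REC}} = 120\ \mathrm{s}\ (20\,^{\circ}\mathrm{C}) \\
\frac{A(\tau_{\mathrm{REC}})}{A_0} &= e^{-1} \approx 0.37 \\
\frac{A(t_{95})}{A_0} = 0.05 \;\Rightarrow\; t_{95} &= \tau_{\mathrm{REC}} \ln 20 \approx 120\ \mathrm{s} \times 3.00 \approx 360\ \mathrm{s}
\end{aligned}
```

    So, under first-order kinetics, about 95% of the dark state would be regenerated within roughly six minutes at 20 °C.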

  8. [Biodiversity and Function Analyses of BIOLAK Activated Sludge Metagenome].

    PubMed

    Tian, Mei; Liu, Han-hu; Shen, Xin; Zhao, Fang-qing; Chen, Shuai; Yao, Yong-jia

    2015-05-01

    The BIOLAK is a multi-stage activated sludge process that has been successfully promoted worldwide. However, the biological community and function of the BIOLAK activated sludge (the core component of the process) have not been reported so far. In this study, taking the Lianyungang Dapu Industrial Zone WWTP as an example, large-scale metagenomic data (428 588 high-quality DNA sequences) from the BIOLAK activated sludge were obtained by next-generation high-throughput sequencing. Remarkable biodiversity was revealed in the BIOLAK activated sludge, which included 47 phyla, 872 genera and 1351 species. There were 33 phyla identified in the Bacteria domain (289 933 sequences). Proteobacteria was the most abundant phylum (62.54%), followed by Bacteroidetes (11.29%), Nitrospirae (5.65%) and Planctomycetes (4.79%), suggesting that these groups played a key role in the BIOLAK wastewater treatment system. Among the 748 bacterial genera, Nitrospira (5.60%), a key group in the nitrogen cycle, was the most prevalent genus. It was followed by Gemmatimonas (2.45%), an important genus in the biological phosphorus removal process. In the Archaea domain (1019 sequences), three phyla and 39 genera were detected. In the Eukaryota domain (1055 sequences), 60 genera and 10 phyla were identified, among which Ciliophora was the largest phylum (257 sequences). Meanwhile, 448 viral sequences were detected in the BIOLAK sludge metagenome, dominated by bacteriophages. The proportions of nitrogen, aromatic compounds and phosphorus metabolism in the BIOLAK sludge were 2.50%, 2.28% and 1.56%, respectively, higher than those in sludge from the United States and Australia. Among four processes of nitrogen metabolism, denitrification-related genes were most abundant (80.81%), followed by ammonification (12.78%), nitrification (4.38%) and nitrogen fixation (2.04%). In conclusion, the BIOLAK activated sludge had remarkable biodiversity, meanwhile

  9. Towards a standards-compliant genomic and metagenomic publication record

    SciTech Connect

    Fenner, Marsha W; Garrity, George M.; Field, Dawn; Kyrpides, Nikos; Hirschman, Lynette; Sansone, Susanna-Assunta; Angiuoli, Samuel; Cole, James R.; Glockner, Frank Oliver; Kolker, Eugene; Kowalchuk, George; Moran, Mary Ann; Ussery, Dave; White, Owen

    2008-04-01

    As a community, we are increasingly aware of the growing need to manage the avalanche of genomic and metagenomic data, in addition to related data types like ribosomal RNA and barcode sequences, in a manner that tightly integrates contextual data with the traditional literature in a machine-readable form. It is for this reason that the Genomic Standards Consortium (GSC) formed in 2005. Here we suggest that we move beyond the development of standards and tackle standards-compliance and improved data capture at the level of the scientific publication. We are supported in this goal by the fact that the scientific community is in the midst of a publishing revolution. This revolution is marked by a growing shift away from a traditional dichotomy between 'journal articles' and 'database entries' and an increasing adoption of hybrid models of collecting and disseminating scientific information. With respect to genomes and metagenomes and related data types, we feel the scientific community would be best served by the immediate launch of a central repository of short, highly structured 'Genome Notes' that must be standards-compliant. This could be done in the context of an existing journal, but we also suggest the more radical solution of launching a new journal. Such a journal could be designed to cater to a wide range of standards-related content types that are not currently centralized in the published literature. It could also support the demand for centralizing aspects of the 'gray literature' (documents developed by institutions or communities) such as the call by the GSC for a central repository of Standard Operating Procedures describing the genomic annotation pipelines of the major sequencing centers. We argue that such an 'eJournal', published under the Open Access paradigm by the GSC, could be an attractive publishing forum for a broader range of standardization initiatives within, and beyond, the GSC and thereby fill an unoccupied yet increasingly important niche.

  10. Fragmentation of drying paint layers

    NASA Astrophysics Data System (ADS)

    Bakos, Katinka; Dombi, András; Járai-Szabó, Ferenc; Néda, Zoltán

    2013-11-01

    Fragmentation of thin layers of drying granular materials on a frictional surface is studied both by experiments and computer simulations. Besides a qualitative description of the fragmentation phenomenon, the dependence of the average fragment size on the layer thickness is thoroughly investigated. Experiments are done using a special nail polish, which forms characteristic crack structures during drying. To control the layer thickness, we diluted the nail polish in acetone and evaporated different volumes of this solution on glass surfaces in a controlled manner. During the evaporation process we obtained an unstable paint layer, which formed cracks as it dried out. To understand the resulting structures, a previously developed spring-block model was implemented in a three-dimensional version. The experimental and simulation results proved to be in excellent qualitative and quantitative agreement. An earlier suggested scaling relation between the average fragment size and the layer thickness is reconfirmed.

  11. A semiempirical nuclear fragmentation model

    NASA Technical Reports Server (NTRS)

    Wilson, John W.; Townsend, Lawrence W.; Badavi, F. F.

    1987-01-01

    An abrasion/ablation model of heavy ion fragmentation is derived which includes a second order correction for the surface energy term and provides a reasonable representation of the present elemental fragmentation cross sections. The full development of the model must await the resolution of disagreement among different experiments and an expansion of the experimental data base to a broader set of projectile-target combinations.

  12. High Fragmentation Steel Production Process

    DTIC Science & Technology

    1984-01-01

    ...phase of the project entailed the purchase and metallurgical characterization of two heats of HF-1 steel from different vendors. Performed by... Contractor Report ARLCD-CR-83049 (AD-E401 117): High Fragmentation Steel Production Process. Authors: James F. Kane, Ronald L. Kivak, Colin C. MacCrindle.

  13. QGP and Modified Jet Fragmentation

    SciTech Connect

    Wang, Xin-Nian

    2005-04-18

    Recent progress in the study of jet modification in a hot medium and its consequences in high-energy heavy-ion collisions is reviewed. In particular, I will discuss energy loss for propagating heavy quarks and the resulting modified fragmentation function. Medium modification of the parton fragmentation function due to quark recombination is formulated within finite temperature field theory, and its implication for the search for deconfined quark-gluon plasma is also discussed.

  14. Imaging Systems for Size Measurements of Debrisat Fragments

    NASA Technical Reports Server (NTRS)

    Shiotani, B.; Scruggs, T.; Toledo, R.; Fitz-Coy, N.; Liou, J. C.; Sorge, M.; Huynh, T.; Opiela, J.; Krisko, P.; Cowardin, H.

    2017-01-01

    The overall objective of the DebriSat project is to provide data to update existing standard spacecraft breakup models. One of the key sets of parameters used in these models is the physical dimensions of the fragments (i.e., length, average cross-sectional area, and volume). For the DebriSat project, only fragments with at least one dimension greater than 2 mm are collected and processed. Additionally, a significant portion of the fragments recovered from the impact test are needle-like and/or flat plate-like fragments whose heights are almost negligible in comparison to their other dimensions. As a result, two fragment size categories were defined: 2D objects and 3D objects. While measurement systems are commercially available, factors such as measurement rates, system adaptability, size characterization limitations and equipment costs presented significant challenges to the project, and a decision was made to develop our own size characterization systems. The size characterization systems consist of two automated imaging systems, one referred to as the 3D imaging system and the other as the 2D imaging system. Which imaging system is used depends on the classification of the fragment being measured. Both imaging systems utilize point-and-shoot cameras for object image acquisition and create representative point clouds of the fragments. The 3D imaging system utilizes a space-carving algorithm to generate a 3D point cloud, while the 2D imaging system utilizes an edge detection algorithm to generate a 2D point cloud. From the point clouds, the three largest orthogonal dimensions are determined using a convex hull algorithm. For 3D objects, in addition to the three largest orthogonal dimensions, the volume is computed via an alpha-shape algorithm applied to the point clouds. The average cross-sectional area is also computed for 3D objects. Both imaging systems have automated size measurements (image acquisition and image processing) driven by the need to quickly
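    For the 2D case, the "convex hull, then largest orthogonal dimensions" step can be sketched as below. This is an illustration, not the DebriSat software: the hull uses Andrew's monotone-chain algorithm, and the sizing rule (length is the hull diameter; width is the extent perpendicular to that diameter) is an assumed definition, as are the function names. It relies on the fact that the farthest pair of points in any set is always a pair of hull vertices, so only the hull needs to be searched.

```python
import math

Point = tuple[float, float]


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone-chain convex hull; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # > 0 for a left turn o -> a -> b; <= 0 means a is dropped (right turn or collinear).
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: list[Point] = []
    upper: list[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def largest_orthogonal_dims(points: list[Point]) -> tuple[float, float]:
    """Hypothetical 2D sizing rule: length = hull diameter, width = extent perpendicular to it.

    Requires at least two distinct points.
    """
    hull = convex_hull(points)
    # Brute-force diameter over hull vertices (fine for the small hulls of a sketch).
    a, b = max(((p, q) for i, p in enumerate(hull) for q in hull[i + 1:]),
               key=lambda pq: math.dist(*pq))
    length = math.dist(a, b)
    ux, uy = (b[0] - a[0]) / length, (b[1] - a[1]) / length
    # Project hull vertices onto the unit normal (-uy, ux) and take the spread.
    proj = [-uy * x + ux * y for x, y in hull]
    return length, max(proj) - min(proj)
```

    For a 3 by 4 rectangle with an interior point, the sketch reports a length of 5 (the diagonal) and a perpendicular width of 4.8; the interior point never enters the calculation because it is not a hull vertex.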

  15. Sequence-based classification and identification of Fungi.

    PubMed

    Hibbett, David; Abarenkov, Kessy; Koljalg, Urmas; Opik, Maarja; Chai, Benli; Cole, James R; Wang, Qiong; Crous, Pedro W; Robert, Vincent A R G; Helgason, Thorunn; Herr, Josh; Kirk, Paul; Lueschow, Shiloh; O'Donnell, Kerry; Nilsson, Henrik; Oono, Ryoko; Schoch, Conrad L; Smyth, Christopher; Walker, Donny; Porras-Alfaro, Andrea; Taylor, John W; Geiser, David M

    2016-10-19

    Fungal taxonomy and ecology have been revolutionized by the application of molecular methods and both have increasing connections to genomics and functional biology. However, data streams from traditional specimen- and culture-based systematics are not yet fully integrated with those from metagenomic and metatranscriptomic studies, which limits understanding of the taxonomic diversity and metabolic properties of fungal communities. This article reviews current resources, needs, and opportunities for sequence-based classification and identification (SBCI) in fungi as well as related efforts in prokaryotes. To realize the full potential of fungal SBCI it will be necessary to make advances in multiple areas. Improvements in sequencing methods, including long-read and single-cell technologies, will empower fungal molecular ecologists to look beyond ITS and current shotgun metagenomics approaches. Data quality and accessibility will be enhanced by attention to data and metadata standards and rigorous enforcement of policies for deposition of data and workflows. Taxonomic communities will need to develop best practices for molecular characterization in their focal clades, while also contributing to globally useful datasets including ITS. Changes to nomenclatural rules are needed to enable valid publication of sequence-based taxon descriptions. Finally, cultural shifts are necessary to promote adoption of SBCI and to accord professional credit to individuals who contribute to community resources.

  16. Fragmentation Pathways in the Uracil Radical Cation

    SciTech Connect

    Zhou, Congyi; Matsika, Spiridoula; Kotur, Marija; Weinacht, Thomas C.

    2012-08-24

    We investigate pathways for fragmentation in the uracil radical cation using ab initio electronic structure calculations. We focus on the main fragments produced in pump–probe dissociative ionization experiments. These are fragments with mass-to-charge ratios (m/z) of 69, 28, 41, and 42. Barriers to dissociation along the ground ionic surface are reported, which provide an estimate of the energetic requirements for the production of the main fragments. Finally, direct and sequential fragmentation mechanisms have been analyzed, and it is concluded that sequential fragmentation after production of the fragment with m/z 69 is the dominant mechanism for the production of the smaller fragments.

  17. Particle size statistics in dynamic fragmentation

    SciTech Connect

    Grady, D.E.

    1990-12-15

    Condensed matter, when subjected to intense disrupting forces through impact or radiation deposition, will break up into a randomly distributed array of fragments. An earlier analysis of random fragmentation is extended to account for fragmentation in bodies which are finite in extent and for bodies within which the minimum fragment size is bounded. The statistical fragment size relations are compared with molecular dynamic simulations of dynamic fragmentation, with fragmentation caused by the high-energy collision of nuclear particles, and with the distribution of galaxies in the universe which are assumed to be fragment debris from the primordial Big Bang.
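    A simple one-dimensional analogue shows why finite extent matters for random fragmentation statistics. This is not Grady's analysis; it is a textbook illustration under the assumption that a rod of length L breaks at n independent, uniformly placed points. Each of the n + 1 fragments then satisfies P(S > s) = (1 - s/L)^n exactly, which only approaches the exponential law exp(-s/mean) of an unbounded body as n grows. The function names are hypothetical.

```python
import math
import random


def fragment_rod(length: float, n_breaks: int, rng: random.Random) -> list[float]:
    """Break a rod at n_breaks uniformly random points; return the n_breaks + 1 fragment lengths."""
    cuts = sorted(rng.uniform(0.0, length) for _ in range(n_breaks))
    edges = [0.0] + cuts + [length]
    return [b - a for a, b in zip(edges, edges[1:])]


def survival_finite(s: float, length: float, n_breaks: int) -> float:
    """Exact P(fragment > s) for uniform random breaks in a finite rod: (1 - s/L)^n."""
    return (1.0 - s / length) ** n_breaks if s < length else 0.0


def survival_exponential(s: float, mean: float) -> float:
    """Unbounded-body (Poisson) limit with the same mean fragment size."""
    return math.exp(-s / mean)


def empirical_survival(s: float, length: float, n_breaks: int,
                       trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo fraction of fragments longer than s, pooled over many broken rods."""
    rng = random.Random(seed)
    sizes = [x for _ in range(trials) for x in fragment_rod(length, n_breaks, rng)]
    return sum(x > s for x in sizes) / len(sizes)
```

    With L = 1 and n = 9 breaks (mean fragment size 0.1), the finite-rod formula gives about 0.387 for fragments larger than the mean, against about 0.368 for the exponential limit; the Monte Carlo estimate tracks the finite-rod value.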

  18. The combined fragmentation and systematic molecular fragmentation methods.

    PubMed

    Collins, Michael A; Cvitkovic, Milan W; Bettens, Ryan P A

    2014-09-16

    Conspectus: Chemistry, particularly organic chemistry, is mostly concerned with functional groups: amines, amides, alcohols, ketones, and so forth. This is because the reactivity of molecules can be categorized in terms of the reactions of these functional groups, and by the influence of other adjacent groups in the molecule. These simple truths ought to be reflected in the electronic structure and electronic energy of molecules, as reactivity is determined by electronic structure. However, sophisticated ab initio quantum calculations of the molecular electronic energy usually do not make these truths apparent. In recent years, several computational chemistry groups have discovered methods for estimating the electronic energy as a sum of the energies of small molecular fragments, or small sets of groups. By decomposing molecules into such fragments of adjacent functional groups, researchers can estimate the electronic energy to chemical accuracy; not just qualitative trends, but accurate enough to understand reactivity. In addition, this has the benefit of cutting down on both computational time and cost, as the necessary calculation time increases rapidly with an increasing number of electrons. Even with steady advances in computer technology, progress in the study of large molecules is slow. In this Account, we describe two related "fragmentation" methods for treating molecules, the combined fragmentation method (CFM) and systematic molecular fragmentation (SMF). In addition, we show how we can use the SMF approach to estimate the energy and properties of nonconducting crystals, by fragmenting the periodic crystal structure into relatively small pieces. A large part of this Account is devoted to simple overviews of how the methods work. We also discuss the application of these approaches to calculating reactivity and other useful properties, such as the NMR and vibrational spectra of molecules and crystals. These applications rely on the ability of these
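    The "sum of fragment energies" idea can be written in a generic inclusion-exclusion form. This is a sketch of the general principle rather than the exact CFM or SMF scheme: the integer coefficients c_f add the overlapping fragments and subtract the shared pieces that would otherwise be counted twice. In practice the broken bonds are typically capped (for example with hydrogen atoms), which this toy version omits.

```latex
\begin{aligned}
E(M) &\approx \sum_{f} c_f \, E(F_f), \qquad c_f \in \mathbb{Z} \\
\text{toy chain A--B--C:}\quad E(\mathrm{ABC}) &\approx E(\mathrm{AB}) + E(\mathrm{BC}) - E(\mathrm{B})
\end{aligned}
```

    Each fragment calculation involves far fewer electrons than the whole molecule, which is where the savings in computational time and cost described above come from.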

  19. Classification Shell Game.

    ERIC Educational Resources Information Center

    Etzold, Carol

    1983-01-01

    Discusses shell classification exercises. Through keying, students advanced from the "I know what a shell looks like" stage to become involved in the classification process: observing, labeling, making decisions about categories, and identifying marine animals. (Author/JN)

  20. Efflux in the Oral Metagenome: The Discovery of a Novel Tetracycline and Tigecycline ABC Transporter

    PubMed Central

    Reynolds, Liam J.; Roberts, Adam P.; Anjum, Muna F.

    2016-01-01

    Antibiotic resistance in human bacterial pathogens and commensals is threatening our ability to treat infections and conduct common medical procedures. As novel antibiotics are discovered and marketed it is important that we understand how resistance to them may arise and know what environments may act as reservoirs for such resistance genes. In this study a tetracycline and tigecycline resistant clone was identified by screening a human saliva metagenomic library in Escherichia coli EPI300 on agar containing 5 μg/ml tetracycline. Sequencing of the DNA insert present within the tetracycline resistant clone revealed it to contain a 7,765 bp fragment harboring novel ABC half transporter genes, tetAB(60). Mutagenesis studies performed on these genes confirmed that they were responsible for the tetracycline and tigecycline resistance phenotypes. Growth studies performed using E. coli EPI300 clones that harbored either the wild type, the mutated, or none of these genes indicated that there was a fitness cost associated with presence of these genes, with the isolate harboring both genes exhibiting a significantly slower growth than control strains. Given the emergence of E. coli strains that are sensitive only to tigecycline and doxycycline it is concerning that such a resistance mechanism has been identified in the human oral cavity. PMID:27999567

  1. Metagenomic Analysis of Kimchi, a Traditional Korean Fermented Food

    PubMed Central

    Jung, Ji Young; Lee, Se Hee; Kim, Jeong Myeong; Park, Moon Su; Bae, Jin-Woo; Hahn, Yoonsoo; Madsen, Eugene L.; Jeon, Che Ok

    2011-01-01

    Kimchi, a traditional food in the Korean culture, is made from vegetables by fermentation. In this study, metagenomic approaches were used to monitor changes in bacterial populations, metabolic potential, and overall genetic features of the microbial community during the 29-day fermentation process. Metagenomic DNA was extracted from kimchi samples obtained periodically and was sequenced using a 454 GS FLX Titanium system, which yielded a total of 701,556 reads, with an average read length of 438 bp. Phylogenetic analysis based on 16S rRNA genes from the metagenome indicated that the kimchi microbiome was dominated by members of three genera: Leuconostoc, Lactobacillus, and Weissella. Assignment of metagenomic sequences to SEED categories of the Metagenome Rapid Annotation using Subsystem Technology (MG-RAST) server revealed a genetic profile characteristic of heterotrophic lactic acid fermentation of carbohydrates, which was supported by the detection of mannitol, lactate, acetate, and ethanol as fermentation products. When the metagenomic reads were mapped onto the database of completed genomes, the Leuconostoc mesenteroides subsp. mesenteroides ATCC 8293 and Lactobacillus sakei subsp. sakei 23K genomes were highly represented. These same two genera were confirmed to be important in kimchi fermentation when the majority of kimchi metagenomic sequences showed very high identity to Leuconostoc mesenteroides and Lactobacillus genes. Besides microbial genome sequences, a surprisingly large number of phage DNA sequences were identified from the cellular fractions, possibly indicating that a high proportion of cells were infected by bacteriophages during fermentation. Overall, these results provide insights into the kimchi microbial community and also shed light on fermentation processes carried out broadly by complex microbial communities. PMID:21317261

  2. Filtration and Normalization of Sequencing Read Data in Whole-Metagenome Shotgun Samples

    PubMed Central

    Chouvarine, Philippe; Wiehlmann, Lutz; Moran Losada, Patricia; DeLuca, David S.; Tümmler, Burkhard

    2016-01-01

    Ever-increasing affordability of next-generation sequencing makes whole-metagenome sequencing an attractive alternative to traditional 16S rDNA, RFLP, or culturing approaches for the analysis of microbiome samples. The advantage of whole-metagenome sequencing is that it allows direct inference of the metabolic capacity and physiological features of the studied metagenome without reliance on the knowledge of genotypes and phenotypes of the members of the bacterial community. It also makes it possible to overcome problems of 16S rDNA sequencing, such as unknown copy number of the 16S gene and lack of sufficient sequence similarity of the “universal” 16S primers to some of the target 16S genes. On the other hand, next-generation sequencing suffers from biases resulting in non-uniform coverage of the sequenced genomes. To overcome this difficulty, we present a model of GC-bias in sequencing metagenomic samples as well as filtration and normalization techniques necessary for accurate quantification of microbial organisms. While there has been substantial research in normalization and filtration of read-count data in such techniques as RNA-seq or ChIP-seq, to our knowledge, this has not been the case for the field of whole-metagenome shotgun sequencing. The presented methods assume that complete genome references are available for most microorganisms of interest present in metagenomic samples. This is often a valid assumption in such fields as medical diagnostics of patient microbiota. Testing the model on two validation datasets showed four-fold reduction in root-mean-square error compared to non-normalized data in both cases. The presented methods can be applied to any pipeline for whole metagenome sequencing analysis relying on complete microbial genome references. We demonstrate that such pre-processing reduces the number of false positive hits and increases accuracy of abundance estimates. PMID:27760173
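    One generic way to correct read counts for GC bias, given a complete reference, is inverse-efficiency weighting. This is a minimal sketch of that general idea, not the GC-bias model of Chouvarine et al.: it assumes ten equal-width GC bins, estimates a relative sequencing efficiency per bin from the windows of one reference genome (where true abundance is constant, so coverage differences are attributed to GC), and then counts each read as 1 / efficiency. The function names are hypothetical; `rmse` is the error measure the abstract reports.

```python
import math
from collections import defaultdict


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_bin(gc: float, n_bins: int = 10) -> int:
    """Equal-width GC bin index, with GC = 1.0 folded into the top bin."""
    return min(int(gc * n_bins), n_bins - 1)


def estimate_bias(window_gc: list[float], window_reads: list[int], n_bins: int = 10) -> dict[int, float]:
    """Relative efficiency per GC bin: mean reads per window in the bin / mean reads per window overall."""
    overall = sum(window_reads) / len(window_reads)
    totals: dict[int, float] = defaultdict(float)
    counts: dict[int, int] = defaultdict(int)
    for gc, reads in zip(window_gc, window_reads):
        b = gc_bin(gc, n_bins)
        totals[b] += reads
        counts[b] += 1
    return {b: (totals[b] / counts[b]) / overall for b in totals}


def corrected_abundance(read_gcs: list[float], bias: dict[int, float], n_bins: int = 10) -> float:
    """Each read contributes 1 / efficiency of its GC bin; unseen or zero bins fall back to 1.0."""
    return sum(1.0 / (bias.get(gc_bin(g, n_bins)) or 1.0) for g in read_gcs)


def rmse(pred: list[float], truth: list[float]) -> float:
    """Root-mean-square error between estimated and true abundances."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))
```

    For example, if low-GC windows of a reference receive 20 reads each and high-GC windows only 10, the estimated efficiencies are about 1.33 and 0.67; two organisms observed with 20 low-GC reads and 10 high-GC reads respectively are then both corrected to 15, removing the apparent two-fold difference that was caused only by GC content.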

  3. Fragmentation and ablation during entry

    SciTech Connect

    Canavan, G.H.

    1997-09-01

    This note discusses objects that both fragment and ablate during entry, using the results of previous reports to describe the velocity, pressure, and fragmentation of entering objects. It shows that the mechanisms used there to describe the breakup of non-ablating objects during deceleration remain valid for most ablating objects. It treats coupled fragmentation and ablation during entry, building on earlier models that separately discuss the entry of objects that are hard, whose high heat of ablation permits little erosion, and of those that are strong, whose strength prevents fragmentation; these are discussed in "Radiation from Hard Objects," "Deceleration and Radiation of Strong, Hard, Asteroids During Atmospheric Impact," and "Meteor Signature Interpretation." This note provides a more detailed treatment of the further breakup and separation of fragments during descent. It replaces the constraint on mass per unit area used earlier to determine the altitude and magnitude of peak power radiation with a detailed analytic solution of deceleration. Model predictions are shown to be in agreement with the key features of numerical calculations of deceleration. The model equations are solved for the altitudes of maximum radiation, which agree with numerical integrations. The model is inverted analytically to infer object size and speed from measurements of peak power and altitude, providing a complete model for the approximate inversion of meteor data.

  4. What Is the Usefulness of the Fragmentation Pattern of the Femoral Head in Managing Legg-Calvé-Perthes Disease?

    PubMed Central

    Woo, Seung Hun; Jang, Jae Hoon; Lee, Seung Geun; Kim, Harry K.W.; Browne, Richard

    2014-01-01

    Background: Within the lateral pillar classification of Legg-Calvé-Perthes (LCP) disease, hips seem quite variable in the pattern of fragmentation seen on radiographs. The purpose of this study was to determine whether it is possible to reliably subdivide the lateral pillar groups into femoral head fragmentation patterns, and whether such a subdivision of the lateral pillar groupings is clinically useful in managing LCP disease. Methods: Two hundred and ninety-three anteroposterior radiographs taken at the maximal fragmentation stage (189 lateral pillar B, 57 B/C border, and 47 C hips; mean bone/chronologic age at the time of first visit, 6.2/7.9 years) and at skeletal maturity (mean age, 16.6 years) were analyzed. We distinguished 3 fragmentation patterns in each pillar group based on the region of major involvement. We tested the inter- and intraobserver reliability of our classification system and analyzed the relationships between the fragmentation patterns and the Stulberg outcomes as well as other factors such as surgical treatment and age. Results: Inter- and intraobserver consistency in fragmentation pattern assignments was found to be substantial to excellent. A statistically significant trend (p = 0.001) in the proportion of Stulberg III or IV outcomes in comparison with Stulberg I and II was found only for the different fragmentation patterns in our lateral pillar B patients: fragmentation patterns having mainly lateral-central necrosis led to poor outcomes. No significant association was found between fragmentation patterns and Stulberg outcomes in pillar groups B/C border and C. Conclusions: Our results are consistent with the lateral pillar classification itself. Therefore, fragmentation patterns in each lateral pillar classification did not provide clinical usefulness in the management of LCP disease. PMID:24900906
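    The abstract does not name its reliability statistic. One widely used choice for two observers assigning categorical labels is Cohen's kappa, which corrects raw agreement for the agreement expected by chance from each observer's label frequencies; on the Landis and Koch scale, 0.61-0.80 is usually read as "substantial". The sketch below assumes that statistic purely for illustration; the labels are placeholders, not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled independently at their own label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_exp = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if p_exp == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_obs - p_exp) / (1.0 - p_exp)
```

    For instance, two observers who agree on three of four hips, where one uses "pattern 1" twice and the other once, reach 75% raw agreement but a kappa of only 0.5, because half of that agreement would be expected by chance.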

  5. Fragmentation of DNA by sonication.

    PubMed

    Sambrook, Joseph; Russell, David W

    2006-09-01

    INTRODUCTION DNA fragmentation is often necessary prior to library construction or subcloning for DNA sequencing. This protocol describes a method for DNA fragmentation by sonication. During sonication, DNA samples are subjected to hydrodynamic shearing by exposure to brief periods of sonication. DNA that has been sonicated for excessive periods of time is extremely difficult to clone. Most sonicators will not shear DNA to a size of less than 300-500 bp, and it is tempting to continue sonication until the entire DNA population has been reduced in size. However, the yield of subclones is usually greater if sonication is stopped when the fragments of the target DNA first reach a size of ~700 bp.

  6. Safety analysis of a Russian phage cocktail: From MetaGenomic analysis to oral application in healthy human subjects

    SciTech Connect

    McCallin, Shawna; Alam Sarker, Shafiqul; Barretto, Caroline; Sultana, Shamima; Berger, Bernard; Huq, Sayeda; Krause, Lutz; Bibiloni, Rodrigo; Schmitt, Bertrand; Reuteler, Gloria; Brüssow, Harald

    2013-09-01

    Phage therapy has a long tradition in Eastern Europe, where preparations consist of complex phage cocktails whose compositions have not been described. We investigated the composition of a phage cocktail from the Russian pharmaceutical company Microgen targeting Escherichia coli/Proteus infections. Electron microscopy identified six phage types, with T7-like phages numerically dominating over T4-like phages. A metagenomic approach using taxonomical classification, reference mapping and de novo assembly identified 18 distinct phage types, including 7 genera of Podoviridae, 2 established and 2 proposed genera of Myoviridae, and 2 genera of Siphoviridae. De novo assembly yielded 7 contigs greater than 30 kb, including a 147-kb Myovirus genome and a 42-kb genome of a potentially new phage. Bioinformatic analysis did not reveal undesired genes and a small human volunteer trial did not associate adverse effects with oral phage exposure. - Highlights: • We analyzed the composition of a commercial Russian phage cocktail. • The cocktail consists of at least 10 different phage genera. • No undesired genes were detected. • No adverse effects were seen upon oral application in a small human clinical trial.

  7. Three Novel Virophage Genomes Discovered from Yellowstone Lake Metagenomes

    PubMed Central

    Zhou, Jinglie; Sun, Dawei; Childers, Alyson; McDermott, Timothy R.

    2014-01-01

    ABSTRACT Virophages are a unique group of circular double-stranded DNA viruses that are considered parasites of giant DNA viruses, which in turn are known to infect eukaryotic hosts. In this study, the genomes of three novel Yellowstone Lake virophages (YSLVs), YSLV5, YSLV6, and YSLV7, were identified from Yellowstone Lake through metagenomic analyses. The relative abundance of these three novel virophages and of the previously identified Yellowstone Lake virophages YSLV1 to -4 was determined in different locations of the lake, revealing that most of the sampled locations in the lake, including both mesophilic and thermophilic habitats, had multiple virophage genotypes. This likely reflects the diverse habitats or diversity of the eukaryotic hosts and their associated giant viruses that serve as putative hosts for these virophages. YSLV5 has a 29,767-bp genome with 32 predicted open reading frames (ORFs), YSLV6 has a 24,837-bp genome with 29 predicted ORFs, and YSLV7 has a 23,193-bp genome with 26 predicted ORFs. Based on multilocus phylogenetic analysis, YSLV6 shows a close evolutionary relationship with YSLV1 to -4, whereas YSLV5 and YSLV7 are distantly related to the others, and YSLV7 represents the fourth novel virophage lineage. In addition, the genome of YSLV5 has a G+C content of 51.1%, which is much higher than that of all other known virophages, indicating a unique host range for YSLV5. These results suggest that virophages are abundant and have diverse genotypes that likely mirror diverse giant viral and eukaryotic hosts within the Yellowstone Lake ecosystem. IMPORTANCE This study discovered novel virophages present within the Yellowstone Lake ecosystem using a conserved major capsid protein as a phylogenetic anchor for assembly of sequence reads from Yellowstone Lake metagenomic samples. The three novel virophage genomes (YSLV5 to -7) were completed by identifying specific environmental samples containing these respective virophages, and closing gaps by targeted PCR
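    The G+C content quoted for YSLV5 (51.1%) is the proportion of G and C bases in the genome sequence. A minimal Python sketch of that calculation, assuming ambiguity codes such as N are excluded from the denominator (a convention choice, not something the abstract specifies):

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    # Count only A/C/G/T so that ambiguity codes (e.g. N) do not dilute the value.
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


print(round(gc_content("ATGCGCNN"), 1))  # 66.7
```

    Applied to a complete genome sequence, the same function would give the percentage reported in the abstract; counting N as a non-GC base instead would lower the value slightly for gapped assemblies.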

  8. VSEARCH: a versatile open source tool for metagenomics

    PubMed Central

    Flouri, Tomáš; Nichols, Ben; Quince, Christopher; Mahé, Frédéric

    2016-01-01

    Background VSEARCH is an open source and free of charge multithreaded 64-bit tool for processing and preparing metagenomics, genomics and population genomics nucleotide sequence data. It is designed as an alternative to the widely used USEARCH tool (Edgar, 2010) for which the source code is not publicly available, algorithm details are only rudimentarily described, and only a memory-confined 32-bit version is freely available for academic use. Methods When searching nucleotide sequences, VSEARCH uses a fast heuristic based on words shared by the query and target sequences in order to quickly identify similar sequences; a similar strategy is probably used in USEARCH. VSEARCH then performs optimal global sequence alignment of the query against potential target sequences, using full dynamic programming instead of the seed-and-extend heuristic used by USEARCH. Pairwise alignments are computed in parallel using vectorisation and multiple threads. Results VSEARCH includes most commands for analysing nucleotide sequences available in USEARCH version 7 and several of those available in USEARCH version 8, including searching (exact or based on global alignment), clustering by similarity (using length pre-sorting, abundance pre-sorting or a user-defined order), chimera detection (reference-based or de novo), dereplication (full length or prefix), pairwise alignment, reverse complementation, sorting, and subsampling. VSEARCH also includes commands for FASTQ file processing, i.e., format detection, filtering, read quality statistics, and merging of paired reads. Furthermore, VSEARCH extends functionality with several new commands and improvements, including shuffling, rereplication, masking of low-complexity sequences with the well-known DUST algorithm, a choice among different similarity definitions, and FASTQ file format conversion.
VSEARCH is here shown to be more accurate than USEARCH when performing searching, clustering, chimera detection and subsampling, while on a par
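    The two-stage search described in the Methods (a cheap filter on shared words, followed by optimal global alignment of the surviving candidates) can be sketched in Python as below. This is a toy illustration, not VSEARCH's code: the word length k=3, the candidate limit, the match/mismatch/gap scores and the identity definition (matches divided by alignment columns) are all assumptions chosen for readability.

```python
def kmers(seq, k):
    """Set of all length-k words in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(query, target, k):
    return len(kmers(query, k) & kmers(target, k))


def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns matches / alignment length."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    # Full dynamic programming over the whole matrix (no seed-and-extend).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting matched columns and total columns.
    i, j, matches, length = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        length += 1
    return matches / length if length else 0.0


def search(query, targets, k=3, max_candidates=4):
    """Rank targets by shared k-mers, then align only the best few candidates."""
    scored = [(shared_kmer_count(query, t, k), t) for t in targets]
    scored = [(s, t) for s, t in scored if s > 0]
    scored.sort(key=lambda st: st[0], reverse=True)
    hits = [(t, global_identity(query, t)) for _, t in scored[:max_candidates]]
    return max(hits, key=lambda h: h[1]) if hits else None


print(search("ACGTACGT", ["TTTTTTTT", "ACGTACGA", "ACGTACGT"]))  # ('ACGTACGT', 1.0)
```

    Filtering on shared words first means the quadratic dynamic-programming step runs only on a handful of promising targets, which is the point of pairing a fast heuristic with exact alignment.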

  9. International standardization and classification of human papillomavirus types.

    PubMed

    Bzhalava, Davit; Eklund, Carina; Dillner, Joakim

    2015-02-01

    Established Human Papillomavirus (HPV) types, up to HPV202, belong to 49 species in five genera. International standardization in classification and quality standards for HPV type designation and detection is ensured by the International HPV Reference Center. The center i) receives clones of potentially novel HPV types, re-clones and re-sequences them. If confirmed, an HPV type number is assigned and posted on www.hpvcenter.se. ii) distributes reference clone samples, for academic research, under Material Transfer Agreements agreed with the originator. iii) provides preliminary checking of whether new sequences represent novel types. iv) issues international proficiency panels for HPV genotyping. The rate of HPV type discovery is increasing, probably because of metagenomic sequencing. γ-genus today contains 79 HPV types and 27 species, surpassing α and β genera with 65 and 51 HPV types, respectively. Regular issuing of proficiency panels based on HPV reference clones has resulted in global improvement of HPV genotyping services.

  10. 21 CFR 866.5530 - Immunoglobulin G (Fc fragment specific) immunological test system.

    Code of Federal Regulations, 2011 CFR

    2011-04-01

    ... 21 Food and Drugs 8 Immunoglobulin G (Fc fragment specific) immunological test system. 866.5530 Section 866.5530 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF... abnormalities, e.g., gamma heavy chain disease. (b) Classification. Class I (general controls). The device...

  11. 21 CFR 866.5530 - Immunoglobulin G (Fc fragment specific) immunological test system.

    Code of Federal Regulations, 2014 CFR

    2014-04-01

    ... 21 Food and Drugs 8 Immunoglobulin G (Fc fragment specific) immunological test system. 866.5530 Section 866.5530 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF... abnormalities, e.g., gamma heavy chain disease. (b) Classification. Class I (general controls). The device...

  12. 21 CFR 866.5530 - Immunoglobulin G (Fc fragment specific) immunological test system.

    Code of Federal Regulations, 2012 CFR

    2012-04-01

    ... 21 Food and Drugs 8 Immunoglobulin G (Fc fragment specific) immunological test system. 866.5530 Section 866.5530 Food and Drugs FOOD AND DRUG ADMINISTRATION, DEPARTMENT OF... abnormalities, e.g., gamma heavy chain disease. (b) Classification. Class I (general controls). The device...

  13. MG-RAST, a Metagenomics Service for Analysis of Microbial Community Structure and Function.

    PubMed

    Keegan, Kevin P; Glass, Elizabeth M; Meyer, Folker

    2016-01-01

    Approaches in molecular biology, particularly those that deal with high-throughput sequencing of entire microbial communities (the field of metagenomics), are rapidly advancing our understanding of the composition and functional content of microbial communities involved in climate change, environmental pollution, human health, biotechnology, etc. Metagenomics provides researchers with the most complete picture of the taxonomic (i.e., what organisms are there) and functional (i.e., what are those organisms doing) composition of natively sampled microbial communities, making it possible to perform investigations that include organisms that were previously intractable to laboratory-controlled culturing; currently, these constitute the vast majority of all microbes on the planet. All organisms contained in environmental samples are sequenced in a culture-independent manner, most often with 16S rRNA amplicon methods to investigate the taxonomic content or whole-genome shotgun-based methods to investigate the functional content of sampled communities. Metagenomics allows researchers to characterize the community composition and functional content of microbial communities, but it cannot show which functional processes are active; however, near parallel developments in transcriptomics promise a dramatic increase in our knowledge in this area as well. Since 2008, MG-RAST (Meyer et al., BMC Bioinformatics 9:386, 2008) has served as a public resource for annotation and analysis of metagenomic sequence data, providing a repository that currently houses more than 150,000 data sets (containing 60+ tera-base-pairs) with more than 23,000 publicly available. MG-RAST, or the metagenomics RAST (rapid annotation using subsystems technology) server makes it possible for users to upload raw metagenomic sequence data in (preferably) fastq or fasta format. Assessments of sequence quality, annotation with respect to multiple reference databases, are performed automatically with minimal

  14. A metagenomic study of methanotrophic microorganisms in Coal Oil Point seep sediments

    PubMed Central

    2011-01-01

    Background Methane oxidizing prokaryotes in marine sediments are believed to function as a methane filter reducing the oceanic contribution to the global methane emission. In the anoxic parts of the sediments, oxidation of methane is accomplished by anaerobic methanotrophic archaea (ANME) living in syntrophy with sulphate reducing bacteria. This anaerobic oxidation of methane is assumed to be a coupling of reversed methanogenesis and dissimilatory sulphate reduction. Where oxygen is available, aerobic methanotrophs take part in methane oxidation. In this study, we used metagenomics to characterize the taxonomic and metabolic potential for methane oxidation at the Tonya seep in the Coal Oil Point area, California. Two metagenomes from different sediment depth horizons (0-4 cm and 10-15 cm below sea floor) were sequenced by 454 technology. The metagenomes were analysed to characterize the distribution of aerobic and anaerobic methanotrophic taxa at the two sediment depths. To gain insight into the metabolic potential, the metagenomes were searched for marker genes associated with methane oxidation. Results BLAST searches followed by taxonomic binning in MEGAN revealed aerobic methanotrophs of the genus Methylococcus to be overrepresented in the 0-4 cm metagenome compared to the 10-15 cm metagenome. In the 10-15 cm metagenome, ANME of the ANME-1 clade were identified as the most abundant methanotrophic taxon with 8.6% of the reads. Searches for particulate methane monooxygenase (pmoA) and methyl-coenzyme M reductase (mcrA), marker genes for aerobic and anaerobic oxidation of methane respectively, identified pmoA in the 0-4 cm metagenome as Methylococcaceae related. The mcrA reads from the 10-15 cm horizon were all classified as originating from the ANME-1 clade. Conclusions Most of the taxa detected were present in both metagenomes and differences in community structure and corresponding metabolic potential between the two samples were mainly due to abundance

  15. Improved Environmental Genomes via Integration of Metagenomic and Single-Cell Assemblies

    PubMed Central

    Mende, Daniel R.; Aylward, Frank O.; Eppley, John M.; Nielsen, Torben N.; DeLong, Edward F.

    2016-01-01

    Assembling complete or near complete genomes from complex microbial communities remains a significant challenge in metagenomic studies. Recent developments in single cell amplified genomes (SAGs) have enabled the sequencing of individual draft genomes representative of uncultivated microbial populations. SAGs suffer from incomplete and uneven coverage due to artifacts that arise from multiple displacement amplification techniques. Conversely, metagenomic sequence data does not suffer from the same biases as SAGs, and significant improvements have been realized in the recovery of draft genomes from metagenomes. Nevertheless, the inherent genomic complexity of many microbial communities often obfuscates facile generation of population genome assemblies from metagenomic data. Here we describe a new method for metagenomic-guided SAG assembly that leverages the advantages of both methods and significantly improves the completeness of initial SAG assemblies. We demonstrate that SAG assemblies of two cosmopolitan marine lineages, Marine Group 1 Thaumarchaeota and SAR324 clade bacterioplankton, were substantially improved using this approach. Moreover, the improved assemblies strengthened biological inferences. For example, the improved SAR324 clade genome assembly revealed the presence of many genes involved in phenylalanine catabolism and flagellar assembly that were absent in the original SAG. PMID:26904016

  16. Simultaneous virus identification and characterization of severe unexplained pneumonia cases using a metagenomics sequencing technique.

    PubMed

    Zou, Xiaohui; Tang, Guangpeng; Zhao, Xiang; Huang, Yan; Chen, Tao; Lei, Mingyu; Chen, Wenbing; Yang, Lei; Zhu, Wenfei; Zhuang, Li; Yang, Jing; Feng, Zhaomin; Wang, Dayan; Wang, Dingming; Shu, Yuelong

    2017-03-01

    Many viruses can cause respiratory diseases in humans. Although great advances have been achieved in methods of diagnosis, it remains challenging to identify pathogens in unexplained pneumonia (UP) cases. In this study, we applied next-generation sequencing (NGS) technology and a metagenomic approach to detect and characterize respiratory viruses in UP cases from Guizhou Province, China. A total of 33 oropharyngeal swabs were obtained from hospitalized UP patients and subjected to NGS. An unbiased metagenomic analysis pipeline identified 13 virus species in 16 samples. Human rhinovirus C was the virus most frequently detected and was identified in seven samples. Human measles virus, adenovirus B 55 and coxsackievirus A10 were also identified. Metagenomic sequencing also provided virus genomic sequences, which enabled genotype characterization and phylogenetic analysis. For cases of multiple infection, metagenomic sequencing afforded information regarding the quantity of each virus in the sample, which could be used to evaluate each virus's role in the disease. Our study highlights the potential of metagenomic sequencing for pathogen identification in UP cases.

  17. Multiplexed metagenome mining using short DNA sequence tags facilitates targeted discovery of epoxyketone proteasome inhibitors.

    PubMed

    Owen, Jeremy G; Charlop-Powers, Zachary; Smith, Alexandra G; Ternei, Melinda A; Calle, Paula Y; Reddy, Boojala Vijay B; Montiel, Daniel; Brady, Sean F

    2015-04-07

    In molecular evolutionary analyses, short DNA sequences are used to infer phylogenetic relationships among species. Here we apply this principle to the study of bacterial biosynthesis, enabling the targeted isolation of previously unidentified natural products directly from complex metagenomes. Our approach uses short natural product sequence tags derived from conserved biosynthetic motifs to profile biosynthetic diversity in the environment and then guide the recovery of gene clusters from metagenomic libraries. The methodology is conceptually simple, requires only a small investment in sequencing, and is not computationally demanding. To demonstrate the power of this approach to natural product discovery, we conducted a computational search for epoxyketone proteasome inhibitors within 185 globally distributed soil metagenomes. This led to the identification of 99 unique epoxyketone sequence tags, falling into 6 phylogenetically distinct clades. Complete gene clusters associated with nine unique tags were recovered from four saturating soil metagenomic libraries. Using heterologous expression methodologies, seven potent epoxyketone proteasome inhibitors (clarepoxcins A-E and landepoxcins A and B) were produced from these pathways, including compounds with different warhead structures and a naturally occurring halohydrin prodrug. This study provides a template for the targeted expansion of bacterially derived natural products using the global metagenome.

  18. An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography

    PubMed Central

    Nayfach, Stephen; Rodriguez-Mueller, Beltran; Garud, Nandita

    2016-01-01

    We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single-nucleotide polymorphisms (SNPs), from shotgun metagenomes. Our method leverages a database of more than 30,000 bacterial reference genomes that we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare SNPs to track strains between hosts. Using this approach, we found that although species compositions of mothers and infants converged over time, strain-level similarity diverged. Specifically, early colonizing bacteria were often transmitted from an infant’s mother, while late colonizing bacteria were often transmitted from other sources in the environment and were enriched for spore-formation genes. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data are analyzed at a coarser taxonomic resolution. PMID:27803195
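    MIDAS used rare SNPs to track strains between hosts. The toy Python sketch below illustrates the general idea only and is not the MIDAS implementation: each sample is assumed to be a set of (position, allele) pairs, an allele counts as rare if its frequency in a hypothetical reference population is at or below a threshold (0.05 here, an arbitrary choice), and two samples are compared by the fraction of one sample's rare alleles found in the other.

```python
def rare_alleles(sample, population_freq, max_freq=0.05):
    """Alleles in `sample` whose population frequency is at or below max_freq.

    Alleles missing from population_freq are treated as frequency 0 (i.e. rare).
    """
    return {allele for allele in sample if population_freq.get(allele, 0.0) <= max_freq}


def shared_rare_fraction(sample_a, sample_b, population_freq, max_freq=0.05):
    """Fraction of sample A's rare alleles that are also present in sample B."""
    rare_a = rare_alleles(sample_a, population_freq, max_freq)
    if not rare_a:
        return 0.0
    return len(rare_a & set(sample_b)) / len(rare_a)


# Hypothetical allele frequencies and per-sample allele sets for one species.
population = {(10, "A"): 0.90, (10, "G"): 0.10, (25, "T"): 0.02,
              (40, "C"): 0.01, (55, "G"): 0.03}
mother = {(10, "A"), (25, "T"), (40, "C"), (55, "G")}
infant = {(10, "A"), (25, "T"), (40, "C")}
unrelated = {(10, "A"), (10, "G")}
print(round(shared_rare_fraction(mother, infant, population), 2))  # 0.67
print(shared_rare_fraction(mother, unrelated, population))         # 0.0
```

    A common allele such as (10, "A") is shared by almost every sample and says little about transmission; restricting the comparison to rare alleles is what makes a high overlap suggestive of a shared strain.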

  19. Metagenomic analysis reveals presence of Treponema denticola in a tissue biopsy of the Iceman.

    PubMed

    Maixner, Frank; Thomma, Anton; Cipollini, Giovanna; Widder, Stefanie; Rattei, Thomas; Zink, Albert

    2014-01-01

    Ancient hominoid genome studies can be regarded by definition as metagenomic analyses since they represent a mixture of both hominoid and microbial sequences in an environment. Here, we report the molecular detection of the oral spirochete Treponema denticola in ancient human tissue biopsies of the Iceman, a 5,300-year-old Copper Age natural ice mummy. Initially, the metagenomic data of the Iceman's genomic survey was screened for bacterial ribosomal RNA (rRNA) specific reads. Through ranking the reads by abundance, a relatively high number of rRNA reads most similar to T. denticola was detected. Mapping of the metagenome sequences against the T. denticola genome revealed additional reads most similar to this opportunistic pathogen. The DNA damage pattern of specifically mapped reads suggests an ancient origin of these sequences. The haematogenous spread of bacteria of the oral microbiome often reported in the recent literature could already explain the presence of metagenomic reads specific for T. denticola in the Iceman's bone biopsy. However, we extended our survey to an Iceman gingival tissue sample and a mouth swab sample and thereby detected T. denticola and Porphyromonas gingivalis, another important member of the human commensal oral microflora. Taken together, this study clearly underlines the opportunity to detect disease-associated microorganisms when applying metagenomics-enabled approaches on datasets of ancient human remains.

  20. Meta-QC-Chain: comprehensive and fast quality control method for metagenomic data.

    PubMed

    Zhou, Qian; Su, Xiaoquan; Jing, Gongchao; Ning, Kang

    2014-02-01

    Next-generation sequencing (NGS) technology has revolutionized and significantly impacted metagenomic research. However, the NGS data usually contains sequencing artifacts such as low-quality reads and contaminating reads, which will significantly compromise downstream analysis. Many quality control (QC) tools have been proposed; however, few of them have been verified to be suitable or efficient for metagenomic data, which are composed of multiple genomes and are more complex than other kinds of NGS data. Here we present a metagenomic data QC method named Meta-QC-Chain. Meta-QC-Chain combines multiple QC functions: technical tests describe input data status and identify potential errors, quality trimming filters poor sequencing-quality bases and reads, and contamination screening identifies higher eukaryotic species, which are considered as contamination for metagenomic data. Most computing processes are optimized based on parallel programming. Testing on an 8-GB real dataset showed that Meta-QC-Chain trimmed low sequencing-quality reads and contaminating reads, and the whole quality control procedure was completed within 20 min. Therefore, Meta-QC-Chain provides a comprehensive, useful and high-performance QC tool for metagenomic data. Meta-QC-Chain is publicly available for free at: http://computationalbioenergy.org/meta-qc-chain.html.
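    The quality-trimming step can be illustrated with a common, simple rule: trim low-quality bases from the 3' end of each read, then discard reads that become too short. The Python sketch below assumes Phred+33 quality encoding and uses hypothetical thresholds (Q20 and a minimum length); it is not Meta-QC-Chain's own algorithm, which the abstract does not describe in detail.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, quality, min_q=20, offset=33):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    scores = phred_scores(quality, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def filter_reads(reads, min_q=20, min_len=30, offset=33):
    """Trim each (name, seq, quality) read and keep it only if it stays long enough."""
    for name, seq, qual in reads:
        trimmed_seq, trimmed_qual = trim_3prime(seq, qual, min_q, offset)
        if len(trimmed_seq) >= min_len:
            yield name, trimmed_seq, trimmed_qual


# Hypothetical reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
reads = [("r1", "ACGTAC", "IIII##"), ("r2", "AC", "##")]
print(list(filter_reads(reads, min_len=3)))  # [('r1', 'ACGT', 'IIII')]
```

    Because `filter_reads` is a generator, it can process a large FASTQ stream read by read without holding the whole dataset in memory.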

  1. Evaluation of methods to concentrate and purify ocean virus communities through comparative, replicated metagenomics

    PubMed Central

    Hurwitz, Bonnie L; Deng, Li; Poulos, Bonnie T; Sullivan, Matthew B

    2013-01-01

    Viruses have global impact through mortality, nutrient cycling and horizontal gene transfer, yet their study is limited by complex methodologies with little validation. Here, we use triplicate metagenomes to compare common aquatic viral concentration and purification methods across four combinations as follows: (i) tangential flow filtration (TFF) and DNase + CsCl, (ii) FeCl3 precipitation and DNase, (iii) FeCl3 precipitation and DNase + CsCl and (iv) FeCl3 precipitation and DNase + sucrose. Taxonomic data (30% of reads) suggested that purification methods were statistically indistinguishable at any taxonomic level while concentration methods were significantly different at family and genus levels. Specifically, TFF-concentrated viral metagenomes had significantly fewer abundant viral types (Podoviridae and Phycodnaviridae) and more variability among Myoviridae than FeCl3-precipitated viral metagenomes. More comprehensive analyses using protein clusters (66% of reads) and k-mers (100% of reads) showed 50–53% of these data were common to all four methods, and revealed trace bacterial DNA contamination in TFF-concentrated metagenomes and one of three replicates concentrated using FeCl3 and purified by DNase alone. Shared k-mer analyses also revealed that polymerases used in amplification impact the resulting metagenomes, with TaKaRa enriching for ‘rare’ reads relative to PfuTurbo. Together these results provide empirical data for making experimental design decisions in culture-independent viral ecology studies. PMID:22845467
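    The shared k-mer comparisons reported above (the fraction of data common to all methods, and differences between replicates) can be sketched as set operations on k-mers. The Python below is an illustrative assumption, not the study's pipeline: k, the sample names and the reads are hypothetical, and it does not merge reverse complements, which an analysis of double-stranded DNA usually would.

```python
from itertools import combinations


def kmer_set(reads, k):
    """All k-mers occurring in any read of a sample (no reverse-complement merging)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(samples, k=4):
    """Jaccard similarity of k-mer sets for every pair of samples."""
    sets = {name: kmer_set(reads, k) for name, reads in samples.items()}
    return {(x, y): jaccard(sets[x], sets[y]) for x, y in combinations(sorted(sets), 2)}


def fraction_common_to_all(samples, k=4):
    """Share of all observed k-mers that occur in every sample."""
    sets = [kmer_set(reads, k) for reads in samples.values()]
    if not sets:
        return 0.0
    union = set().union(*sets)
    core = set.intersection(*sets)
    return len(core) / len(union) if union else 0.0


# Hypothetical reads from two samples.
samples = {"A": ["ACGTAC"], "B": ["ACGTTT"]}
print(pairwise_similarity(samples, k=3))  # {('A', 'B'): 0.3333333333333333}
```

    Working on k-mers rather than annotations lets every read contribute, which is why k-mer analyses can cover 100% of reads while taxonomic or protein-cluster assignments cover only the fraction that matches a reference.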

  2. Analysis of hydrocarbon-contaminated groundwater metagenomes as revealed by high-throughput sequencing.

    PubMed

    Abbai, Nathlee S; Pillay, Balakrishna

    2013-07-01

    The tendency for chlorinated aliphatics and aromatic hydrocarbons to accumulate in environments such as groundwater and sediments poses a serious environmental threat. In this study, the metabolic capacity of groundwater contaminated with hydrocarbons (aromatics and chlorinated aliphatics) in the KwaZulu-Natal province of South Africa has been elucidated for the first time by analysis of pyrosequencing data. The taxonomic data revealed that the metagenomes were dominated by the phylum Proteobacteria (mainly Betaproteobacteria). In addition, Flavobacteriales, Sphingobacteria, Burkholderiales, and Rhodocyclales were the predominant orders present in the individual metagenomes. These orders included microorganisms (Flavobacteria, Dechloromonas aromatica RCB, and Azoarcus) involved in the degradation of aromatic compounds and various other hydrocarbons that were present in the groundwater. Although the metabolic reconstruction of the metagenome represented composite cell networks, the information obtained was sufficient to address questions regarding the metabolic potential of the microbial communities and to correlate the data to the contamination profile of the groundwater. Genes involved in the degradation of benzene and benzoate, together with heavy metal resistance mechanisms, appeared to provide a survival strategy used by the microbial communities. Analysis of the pyrosequencing-derived data revealed that the metagenomes represent complex microbial communities that have adapted to the geochemical conditions of the groundwater, as evidenced by the presence of key enzymes/genes conferring resistance to specific contaminants. Thus, pyrosequencing analysis of the metagenomes provided insights into the microbial activities in hydrocarbon-contaminated habitats.

  3. Automated and Accurate Estimation of Gene Family Abundance from Shotgun Metagenomes.

    PubMed

    Nayfach, Stephen; Bradley, Patrick H; Wyman, Stacia K; Laurent, Timothy J; Williams, Alex; Eisen, Jonathan A; Pollard, Katherine S; Sharpton, Thomas J

    2015-11-01

    Shotgun metagenomic DNA sequencing is a widely applicable tool for characterizing the functions that are encoded by microbial communities. Several