Sample records for unbiased metagenomic approach

  1. MetaSort untangles metagenome assembly by reducing microbial community complexity

    PubMed Central

    Ji, Peifeng; Zhang, Yanming; Wang, Jinfeng; Zhao, Fangqing

    2017-01-01

    Most current approaches to analyse metagenomic data rely on reference genomes. Novel microbial communities extend far beyond the coverage of reference databases, and de novo metagenome assembly from complex microbial communities remains a great challenge. Here we present a novel experimental and bioinformatic framework, metaSort, for effective construction of bacterial genomes from metagenomic samples. MetaSort provides a sorted mini-metagenome approach based on flow cytometry and single-cell sequencing methodologies, and employs new computational algorithms to efficiently recover high-quality genomes from the sorted mini-metagenome, using the original metagenome as a complement. Through extensive evaluations, we demonstrated that metaSort has an excellent and unbiased performance on genome recovery and assembly. Furthermore, we applied metaSort to an unexplored microflora colonizing the surface of marine kelp and successfully recovered 75 high-quality genomes at one time. This approach will greatly improve access to microbial genomes from complex or novel communities. PMID:28112173

  2. Simultaneous virus identification and characterization of severe unexplained pneumonia cases using a metagenomics sequencing technique.

    PubMed

    Zou, Xiaohui; Tang, Guangpeng; Zhao, Xiang; Huang, Yan; Chen, Tao; Lei, Mingyu; Chen, Wenbing; Yang, Lei; Zhu, Wenfei; Zhuang, Li; Yang, Jing; Feng, Zhaomin; Wang, Dayan; Wang, Dingming; Shu, Yuelong

    2017-03-01

    Many viruses can cause respiratory diseases in humans. Although great advances have been achieved in methods of diagnosis, it remains challenging to identify pathogens in unexplained pneumonia (UP) cases. In this study, we applied next-generation sequencing (NGS) technology and a metagenomic approach to detect and characterize respiratory viruses in UP cases from Guizhou Province, China. A total of 33 oropharyngeal swabs were obtained from hospitalized UP patients and subjected to NGS. An unbiased metagenomic analysis pipeline identified 13 virus species in 16 samples. Human rhinovirus C was the virus most frequently detected and was identified in seven samples. Human measles virus, adenovirus B 55 and coxsackievirus A10 were also identified. Metagenomic sequencing also provided virus genomic sequences, which enabled genotype characterization and phylogenetic analysis. For cases of multiple infection, metagenomic sequencing afforded information regarding the quantity of each virus in the sample, which could be used to evaluate each virus's role in the disease. Our study highlights the potential of metagenomic sequencing for pathogen identification in UP cases.

  3. A metagenomic survey of microbes in honey bee colony collapse disorder.

    PubMed

    Cox-Foster, Diana L; Conlan, Sean; Holmes, Edward C; Palacios, Gustavo; Evans, Jay D; Moran, Nancy A; Quan, Phenix-Lan; Briese, Thomas; Hornig, Mady; Geiser, David M; Martinson, Vince; vanEngelsdorp, Dennis; Kalkstein, Abby L; Drysdale, Andrew; Hui, Jeffrey; Zhai, Junhui; Cui, Liwang; Hutchison, Stephen K; Simons, Jan Fredrik; Egholm, Michael; Pettis, Jeffery S; Lipkin, W Ian

    2007-10-12

    In colony collapse disorder (CCD), honey bee colonies inexplicably lose their workers. CCD has resulted in a loss of 50 to 90% of colonies in beekeeping operations across the United States. The observation that irradiated combs from affected colonies can be repopulated with naive bees suggests that infection may contribute to CCD. We used an unbiased metagenomic approach to survey microflora in CCD hives, normal hives, and imported royal jelly. Candidate pathogens were screened for significance of association with CCD by the examination of samples collected from several sites over a period of 3 years. One organism, Israeli acute paralysis virus of bees, was strongly correlated with CCD.

  4. Accessing the Soil Metagenome for Studies of Microbial Diversity

    PubMed Central

    Delmont, Tom O.; Robe, Patrick; Cecillon, Sébastien; Clark, Ian M.; Constancias, Florentin; Simonet, Pascal; Hirsch, Penny R.; Vogel, Timothy M.

    2011-01-01

    Soil microbial communities contain the highest level of prokaryotic diversity of any environment, and metagenomic approaches involving the extraction of DNA from soil can improve our access to these communities. Most analyses of soil biodiversity and function assume that the DNA extracted represents the microbial community in the soil, but subsequent interpretations are limited by the DNA recovered from the soil. Unfortunately, extraction methods do not provide a uniform and unbiased subsample of metagenomic DNA, and as a consequence, accurate species distributions cannot be determined. Moreover, any bias will propagate errors in estimations of overall microbial diversity and may exclude some microbial classes from study and exploitation. To improve metagenomic approaches, investigate DNA extraction biases, and provide tools for assessing the relative abundances of different groups, we explored the biodiversity of the accessible community DNA by fractioning the metagenomic DNA as a function of (i) vertical soil sampling, (ii) density gradients (cell separation), (iii) cell lysis stringency, and (iv) DNA fragment size distribution. Each fraction had a unique genetic diversity, with different predominant and rare species (based on ribosomal intergenic spacer analysis [RISA] fingerprinting and phylochips). All fractions contributed to the number of bacterial groups uncovered in the metagenome, thus increasing the DNA pool for further applications. Indeed, we were able to access a more genetically diverse proportion of the metagenome (a gain of more than 80% compared to the best single extraction method), limit the predominance of a few genomes, and increase the species richness per sequencing effort. This work stresses the difference between extracted DNA pools and the currently inaccessible complete soil metagenome. PMID:21183646

  5. Diagnosis of Fatal Human Case of St. Louis Encephalitis Virus Infection by Metagenomic Sequencing, California, 2016.

    PubMed

    Chiu, Charles Y; Coffey, Lark L; Murkey, Jamie; Symmes, Kelly; Sample, Hannah A; Wilson, Michael R; Naccache, Samia N; Arevalo, Shaun; Somasekar, Sneha; Federman, Scot; Stryke, Doug; Vespa, Paul; Schiller, Gary; Messenger, Sharon; Humphries, Romney; Miller, Steve; Klausner, Jeffrey D

    2017-10-01

    We used unbiased metagenomic next-generation sequencing to diagnose a fatal case of meningoencephalitis caused by St. Louis encephalitis virus in a patient from California in September 2016. This case is associated with the recent 2015-2016 reemergence of this virus in the southwestern United States.

  6. Unbiased Taxonomic Annotation of Metagenomic Samples

    PubMed Central

    Fosso, Bruno; Pesole, Graziano; Rosselló, Francesc

    2018-01-01

    The classification of reads from a metagenomic sample using a reference taxonomy is usually based on first mapping the reads to the reference sequences and then classifying each read at a node under the lowest common ancestor of the candidate sequences in the reference taxonomy with the least classification error. However, this taxonomic annotation can be biased by an imbalanced taxonomy and also by the presence of multiple nodes in the taxonomy with the least classification error for a given read. In this article, we show that the Rand index is a better indicator of classification error than the often used area under the receiver operating characteristic (ROC) curve and F-measure for both balanced and imbalanced reference taxonomies, and we also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time and an exact solution can be obtained by integer linear programming. Experimental results with a proof-of-concept implementation of the set cover approach to taxonomic annotation in a next release of the TANGO software show that the set cover approach further reduces ambiguity in the taxonomic annotation obtained with TANGO without distorting the relative abundance profile of the metagenomic sample. PMID:29028181
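
    The set cover reduction mentioned in this abstract can be illustrated with the standard greedy heuristic, which gives a logarithmic approximation. The sketch below is not the TANGO implementation: the function name, the taxon and read identifiers, and the tie-breaking rule are all assumptions made for illustration, and this naive version does not achieve the linear running time the abstract refers to.

```python
from typing import Dict, List, Set


def greedy_set_cover(candidates: Dict[str, Set[str]]) -> List[str]:
    """Greedily choose taxa until every read with a candidate taxon is covered.

    candidates maps a (hypothetical) taxon name to the set of reads that
    could be assigned to it.
    """
    uncovered: Set[str] = set().union(*candidates.values())
    chosen: List[str] = []
    while uncovered:
        # Pick the taxon that explains the most still-uncovered reads.
        # Iterating in sorted order makes ties deterministic (an assumption,
        # not something the abstract specifies).
        best = max(sorted(candidates), key=lambda t: len(candidates[t] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Toy input, invented for illustration only.
candidates = {
    "taxon_A": {"r1", "r2", "r3"},
    "taxon_B": {"r3", "r4"},
    "taxon_C": {"r4", "r5"},
    "taxon_D": {"r1"},
}
selected = greedy_set_cover(candidates)
print(selected)
```

    In this toy case two taxa are enough to explain all five reads, so the ambiguous candidates taxon_B and taxon_D are never selected.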

  7. MIPE: A metagenome-based community structure explorer and SSU primer evaluation tool

    PubMed Central

    Zhou, Quan

    2017-01-01

    An understanding of microbial community structure is an important issue in the field of molecular ecology. The traditional molecular method involves amplification of small subunit ribosomal RNA (SSU rRNA) genes by polymerase chain reaction (PCR). However, PCR-based amplicon approaches are affected by primer bias and chimeras. With the development of high-throughput sequencing technology, unbiased SSU rRNA gene sequences can be mined from shotgun sequencing-based metagenomic or metatranscriptomic datasets to obtain a reflection of the microbial community structure in specific types of environment and to evaluate SSU primers. However, the use of short reads obtained through next-generation sequencing for primer evaluation has not been well resolved. The software MIPE (MIcrobiota metagenome Primer Explorer) was developed to adapt numerous short reads from metagenomes and metatranscriptomes. Using metagenomic or metatranscriptomic datasets as input, MIPE extracts and aligns rRNA to reveal detailed information on microbial composition and evaluate SSU rRNA primers. A mock dataset, a real Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) test dataset, two PrimerProspector test datasets and a real metatranscriptomic dataset were used to validate MIPE. The software calls Mothur (v1.33.3) and the SILVA database (v119) for the alignment and classification of rRNA genes from a metagenome or metatranscriptome. MIPE can effectively extract shotgun rRNA reads from a metagenome or metatranscriptome and is capable of classifying these sequences and exhibiting sensitivity to different SSU rRNA PCR primers. Therefore, MIPE can be used to guide primer design for specific environmental samples. PMID:28350876
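
    As a rough illustration of what evaluating a primer against shotgun reads can mean, the sketch below scans reads for a degenerate primer using the IUPAC ambiguity codes and reports the fraction of reads with a hit. It is not how MIPE works (the abstract says MIPE relies on Mothur and SILVA for alignment and classification); the primer, the reads and the function names are invented, and only the forward strand with uppercase sequences is considered.

```python
# IUPAC nucleotide ambiguity codes (uppercase only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches_at(primer, read, pos, max_mismatches=0):
    """True if the primer matches read[pos:pos+len(primer)] within the mismatch budget."""
    mismatches = 0
    for code, base in zip(primer, read[pos:pos + len(primer)]):
        if base not in IUPAC[code]:
            mismatches += 1
            if mismatches > max_mismatches:
                return False
    return True


def primer_hits(primer, read, max_mismatches=0):
    """True if the primer matches anywhere on the forward strand of the read."""
    last_start = len(read) - len(primer)
    return any(
        primer_matches_at(primer, read, i, max_mismatches)
        for i in range(last_start + 1)
    )


def primer_coverage(primer, reads, max_mismatches=0):
    """Fraction of reads containing at least one primer match."""
    if not reads:
        return 0.0
    return sum(primer_hits(primer, r, max_mismatches) for r in reads) / len(reads)


# Hypothetical primer and reads, for illustration only.
primer = "GCMGCCGY"
reads = ["TTGCAGCCGCAA", "TTGCCGCCGTAA", "TTGCGGCCGTAA"]
strict = primer_coverage(primer, reads)                      # no mismatches allowed
relaxed = primer_coverage(primer, reads, max_mismatches=1)   # one mismatch allowed
print(strict, relaxed)
```

    The third read differs from the primer at one position, so it only counts as covered when one mismatch is tolerated.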

  8. Virus Identification in Unknown Tropical Febrile Illness Cases Using Deep Sequencing

    PubMed Central

    Balmaseda, Angel; Harris, Eva; DeRisi, Joseph L.

    2012-01-01

    Dengue virus is an emerging infectious agent that infects an estimated 50–100 million people annually worldwide, yet current diagnostic practices cannot detect an etiologic pathogen in ∼40% of dengue-like illnesses. Metagenomic approaches to pathogen detection, such as viral microarrays and deep sequencing, are promising tools to address emerging and non-diagnosable disease challenges. In this study, we used the Virochip microarray and deep sequencing to characterize the spectrum of viruses present in human sera from 123 Nicaraguan patients presenting with dengue-like symptoms but testing negative for dengue virus. We utilized a barcoding strategy to simultaneously deep sequence multiple serum specimens, generating on average over 1 million reads per sample. We then implemented a stepwise bioinformatic filtering pipeline to remove the majority of human and low-quality sequences to improve the speed and accuracy of subsequent unbiased database searches. By deep sequencing, we were able to detect virus sequence in 37% (45/123) of previously negative cases. These included 13 cases with Human Herpesvirus 6 sequences. Other samples contained sequences with similarity to sequences from viruses in the Herpesviridae, Flaviviridae, Circoviridae, Anelloviridae, Asfarviridae, and Parvoviridae families. In some cases, the putative viral sequences were virtually identical to known viruses, and in others they diverged, suggesting that they may derive from novel viruses. These results demonstrate the utility of unbiased metagenomic approaches in the detection of known and divergent viruses in the study of tropical febrile illness. PMID:22347512
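
    The abstract describes barcoded multiplexing followed by stepwise removal of low-quality and human reads, but gives no implementation details. The sketch below shows one way such a pipeline could be arranged. Everything specific in it is an assumption: the barcode length, the quality threshold, the k-mer based host screen (a real pipeline would more likely align reads to a human reference), and all sequences and names.

```python
from collections import defaultdict


def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def kmers(seq, k):
    """Set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def filter_reads(reads, barcodes, host_kmers, k=4, barcode_len=4,
                 min_quality=20.0, max_host_fraction=0.5):
    """Demultiplex, then drop low-quality and host-like reads, step by step."""
    kept = defaultdict(list)
    for seq, qual in reads:
        # Step 1: assign the read to a sample by its leading barcode.
        sample = barcodes.get(seq[:barcode_len])
        insert, insert_qual = seq[barcode_len:], qual[barcode_len:]
        if sample is None or not insert or not insert_qual:
            continue
        # Step 2: discard reads whose mean base quality is too low.
        if mean_phred(insert_qual) < min_quality:
            continue
        # Step 3: discard reads that mostly consist of host k-mers.
        read_kmers = kmers(insert, k)
        host_fraction = len(read_kmers & host_kmers) / len(read_kmers) if read_kmers else 0.0
        if host_fraction > max_host_fraction:
            continue
        kept[sample].append(insert)
    return dict(kept)


# Toy data, invented for illustration only.
barcodes = {"ACGT": "sample_1", "TGCA": "sample_2"}
host_kmers = kmers("GGGGCCCCGGGGCCCC", 4)
reads = [
    ("ACGT" + "ATATATATAT", "I" * 14),  # kept for sample_1
    ("ACGT" + "GGGGCCCCGG", "I" * 14),  # host-like, removed
    ("TGCA" + "ATTTAGGCAT", "#" * 14),  # low quality, removed
    ("TTTT" + "ATATATATAT", "I" * 14),  # unknown barcode, removed
    ("TGCA" + "CATCATCATC", "I" * 14),  # kept for sample_2
]
result = filter_reads(reads, barcodes, host_kmers)
print(result)
```

    Only the reads that survive all three steps would be passed on to the database searches.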

  9. Metagenomic Sequencing of an Echovirus 30 Genome From Cerebrospinal Fluid of a Patient With Aseptic Meningitis and Orchitis.

    PubMed

    Piantadosi, Anne; Mukerji, Shibani S; Chitneni, Pooja; Cho, Tracey A; Cosimi, Lisa A; Hung, Deborah T; Goldberg, Marcia B; Sabeti, Pardis C; Kuritzkes, Daniel R; Grad, Yonatan H

    2017-01-01

    Enteroviruses cause a wide spectrum of clinical disease. In this study, we describe the case of a young man with orchitis and aseptic meningitis who was diagnosed with enterovirus infection. Using unbiased "metagenomic" massively parallel sequencing, we assembled a near-complete viral genome, the first use of this method for full-genome viral sequencing from cerebrospinal fluid. We found that the genome belonged to the subgroup echovirus 30, which is a common cause of aseptic meningitis but has not been previously reported to cause orchitis.

  10. Validation of Metagenomic Next-Generation Sequencing Tests for Universal Pathogen Detection.

    PubMed

    Schlaberg, Robert; Chiu, Charles Y; Miller, Steve; Procop, Gary W; Weinstock, George

    2017-06-01

    Context: Metagenomic sequencing can be used for detection of any pathogens using unbiased, shotgun next-generation sequencing (NGS), without the need for sequence-specific amplification. Proof-of-concept has been demonstrated in infectious disease outbreaks of unknown causes and in patients with suspected infections but negative results for conventional tests. Metagenomic NGS tests hold great promise to improve infectious disease diagnostics, especially in immunocompromised and critically ill patients. Objective: To discuss challenges and provide example solutions for validating metagenomic pathogen detection tests in clinical laboratories. A summary of current regulatory requirements, largely based on prior guidance for NGS testing in constitutional genetics and oncology, is provided. Data sources: Examples from 2 separate validation studies are provided for steps from assay design, and validation of wet bench and bioinformatics protocols, to quality control and assurance. Conclusions: Although laboratory and data analysis workflows are still complex, metagenomic NGS tests for infectious diseases are increasingly being validated in clinical laboratories. Many parallels exist to NGS tests in other fields. Nevertheless, specimen preparation, rapidly evolving data analysis algorithms, and incomplete reference sequence databases are idiosyncratic to the field of microbiology and often overlooked.

  11. Metagenomic Analysis of Viruses in Feces from Unsolved Outbreaks of Gastroenteritis in Humans

    PubMed Central

    Moore, Nicole E.; Wang, Jing; Hewitt, Joanne; Croucher, Dawn; Williamson, Deborah A.; Paine, Shevaun; Yen, Seiha; Greening, Gail E.

    2014-01-01

    The etiology of an outbreak of gastroenteritis in humans cannot always be determined, and ∼25% of outbreaks remain unsolved in New Zealand. It is hypothesized that novel viruses may account for a proportion of unsolved cases, and new unbiased high-throughput sequencing methods hold promise for their detection. Analysis of the fecal metagenome can reveal the presence of viruses, bacteria, and parasites which may have evaded routine diagnostic testing. Thirty-one fecal samples from 26 gastroenteritis outbreaks of unknown etiology occurring in New Zealand between 2011 and 2012 were selected for de novo metagenomic analysis. A total data set of 193 million sequence reads of 150 bp in length was produced on an Illumina MiSeq. The metagenomic data set was searched for virus and parasite sequences, with no evidence of novel pathogens found. Eight viruses and one parasite were detected, each already known to be associated with gastroenteritis, including adenovirus, rotavirus, sapovirus, and Dientamoeba fragilis. In addition, we also describe the first detection of human parechovirus 3 (HPeV3) in Australasia. Metagenomics may thus provide a useful audit tool when applied retrospectively to determine where routine diagnostic processes may have failed to detect a pathogen. PMID:25339401

  12. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data

    PubMed Central

    Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie

    2016-01-01

    The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most often, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web interface that facilitates management and bioinformatics analysis of metagenomics data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users’ input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data, from loading and indexing to mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples and runs, as well as the workflow results, are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In keeping with Galaxy, the interface enables the sharing of scientific results with fellow team members. PMID:28451381

  13. MetaGenSense: A web-application for analysis and exploration of high throughput sequencing metagenomic data.

    PubMed

    Correia, Damien; Doppelt-Azeroual, Olivia; Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie

    2015-01-01

    The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most often, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web interface that facilitates management and bioinformatics analysis of metagenomics data samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventually classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users' input data and its metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data, from loading and indexing to mapping, assembly and database searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples and runs, as well as the workflow results, are stored in the LIMS. For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In keeping with Galaxy, the interface enables the sharing of scientific results with fellow team members.

  14. Picoplankton Bloom in Global South? A High Fraction of Aerobic Anoxygenic Phototrophic Bacteria in Metagenomes from a Coastal Bay (Arraial do Cabo--Brazil).

    PubMed

    Cuadrat, Rafael R C; Ferrera, Isabel; Grossart, Hans-Peter; Dávila, Alberto M R

    2016-02-01

    Marine habitats harbor a great diversity of microorganisms from the three domains of life, only a small fraction of which can be cultivated. Metagenomic approaches are increasingly popular for addressing microbial diversity without culture, serving as sensitive and relatively unbiased methods for identifying and cataloging the diversity of nucleic acid sequences derived from organisms in environmental samples. Aerobic anoxygenic phototrophic bacteria (AAP) play important roles in carbon and energy cycling in aquatic systems. In oceans, those bacteria are widely distributed; however, their abundance and importance are still poorly understood. The aim of this study was to estimate abundance and diversity of AAPs in metagenomes from an upwelling-affected coastal bay in Arraial do Cabo, Brazil, using in silico screening for the anoxygenic photosynthesis core genes. Metagenomes from the Global Ocean Sampling Expedition (GOS) were screened for comparative purposes. AAPs were highly abundant in the free-living bacterial fraction from Arraial do Cabo: 23.88% of total bacterial cells, compared with 15% in the GOS dataset. Of the ten most AAP-abundant samples from GOS, eight were collected close to the Equator where solar irradiation is high year-round. We were able to assign most retrieved sequences to phylo-groups, with a particularly high abundance of Roseobacter in Arraial do Cabo samples. The high abundance of AAP in this tropical bay may be related to the upwelling phenomenon and subsequent picoplankton bloom. These results suggest a link between upwelling and light abundance and demonstrate AAP even in oligotrophic tropical and subtropical environments. Longitudinal studies in the Arraial do Cabo region are warranted to understand the dynamics of AAP at different locations and seasons, and the ecological role of these unique bacteria for biogeochemical and energy cycling in the ocean.

  15. A Delphi Technology Foresight Study: Mapping Social Construction of Scientific Evidence on Metagenomics Tests for Water Safety

    PubMed Central

    Birko, Stanislav; Dove, Edward S.; Özdemir, Vural

    2015-01-01

    Access to clean water is a grand challenge in the 21st century. Water safety testing for pathogens currently depends on surrogate measures such as fecal indicator bacteria (e.g., E. coli). Metagenomics concerns high-throughput, culture-independent, unbiased shotgun sequencing of DNA from environmental samples that might transform water safety by detecting waterborne pathogens directly instead of their surrogates. Yet emerging innovations such as metagenomics are often fiercely contested. Innovations are subject to shaping/construction not only by technology but also by the social systems/values in which they are embedded, such as experts’ attitudes towards new scientific evidence. We conducted a classic three-round Delphi survey comprising 107 questions. A multidisciplinary expert panel (n = 24) representing the continuum of discovery scientists and policymakers evaluated the emergence of metagenomics tests. To the best of our knowledge, we report here the first Delphi foresight study of experts’ attitudes on (1) the top 10 priority evidentiary criteria for adoption of metagenomics tests for water safety, (2) the specific issues critical to governance of metagenomics innovation trajectory where there is consensus or dissensus among experts, (3) the anticipated time lapse from discovery to practice of metagenomics tests, and (4) the role and timing of public engagement in development of metagenomics tests. The ability of a test to distinguish between harmful and benign waterborne organisms, analytical/clinical sensitivity, and reproducibility were the top three evidentiary criteria for adoption of metagenomics. Experts agree that metagenomic testing will provide novel information but there is dissensus on whether metagenomics will replace the current water safety testing methods or impact the public health end points (e.g., reduction in boil water advisories). Interestingly, experts view the publics as relevant in a “downstream capacity” for adoption of metagenomics rather than in a co-productionist role at the “upstream” scientific design stage of metagenomics tests. In summary, these findings offer strategic foresight to govern metagenomics innovations symmetrically: by identifying areas where acceleration (e.g., consensus areas) and deceleration/reconsideration (e.g., dissensus areas) of the innovation trajectory might be warranted. Additionally, we show how scientific evidence is subject to potential social construction by experts’ value systems and the need for greater upstream public engagement on metagenomics innovations. PMID:26066837

  16. Horizontal gene transfer in an acid mine drainage microbial community.

    PubMed

    Guo, Jiangtao; Wang, Qi; Wang, Xiaoqi; Wang, Fumeng; Yao, Jinxian; Zhu, Huaiqiu

    2015-07-04

    Horizontal gene transfer (HGT) has been widely identified in complete prokaryotic genomes. However, the roles of HGT among members of a microbial community and in evolution remain largely unknown. With the emergence of metagenomics, it is nontrivial to investigate such horizontal flow of genetic materials among members in a microbial community from the natural environment. Because of the lack of suitable methods for metagenomics gene transfer detection, microorganisms with near-complete genomes from a low-complexity acid mine drainage (AMD) community were used to detect possible gene transfer events and to suggest their biological significance. Using the annotation of coding regions by the current tools, a phylogenetic approach, and an approximately unbiased test, we found that HGTs in AMD organisms are not rare, and we predicted 119 putative transferred genes. Among them, 14 HGT events were determined to be transfer events among the AMD members. Further analysis of the 14 transferred genes revealed that the HGT events affected the functional evolution of archaea or bacteria in AMD and probably shaped the community structure, such as the dominance of G-plasma in archaea in AMD through HGT. Our study provides a novel insight into HGT events among microorganisms in natural communities. The interconnectedness between HGT and community evolution is essential to understand microbial community formation and development.

  17. Viruses in diarrhoeic dogs include novel kobuviruses and sapoviruses.

    PubMed

    Li, Linlin; Pesavento, Patricia A; Shan, Tongling; Leutenegger, Christian M; Wang, Chunlin; Delwart, Eric

    2011-11-01

    The close interactions of dogs with humans and surrounding wildlife provide frequent opportunities for cross-species virus transmissions. In order to initiate an unbiased characterization of the eukaryotic viruses in the gut of dogs, this study used deep sequencing of partially purified viral capsid-protected nucleic acids from the faeces of 18 diarrhoeic dogs. Known canine parvoviruses, coronaviruses and rotaviruses were identified, and the genomes of the first reported canine kobuvirus and sapovirus were characterized. Canine kobuvirus, the first sequenced canine picornavirus and the closest genetic relative of the diarrhoea-causing human Aichi virus, was detected at high frequency in the faeces of both healthy and diarrhoeic dogs. Canine sapovirus constituted a novel genogroup within the genus Sapovirus, a group of viruses also associated with human and animal diarrhoea. These results highlight the high frequency of new virus detection possible even in extensively studied animal species using metagenomics approaches, and provide viral genomes for further disease-association studies.

  18. DOE JGI Quality Metrics; Approaches to Scaling and Improving Metagenome Assembly (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Copeland, Alex; Brown, C. Titus

    2011-10-13

    DOE JGI's Alex Copeland on "DOE JGI Quality Metrics" and Michigan State University's C. Titus Brown on "Approaches to Scaling and Improving Metagenome Assembly" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  19. DOE JGI Quality Metrics; Approaches to Scaling and Improving Metagenome Assembly (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Copeland, Alex; Brown, C. Titus

    2018-04-27

    DOE JGI's Alex Copeland on "DOE JGI Quality Metrics" and Michigan State University's C. Titus Brown on "Approaches to Scaling and Improving Metagenome Assembly" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  20. Swine Fecal Metagenomics

    EPA Science Inventory

    Metagenomic approaches are providing rapid and more robust means to investigate the composition and functional genetic potential of complex microbial communities. In this study, we utilized a metagenomic approach to further understand the functional diversity of the swine gut. To...

  1. Unbiased Strain-Typing of Arbovirus Directly from Mosquitoes Using Nanopore Sequencing: A Field-forward Biosurveillance Protocol.

    PubMed

    Russell, Joseph A; Campos, Brittany; Stone, Jennifer; Blosser, Erik M; Burkett-Cadena, Nathan; Jacobs, Jonathan L

    2018-04-03

    The future of infectious disease surveillance and outbreak response is trending towards smaller hand-held solutions for point-of-need pathogen detection. Here, samples of Culex cedecei mosquitoes collected in Southern Florida, USA were tested for Venezuelan Equine Encephalitis Virus (VEEV), a previously-weaponized arthropod-borne RNA-virus capable of causing acute and fatal encephalitis in animal and human hosts. A single 20-mosquito pool tested positive for VEEV by quantitative reverse transcription polymerase chain reaction (RT-qPCR) on the Biomeme two3. The virus-positive sample was subjected to unbiased metatranscriptome sequencing on the Oxford Nanopore MinION and shown to contain Everglades Virus (EVEV), an alphavirus in the VEEV serocomplex. Our results demonstrate, for the first time, the use of unbiased sequence-based detection and subtyping of a high-consequence biothreat pathogen directly from an environmental sample using field-forward protocols. The development and validation of methods designed for field-based diagnostic metagenomics and pathogen discovery, such as those suitable for use in mobile "pocket laboratories", will address a growing demand for public health teams to carry out their mission where it is most urgent: at the point-of-need.

  2. Comparison of normalization methods for the analysis of metagenomic gene abundance data.

    PubMed

    Pereira, Mariana Buongermino; Wallroth, Mikael; Jonsson, Viktor; Kristiansson, Erik

    2018-04-20

    In shotgun metagenomics, microbial communities are studied through direct sequencing of DNA without any prior cultivation. By comparing gene abundances estimated from the generated sequencing reads, functional differences between the communities can be identified. However, gene abundance data is affected by high levels of systematic variability, which can greatly reduce the statistical power and introduce false positives. Normalization, which is the process where systematic variability is identified and removed, is therefore a vital part of the data analysis. A wide range of normalization methods for high-dimensional count data has been proposed but their performance on the analysis of shotgun metagenomic data has not been evaluated. Here, we present a systematic evaluation of nine normalization methods for gene abundance data. The methods were evaluated through resampling of three comprehensive datasets, creating a realistic setting that preserved the unique characteristics of metagenomic data. Performance was measured in terms of the methods' ability to identify differentially abundant genes (DAGs), correctly calculate unbiased p-values and control the false discovery rate (FDR). Our results showed that the choice of normalization method has a large impact on the end results. When the DAGs were asymmetrically present between the experimental conditions, many normalization methods had a reduced true positive rate (TPR) and a high false positive rate (FPR). The methods trimmed mean of M-values (TMM) and relative log expression (RLE) had the overall highest performance and are therefore recommended for the analysis of gene abundance data. For larger sample sizes, cumulative sum scaling (CSS) also showed satisfactory performance. This study emphasizes the importance of selecting a suitable normalization method in the analysis of data from shotgun metagenomics. Our results also demonstrate that improper methods may result in unacceptably high levels of false positives, which in turn may lead to incorrect or obfuscated biological interpretation.
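
    As an illustration of one of the recommended methods, the following is a minimal sketch of RLE-style (median-of-ratios) scaling on a small gene-by-sample count table. It is not the code used in the study: the function names, the toy counts, and the choice to skip genes with a zero count in any sample are assumptions made for this example.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Return one scaling factor per sample (column).

    counts: list of rows, one row per gene, one column per sample.
    Each factor is the median, over genes, of the ratio between the
    sample's count and the gene's geometric mean across samples.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # Assumption: genes with a zero count in any sample are skipped,
        # because their geometric mean is zero.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has a nonzero count in every sample")
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, size_factors):
    """Divide every count by the scaling factor of its sample."""
    return [[c / s for c, s in zip(row, size_factors)] for row in counts]


# Toy table: the second sample was sequenced twice as deeply as the first.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = rle_size_factors(toy_counts)
scaled = normalize(toy_counts, factors)
```

    In this toy table the second sample's factor is twice the first, so the scaled counts of the first three genes agree across the two samples.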

  3. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples

    PubMed Central

    Naccache, Samia N.; Federman, Scot; Veeraraghavan, Narayanan; Zaharia, Matei; Lee, Deanna; Samayoa, Erik; Bouquet, Jerome; Greninger, Alexander L.; Luk, Ka-Cheung; Enge, Barryett; Wadford, Debra A.; Messenger, Sharon L.; Genrich, Gillian L.; Pellegrino, Kristen; Grard, Gilda; Leroy, Eric; Schneider, Bradley S.; Fair, Joseph N.; Martínez, Miguel A.; Isa, Pavel; Crump, John A.; DeRisi, Joseph L.; Sittler, Taylor; Hackett, John; Miller, Steve; Chiu, Charles Y.

    2014-01-01

    Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI (“sequence-based ultrarapid pathogen identification”), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7–500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times. PMID:24899342

  4. Comparing viral metagenomics methods using a highly multiplexed human viral pathogens reagent

    PubMed Central

    Li, Linlin; Deng, Xutao; Mee, Edward T.; Collot-Teixeira, Sophie; Anderson, Rob; Schepelmann, Silke; Minor, Philip D.; Delwart, Eric

    2014-01-01

    Unbiased metagenomic sequencing holds significant potential as a diagnostic tool for the simultaneous detection of any previously genetically described viral nucleic acids in clinical samples. Viral genome sequences can also inform on likely phenotypes including drug susceptibility or neutralization serotypes. In this study, we compared the effects of different variables in the laboratory methods often used to generate viral metagenomics libraries on the efficiency of viral detection and virus genome coverage. A biological reagent consisting of 25 different human RNA and DNA viral pathogens was used to estimate the effect of filtration and nuclease digestion, DNA/RNA extraction methods, pre-amplification and the use of different library preparation kits on the detection of viral nucleic acids. Filtration and nuclease treatment led to slight decreases in the percentage of viral sequence reads and number of viruses detected. For nucleic acid extractions, silica spin columns improved viral sequence recovery relative to magnetic beads and Trizol extraction. Pre-amplification using random RT-PCR, while generating more viral sequence reads, resulted in detection of fewer viruses, more overlapping sequences, and lower genome coverage. The ScriptSeq library preparation method retrieved more viruses and a greater fraction of their genomes than the TruSeq and Nextera methods. Viral metagenomics sequencing was able to simultaneously detect up to 22 different viruses in the biological reagent analyzed including all those detected by qPCR. Further optimization will be required for the detection of viruses in biologically more complex samples such as tissues, blood, or feces. PMID:25497414

  5. Acute West Nile Virus Meningoencephalitis Diagnosed Via Metagenomic Deep Sequencing of Cerebrospinal Fluid in a Renal Transplant Patient.

    PubMed

    Wilson, M R; Zimmermann, L L; Crawford, E D; Sample, H A; Soni, P R; Baker, A N; Khan, L M; DeRisi, J L

    2017-03-01

    Solid organ transplant patients are vulnerable to suffering neurologic complications from a wide array of viral infections and can be sentinels in the population who are first to get serious complications from emerging infections like the recent waves of arboviruses, including West Nile virus, Chikungunya virus, Zika virus, and Dengue virus. The diverse and rapidly changing landscape of possible causes of viral encephalitis poses great challenges for traditional candidate-based infectious disease diagnostics that already fail to identify a causative pathogen in approximately 50% of encephalitis cases. We present the case of a 14-year-old girl on immunosuppression for a renal transplant who presented with acute meningoencephalitis. Traditional diagnostics failed to identify an etiology. RNA extracted from her cerebrospinal fluid was subjected to unbiased metagenomic deep sequencing, enhanced with the use of a Cas9-based technique for host depletion. This analysis identified West Nile virus (WNV). Convalescent serum serologies subsequently confirmed WNV seroconversion. These results support a clear clinical role for metagenomic deep sequencing in the setting of suspected viral encephalitis, especially in the context of the high-risk transplant patient population. © 2016 The Authors. American Journal of Transplantation published by Wiley Periodicals, Inc. on behalf of American Society of Transplant Surgeons.

  6. A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

    PubMed Central

    Tang, Haixu; Li, Sujun; Ye, Yuzhen

    2016-01-01

    Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes are first directly predicted from metagenomic (and/or metatranscriptomic) sequences or from their assemblies, and the resulting protein sequences are then used as the reference database for peptide/protein identification from MS/MS spectra. This approach is often limited because protein coding genes predicted from metagenomes are incomplete and fragmental. In this paper, we present a graph-centric approach to improving metagenome-guided peptide and protein identification in metaproteomics. Our method exploits the de Bruijn graph structure reported by metagenome assembly algorithms to generate a comprehensive database of protein sequences encoded in the community. We tested our method using several public metaproteomic datasets with matched metagenomic and metatranscriptomic sequencing data acquired from complex microbial communities in a biological wastewater treatment plant. The results showed that many more peptides and proteins can be identified when assembly graphs were utilized, improving the characterization of the proteins expressed in the microbial communities. The additional proteins we identified contribute to the characterization of important pathways such as those involved in degradation of chemical hazards. Our tools are released as open-source software on github at https://github.com/COL-IU/Graph2Pro. PMID:27918579
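
    To make the role of the assembly graph concrete, here is a minimal sketch of a de Bruijn graph built from short reads, with a bounded walk that spells out every sequence reachable from a starting node. It is not the Graph2Pro implementation; the function names, the toy reads, and the value of k are assumptions chosen for illustration.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def spell_paths(graph, start, max_edges):
    """List the sequences spelled by walks from start.

    A walk ends at a node without successors or after max_edges edges,
    which keeps the enumeration finite even if the graph has cycles.
    """
    results = []
    stack = [(start, start, 0)]
    while stack:
        node, seq, depth = stack.pop()
        successors = sorted(graph.get(node, ()))
        if depth == max_edges or not successors:
            results.append(seq)
            continue
        for nxt in successors:
            # Each edge extends the spelled sequence by one base.
            stack.append((nxt, seq + nxt[-1], depth + 1))
    return sorted(results)


# Two toy reads that share a prefix and then diverge at the last base.
toy_reads = ["ATGGCA", "TGGCT"]
toy_graph = de_bruijn_graph(toy_reads, k=4)
variants = spell_paths(toy_graph, "ATG", max_edges=10)
```

    Both variants of the branching region are kept as paths in the graph, whereas a linear contig would report only one of them or stop at the branch. This is the kind of extra sequence information that a graph-based protein database can draw on.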

  7. Current and future resources for functional metagenomics

    PubMed Central

    Lam, Kathy N.; Cheng, Jiujun; Engel, Katja; Neufeld, Josh D.; Charles, Trevor C.

    2015-01-01

    Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries—physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research. PMID:26579102

  8. Current and future resources for functional metagenomics.

    PubMed

    Lam, Kathy N; Cheng, Jiujun; Engel, Katja; Neufeld, Josh D; Charles, Trevor C

    2015-01-01

    Functional metagenomics is a powerful experimental approach for studying gene function, starting from the extracted DNA of mixed microbial populations. A functional approach relies on the construction and screening of metagenomic libraries-physical libraries that contain DNA cloned from environmental metagenomes. The information obtained from functional metagenomics can help in future annotations of gene function and serve as a complement to sequence-based metagenomics. In this Perspective, we begin by summarizing the technical challenges of constructing metagenomic libraries and emphasize their value as resources. We then discuss libraries constructed using the popular cloning vector, pCC1FOS, and highlight the strengths and shortcomings of this system, alongside possible strategies to maximize existing pCC1FOS-based libraries by screening in diverse hosts. Finally, we discuss the known bias of libraries constructed from human gut and marine water samples, present results that suggest bias may also occur for soil libraries, and consider factors that bias metagenomic libraries in general. We anticipate that discussion of current resources and limitations will advance tools and technologies for functional metagenomics research.

  9. A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples.

    PubMed

    Naccache, Samia N; Federman, Scot; Veeraraghavan, Narayanan; Zaharia, Matei; Lee, Deanna; Samayoa, Erik; Bouquet, Jerome; Greninger, Alexander L; Luk, Ka-Cheung; Enge, Barryett; Wadford, Debra A; Messenger, Sharon L; Genrich, Gillian L; Pellegrino, Kristen; Grard, Gilda; Leroy, Eric; Schneider, Bradley S; Fair, Joseph N; Martínez, Miguel A; Isa, Pavel; Crump, John A; DeRisi, Joseph L; Sittler, Taylor; Hackett, John; Miller, Steve; Chiu, Charles Y

    2014-07-01

    Unbiased next-generation sequencing (NGS) approaches enable comprehensive pathogen detection in the clinical microbiology laboratory and have numerous applications for public health surveillance, outbreak investigation, and the diagnosis of infectious diseases. However, practical deployment of the technology is hindered by the bioinformatics challenge of analyzing results accurately and in a clinically relevant timeframe. Here we describe SURPI ("sequence-based ultrarapid pathogen identification"), a computational pipeline for pathogen identification from complex metagenomic NGS data generated from clinical samples, and demonstrate use of the pipeline in the analysis of 237 clinical samples comprising more than 1.1 billion sequences. Deployable on both cloud-based and standalone servers, SURPI leverages two state-of-the-art aligners for accelerated analyses, SNAP and RAPSearch, which are as accurate as existing bioinformatics tools but orders of magnitude faster in performance. In fast mode, SURPI detects viruses and bacteria by scanning data sets of 7-500 million reads in 11 min to 5 h, while in comprehensive mode, all known microorganisms are identified, followed by de novo assembly and protein homology searches for divergent viruses in 50 min to 16 h. SURPI has also directly contributed to real-time microbial diagnosis in acutely ill patients, underscoring its potential key role in the development of unbiased NGS-based clinical assays in infectious diseases that demand rapid turnaround times. © 2014 Naccache et al.; Published by Cold Spring Harbor Laboratory Press.

  10. MBMC: An Effective Markov Chain Approach for Binning Metagenomic Reads from Environmental Shotgun Sequencing Projects.

    PubMed

    Wang, Ying; Hu, Haiyan; Li, Xiaoman

    2016-08-01

    Metagenomics is a next-generation omics field currently impacting postgenomic life sciences and medicine. Binning metagenomic reads is essential for the understanding of microbial function, compositions, and interactions in given environments. Despite the existence of dozens of computational methods for metagenomic read binning, it is still very challenging to bin reads. This is especially true for reads from unknown species, from species with similar abundance, and/or from low-abundance species in environmental samples. In this study, we developed a novel taxonomy-dependent and alignment-free approach called MBMC (Metagenomic Binning by Markov Chains). Different from all existing methods, MBMC bins reads by measuring the similarity of reads to the trained Markov chains for different taxa instead of directly comparing reads with known genomic sequences. By testing on more than 24 simulated and experimental datasets with species of similar abundance, species of low abundance, and/or unknown species, we report here that MBMC reliably grouped reads from different species into separate bins. Compared with four existing approaches, we demonstrated that the performance of MBMC was comparable with existing approaches when binning reads from sequenced species, and superior to existing approaches when binning reads from unknown species. MBMC is a pivotal tool for binning metagenomic reads in the current era of Big Data and postgenomic integrative biology. The MBMC software can be freely downloaded at http://hulab.ucf.edu/research/projects/metagenomics/MBMC.html .
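
    The idea of scoring reads against trained Markov chains, rather than aligning them to genomes, can be sketched as follows. This is not the MBMC software: the chain order, the pseudocount smoothing, the maximum-likelihood assignment rule, and the taxon names and training sequences are all assumptions made for this example.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


def train_markov_chain(sequences, order, pseudocount=1.0):
    """Fit a fixed-order Markov chain and return a log-probability function."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Positions with ambiguous bases (e.g. N) are ignored.
            if base in ALPHABET and all(b in ALPHABET for b in context):
                counts[context][base] += 1

    def log_prob(context, base):
        row = counts.get(context, {})
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        return math.log((row.get(base, 0.0) + pseudocount) / total)

    return log_prob


def score_read(read, log_prob, order):
    """Log-likelihood of the read's transitions under one trained chain."""
    score = 0.0
    for i in range(order, len(read)):
        context, base = read[i - order:i], read[i]
        if base in ALPHABET and all(b in ALPHABET for b in context):
            score += log_prob(context, base)
    return score


def bin_read(read, models, order):
    """Assign the read to the taxon whose chain gives the highest score."""
    return max(models, key=lambda taxon: score_read(read, models[taxon], order))


# Hypothetical training data: one AT-rich and one GC-rich "taxon".
ORDER = 2
models = {
    "taxon_a": train_markov_chain(["ATATATATATATATAT"], ORDER),
    "taxon_b": train_markov_chain(["GCGCGCGCGCGCGCGC"], ORDER),
}
assignment_1 = bin_read("ATATAT", models, ORDER)
assignment_2 = bin_read("GCGCGC", models, ORDER)
```

    Because a read is compared with a compact model of each taxon, it can still be scored when its exact source genome was never sequenced.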

  11. Metagenomic approaches to exploit the biotechnological potential of the microbial consortia of marine sponges.

    PubMed

    Kennedy, Jonathan; Marchesi, Julian R; Dobson, Alan D W

    2007-05-01

    Natural products isolated from sponges are an important source of new biologically active compounds. However, the development of these compounds into drugs has been held back by the difficulties in achieving a sustainable supply of these often-complex molecules for pre-clinical and clinical development. Increasing evidence implicates microbial symbionts as the source of many of these biologically active compounds, but the vast majority of the sponge microbial community remain uncultured. Metagenomics offers a biotechnological solution to this supply problem. Metagenomes of sponge microbial communities have been shown to contain genes and gene clusters typical for the biosynthesis of biologically active natural products. Heterologous expression approaches have also led to the isolation of secondary metabolism gene clusters from uncultured microbial symbionts of marine invertebrates and from soil metagenomic libraries. Combining a metagenomic approach with heterologous expression holds much promise for the sustainable exploitation of the chemical diversity present in the sponge microbial community.

  12. Functional Responses of Salt Marsh Microbial Communities to Long-Term Nutrient Enrichment

    PubMed Central

    Graves, Christopher J.; Makrides, Elizabeth J.; Schmidt, Victor T.; Giblin, Anne E.; Cardon, Zoe G.

    2016-01-01

    ABSTRACT Environmental nutrient enrichment from human agricultural and waste runoff could cause changes to microbial communities that allow them to capitalize on newly available resources. Currently, the response of microbial communities to nutrient enrichment remains poorly understood, and, while some studies have shown no clear changes in community composition in response to heavy nutrient loading, others targeting specific genes have demonstrated clear impacts. In this study, we compared functional metagenomic profiles from sediment samples taken along two salt marsh creeks, one of which was exposed for more than 40 years to treated sewage effluent at its head. We identified strong and consistent increases in the relative abundance of microbial genes related to each of the biochemical steps in the denitrification pathway at enriched sites. Despite fine-scale local increases in the abundance of denitrification-related genes, the overall community structures based on broadly defined functional groups and taxonomic annotations were similar and varied with other environmental factors, such as salinity, which were common to both creeks. Homology-based taxonomic assignments of nitrous oxide reductase sequences in our data show that increases are spread over a broad taxonomic range, thus limiting detection from taxonomic data alone. Together, these results illustrate a functionally targeted yet taxonomically broad response of microbial communities to anthropogenic nutrient loading, indicating some resolution to the apparently conflicting results of existing studies on the impacts of nutrient loading in sediment communities. IMPORTANCE In this study, we used environmental metagenomics to assess the response of microbial communities in estuarine sediments to long-term, nutrient-rich sewage effluent exposure. 
Unlike previous studies, which have mainly characterized communities based on taxonomic data or primer-based amplification of specific target genes, our whole-genome metagenomics approach allowed an unbiased assessment of the abundance of denitrification-related genes across the entire community. We identified strong and consistent increases in the relative abundance of gene sequences related to denitrification pathways across a broad phylogenetic range at sites exposed to long-term nutrient addition. While further work is needed to determine the consequences of these community responses in regulating environmental nutrient cycles, the increased abundance of bacteria harboring denitrification genes suggests that such processes may be locally upregulated. In addition, our results illustrate how whole-genome metagenomics combined with targeted hypothesis testing can reveal fine-scale responses of microbial communities to environmental disturbance. PMID:26944843

  13. Functional Responses of Salt Marsh Microbial Communities to Long-Term Nutrient Enrichment.

    PubMed

    Graves, Christopher J; Makrides, Elizabeth J; Schmidt, Victor T; Giblin, Anne E; Cardon, Zoe G; Rand, David M

    2016-05-01

    Environmental nutrient enrichment from human agricultural and waste runoff could cause changes to microbial communities that allow them to capitalize on newly available resources. Currently, the response of microbial communities to nutrient enrichment remains poorly understood, and, while some studies have shown no clear changes in community composition in response to heavy nutrient loading, others targeting specific genes have demonstrated clear impacts. In this study, we compared functional metagenomic profiles from sediment samples taken along two salt marsh creeks, one of which was exposed for more than 40 years to treated sewage effluent at its head. We identified strong and consistent increases in the relative abundance of microbial genes related to each of the biochemical steps in the denitrification pathway at enriched sites. Despite fine-scale local increases in the abundance of denitrification-related genes, the overall community structures based on broadly defined functional groups and taxonomic annotations were similar and varied with other environmental factors, such as salinity, which were common to both creeks. Homology-based taxonomic assignments of nitrous oxide reductase sequences in our data show that increases are spread over a broad taxonomic range, thus limiting detection from taxonomic data alone. Together, these results illustrate a functionally targeted yet taxonomically broad response of microbial communities to anthropogenic nutrient loading, indicating some resolution to the apparently conflicting results of existing studies on the impacts of nutrient loading in sediment communities. In this study, we used environmental metagenomics to assess the response of microbial communities in estuarine sediments to long-term, nutrient-rich sewage effluent exposure. 
Unlike previous studies, which have mainly characterized communities based on taxonomic data or primer-based amplification of specific target genes, our whole-genome metagenomics approach allowed an unbiased assessment of the abundance of denitrification-related genes across the entire community. We identified strong and consistent increases in the relative abundance of gene sequences related to denitrification pathways across a broad phylogenetic range at sites exposed to long-term nutrient addition. While further work is needed to determine the consequences of these community responses in regulating environmental nutrient cycles, the increased abundance of bacteria harboring denitrification genes suggests that such processes may be locally upregulated. In addition, our results illustrate how whole-genome metagenomics combined with targeted hypothesis testing can reveal fine-scale responses of microbial communities to environmental disturbance. Copyright © 2016 Graves et al.

  14. Metagenomic analysis of viral diversity in respiratory samples from patients with respiratory tract infections in Kuwait.

    PubMed

    Madi, Nada; Al-Nakib, Widad; Mustafa, Abu Salim; Habibi, Nazima

    2018-03-01

    A metagenomic approach based on target-independent next-generation sequencing has become a known method for the detection of both known and novel viruses in clinical samples. This study aimed to use the metagenomic sequencing approach to characterize the viral diversity in respiratory samples from patients with respiratory tract infections. We investigated 86 respiratory samples received from various hospitals in Kuwait between 2015 and 2016 for the diagnosis of respiratory tract infections. A metagenomic approach using the next-generation sequencer to characterize viruses was used. According to the metagenomic analysis, an average of 145,019 reads were identified, and 2% of these reads were of viral origin. Metagenomic analysis of the viral sequences revealed many known respiratory viruses, which were detected in 30.2% of the clinical samples. Sequences of non-respiratory viruses were detected in 14% of the clinical samples, while sequences of non-human viruses were detected in 55.8% of the clinical samples. The average genome coverage of the viruses was 12%, with the highest genome coverage of 99.2% for respiratory syncytial virus and the lowest of 1% for torque teno midi virus 2. Our results showed 47.7% agreement between multiplex Real-Time PCR and metagenomics sequencing in the detection of respiratory viruses in the clinical samples. Though there are some difficulties in applying this method to clinical samples, such as specimen quality, these observations are indicative of the promising utility of the metagenomic sequencing approach for the identification of respiratory viruses in patients with respiratory tract infections. © 2017 Wiley Periodicals, Inc.

  15. Detection of the plasmid-mediated colistin-resistance gene mcr-1 in faecal metagenomes of Dutch travellers.

    PubMed

    von Wintersdorff, Christian J H; Wolffs, Petra F G; van Niekerk, Julius M; Beuken, Erik; van Alphen, Lieke B; Stobberingh, Ellen E; Oude Lashof, Astrid M L; Hoebe, Christian J P A; Savelkoul, Paul H M; Penders, John

    2016-12-01

    Recently, the first plasmid-mediated colistin-resistance gene, mcr-1, was reported. Colistin is increasingly used as an antibiotic of last resort for the treatment of infections caused by carbapenem-resistant bacteria, which have been rapidly disseminating worldwide in recent years. The reported carriage rate of mcr-1 in humans remains sporadic thus far, except for those reported in Chinese populations. We aimed to determine its presence in the faecal metagenomes of healthy Dutch travellers between 2010 and 2012. Faecal metagenomic DNA of pre- and post-travel samples from 122 healthy Dutch long-distance travellers was screened for the presence of mcr-1 using a TaqMan quantitative PCR assay, which was designed in this study. All positive samples were confirmed by sequencing of the amplicons. The mcr-1 gene was detected in 6 (4.9%, 95% CI = 2.1%-10.5%) of 122 healthy Dutch long-distance travellers after they had visited destinations in South(-east) Asia or southern Africa between 2011 and 2012. One of these participants was already found to be positive before travel. Our study highlights the potential of PCR-based targeted metagenomics as an unbiased and sensitive method to screen for the carriage of the mcr-1 gene and suggests that mcr-1 is widespread in various parts of the world. The observation that one participant was found to be positive before travel suggests that mcr-1 may already have disseminated to the microbiomes of Dutch residents at a low prevalence, warranting a more extensive investigation of its prevalence in the general population and possible sources. © The Author 2016. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
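
    A confidence interval for a carriage proportion such as 6 of 122 can be computed as in the sketch below. The abstract does not state which interval method the authors used. The Wilson score interval is assumed here as one common choice, so its bounds may differ slightly from the reported 2.1%-10.5%.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    z defaults to the standard normal quantile for a two-sided 95% interval.
    """
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: 6 positive travellers out of 122.
low, high = wilson_interval(6, 122)
```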

  16. An integrated metagenome and -proteome analysis of the microbial community residing in a biogas production plant.

    PubMed

    Ortseifen, Vera; Stolze, Yvonne; Maus, Irena; Sczyrba, Alexander; Bremges, Andreas; Albaum, Stefan P; Jaenicke, Sebastian; Fracowiak, Jochen; Pühler, Alfred; Schlüter, Andreas

    2016-08-10

    To study the metaproteome of a biogas-producing microbial community, fermentation samples were taken from an agricultural biogas plant for microbial cell and protein extraction and corresponding metagenome analyses. Based on metagenome sequence data, taxonomic community profiling was performed to elucidate the composition of bacterial and archaeal sub-communities. The community's cytosolic metaproteome was represented in a 2D-PAGE approach. Metaproteome databases for protein identification were compiled based on the assembled metagenome sequence dataset for the biogas plant analyzed and non-corresponding biogas metagenomes. Protein identification results revealed that the corresponding biogas protein database facilitated the highest identification rate followed by other biogas-specific databases, whereas common public databases yielded insufficient identification rates. Proteins of the biogas microbiome identified as highly abundant were assigned to the pathways involved in methanogenesis, transport and carbon metabolism. Moreover, the integrated metagenome/-proteome approach enabled the examination of genetic-context information for genes encoding identified proteins by studying neighboring genes on the corresponding contig. Exemplarily, this approach led to the identification of a Methanoculleus sp. contig encoding 16 methanogenesis-related gene products, three of which were also detected as abundant proteins within the community's metaproteome. Thus, metagenome contigs provide additional information on the genetic environment of identified abundant proteins. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Transposases are the most abundant, most ubiquitous genes in nature.

    PubMed

    Aziz, Ramy K; Breitbart, Mya; Edwards, Robert A

    2010-07-01

    Genes, like organisms, struggle for existence, and the most successful genes persist and widely disseminate in nature. The unbiased determination of the most successful genes requires access to sequence data from a wide range of phylogenetic taxa and ecosystems, which has finally become achievable thanks to the deluge of genomic and metagenomic sequences. Here, we analyzed 10 million protein-encoding genes and gene tags in sequenced bacterial, archaeal, eukaryotic and viral genomes and metagenomes, and our analysis demonstrates that genes encoding transposases are the most prevalent genes in nature. The finding that these genes, classically considered as selfish genes, outnumber essential or housekeeping genes suggests that they offer selective advantage to the genomes and ecosystems they inhabit, a hypothesis in agreement with an emerging body of literature. Their mobile nature not only promotes dissemination of transposable elements within and between genomes but also leads to mutations and rearrangements that can accelerate biological diversification and--consequently--evolution. By securing their own replication and dissemination, transposases guarantee to thrive so long as nucleic acid-based life forms exist.

  18. New Hydrocarbon Degradation Pathways in the Microbial Metagenome from Brazilian Petroleum Reservoirs

    PubMed Central

    Sierra-García, Isabel Natalia; Correa Alvarez, Javier; Pantaroto de Vasconcellos, Suzan; Pereira de Souza, Anete; dos Santos Neto, Eugenio Vaz; de Oliveira, Valéria Maia

    2014-01-01

    Current knowledge of the microbial diversity and metabolic pathways involved in hydrocarbon degradation in petroleum reservoirs is still limited, mostly due to the difficulty in recovering the complex community from such an extreme environment. Metagenomics is a valuable tool to investigate the genetic and functional diversity of previously uncultured microorganisms in natural environments. Using a function-driven metagenomic approach, we investigated the metabolic abilities of microbial communities in oil reservoirs. Here, we describe novel functional metabolic pathways involved in the biodegradation of aromatic compounds in a metagenomic library obtained from an oil reservoir. Although many of the deduced proteins shared homology with known enzymes of different well-described aerobic and anaerobic catabolic pathways, the metagenomic fragments did not contain the complete clusters known to be involved in hydrocarbon degradation. Instead, the metagenomic fragments comprised genes belonging to different pathways, showing novel gene arrangements. These results reinforce the potential of the metagenomic approach for the identification and elucidation of new genes and pathways in poorly studied environments and contribute to a broader perspective on the hydrocarbon degradation processes in petroleum reservoirs. PMID:24587220

  19. SPHINX--an algorithm for taxonomic binning of metagenomic sequences.

    PubMed

    Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Singh, Nitin Kumar; Mande, Sharmila S

    2011-01-01

    Compared with composition-based binning algorithms, the binning accuracy and specificity of alignment-based binning algorithms are significantly higher. However, being alignment-based, the latter class of algorithms requires an enormous amount of time and computing resources for binning huge metagenomic datasets. The motivation was to develop a binning approach that can analyze metagenomic datasets as rapidly as composition-based approaches, but nevertheless has the accuracy and specificity of alignment-based algorithms. This article describes a hybrid binning approach (SPHINX) that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms. Validation results with simulated sequence datasets indicate that SPHINX is able to analyze metagenomic sequences as rapidly as composition-based algorithms. Furthermore, the binning efficiency (in terms of accuracy and specificity of assignments) of SPHINX is observed to be comparable with results obtained using alignment-based algorithms. A web server for the SPHINX algorithm is available at http://metagenomics.atc.tcs.com/SPHINX/.
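
    The general idea of a hybrid binner can be sketched as a two-stage filter: a cheap composition comparison shortlists candidate references, and a costlier alignment-style comparison is run only against that shortlist. The sketch below is an illustration of that principle, not the SPHINX implementation. The function names, the dinucleotide profile, the cosine similarity, the shortlist size and the ungapped sliding-identity score (a simple stand-in for a real aligner) are all assumptions.

```python
import math
from collections import Counter
from itertools import product


def composition_vector(seq, k=2):
    """Normalized k-mer frequency vector over the ACGT alphabet."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def best_ungapped_identity(read, ref):
    """Slide the read along the reference and return the highest number of
    matching positions (hypothetical stand-in for a real alignment score)."""
    best = 0
    for offset in range(len(ref) - len(read) + 1):
        window = ref[offset:offset + len(read)]
        best = max(best, sum(a == b for a, b in zip(read, window)))
    return best


def hybrid_bin(read, references, shortlist_size=2):
    """Stage 1: rank references by composition. Stage 2: score only the shortlist."""
    read_vec = composition_vector(read)
    ranked = sorted(
        references,
        key=lambda name: cosine(read_vec, composition_vector(references[name])),
        reverse=True,
    )
    shortlist = ranked[:shortlist_size]
    best = max(shortlist, key=lambda name: best_ungapped_identity(read, references[name]))
    return best, shortlist


# Toy reference sequences (made up for illustration).
references = {
    "taxon_A": "GCGCGGCCGCGGCGCCGGCG",
    "taxon_B": "ATATTAATATTTAATAATTA",
    "taxon_C": "GCCGGCGCGCCGGGCGCGCC",
}
taxon, shortlist = hybrid_bin("GGCCGCGGCG", references)
print(taxon, shortlist)
```

    The design point is that the expensive comparison never touches references whose composition is clearly different from the read, which is where the time saving of a hybrid scheme would come from.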

  20. Microfluidic-based mini-metagenomics enables discovery of novel microbial lineages from complex environmental samples.

    PubMed

    Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R

    2017-07-05

    Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell.

  1. Random whole metagenomic sequencing for forensic discrimination of soils.

    PubMed

    Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.
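
    Clustering and ordination methods such as hierarchical clustering and NMDS start from a matrix of pairwise dissimilarities between sample profiles. The sketch below computes such a matrix with the Bray-Curtis measure. The choice of Bray-Curtis is an assumption (the abstract does not name the dissimilarity measure), and the sample names and counts are invented for illustration.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count profiles (dict: taxon -> count)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def pairwise_dissimilarity(profiles):
    """Dissimilarity for every unordered pair of samples."""
    return {
        (x, y): bray_curtis(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Hypothetical taxonomic profiles: two replicates from one site, one from another.
profiles = {
    "site_A_rep1": {"Actinobacteria": 40, "Proteobacteria": 50, "Firmicutes": 10},
    "site_A_rep2": {"Actinobacteria": 42, "Proteobacteria": 48, "Firmicutes": 10},
    "site_B_rep1": {"Actinobacteria": 10, "Proteobacteria": 30, "Firmicutes": 60},
}

for pair, value in pairwise_dissimilarity(profiles).items():
    print(pair, round(value, 3))
```

    Discrimination between sites is plausible only when between-site dissimilarities are clearly larger than those between replicates of the same site, as in this toy input.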

  2. Toward Accurate and Quantitative Comparative Metagenomics

    PubMed Central

    Nayfach, Stephen; Pollard, Katherine S.

    2016-01-01

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. PMID:27565341

  3. Toward Accurate and Quantitative Comparative Metagenomics.

    PubMed

    Nayfach, Stephen; Pollard, Katherine S

    2016-08-25

    Shotgun metagenomics and computational analysis are used to compare the taxonomic and functional profiles of microbial communities. Leveraging this approach to understand roles of microbes in human biology and other environments requires quantitative data summaries whose values are comparable across samples and studies. Comparability is currently hampered by the use of abundance statistics that do not estimate a meaningful parameter of the microbial community and biases introduced by experimental protocols and data-cleaning approaches. Addressing these challenges, along with improving study design, data access, metadata standardization, and analysis tools, will enable accurate comparative metagenomics. We envision a future in which microbiome studies are replicable and new metagenomes are easily and rapidly integrated with existing data. Only then can the potential of metagenomics for predictive ecological modeling, well-powered association studies, and effective microbiome medicine be fully realized. Copyright © 2016 Elsevier Inc. All rights reserved.

  4. Resolving the Complexity of Human Skin Metagenomes Using Single-Molecule Sequencing

    PubMed Central

    Tsai, Yu-Chih; Deming, Clayton; Segre, Julia A.; Kong, Heidi H.; Korlach, Jonas

    2016-01-01

    ABSTRACT Deep metagenomic shotgun sequencing has emerged as a powerful tool to interrogate composition and function of complex microbial communities. Computational approaches to assemble genome fragments have been demonstrated to be an effective tool for de novo reconstruction of genomes from these communities. However, the resultant “genomes” are typically fragmented and incomplete due to the limited ability of short-read sequence data to assemble complex or low-coverage regions. Here, we use single-molecule, real-time (SMRT) sequencing to reconstruct a high-quality, closed genome of a previously uncharacterized Corynebacterium simulans and its companion bacteriophage from a skin metagenomic sample. Considerable improvement in assembly quality occurs in hybrid approaches incorporating short-read data, with even relatively small amounts of long-read data being sufficient to improve metagenome reconstruction. Using short-read data to evaluate strain variation of this C. simulans in its skin community at single-nucleotide resolution, we observed a dominant C. simulans strain with moderate allelic heterozygosity throughout the population. We demonstrate the utility of SMRT sequencing and hybrid approaches in metagenome quantitation, reconstruction, and annotation. PMID:26861018

  5. Microfluidic-based mini-metagenomics enables discovery of novel microbial lineages from complex environmental samples

    PubMed Central

    Yu, Feiqiao Brian; Blainey, Paul C; Schulz, Frederik; Woyke, Tanja; Horowitz, Mark A; Quake, Stephen R

    2017-01-01

    Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell. DOI: http://dx.doi.org/10.7554/eLife.26580.001 PMID:28678007

  6. A user's guide to quantitative and comparative analysis of metagenomic datasets.

    PubMed

    Luo, Chengwei; Rodriguez-R, Luis M; Konstantinidis, Konstantinos T

    2013-01-01

    Metagenomics has revolutionized microbiological studies during the past decade and provided new insights into the diversity, dynamics, and metabolic potential of natural microbial communities. However, metagenomics still represents a field in development, and standardized tools and approaches to handle and compare metagenomes have not been established yet. An important reason accounting for the latter is the continuous changes in the type of sequencing data available, for example, long versus short sequencing reads. Here, we provide a guide to bioinformatic pipelines developed to accomplish the following tasks, focusing primarily on those developed by our team: (i) assemble a metagenomic dataset; (ii) determine the level of sequence coverage obtained and the amount of sequencing required to obtain complete coverage; (iii) identify the taxonomic affiliation of a metagenomic read or assembled contig; and (iv) determine differentially abundant genes, pathways, and species between different datasets. Most of these pipelines do not depend on the type of sequences available or can be easily adjusted to fit different types of sequences, and are freely available (for instance, through our lab Web site: http://www.enve-omics.gatech.edu/). The limitations of current approaches, as well as the computational aspects that can be further improved, will also be briefly discussed. The work presented here provides practical guidelines on how to perform metagenomic analysis of microbial communities characterized by varied levels of diversity and establishes approaches to handle the resulting data, independent of the sequencing platform employed. © 2013 Elsevier Inc. All rights reserved.

  7. Metagenomic Assembly: Overview, Challenges and Applications

    PubMed Central

    Ghurye, Jay S.; Cepeda-Espinoza, Victoria; Pop, Mihai

    2016-01-01

    Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems. PMID:27698619

  8. Strain/species identification in metagenomes using genome-specific markers

    PubMed Central

    Tu, Qichao; He, Zhili; Zhou, Jizhong

    2014-01-01

    Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which were then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 50-mer strain-specific and 11 736 360 species-specific GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their targeting genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ≥0.25× coverage, and 10% of selected GSMs in a database should be detected for confident positive callings. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals from corresponding gastrointestinal tract metagenomes, respectively. Our result agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing. PMID:24523352
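
    The core idea of genome-specific markers can be sketched as follows: collect k-mers that occur in exactly one reference genome, then report a genome as present when a sufficient fraction of its markers is found in the reads. This is a minimal illustration, not the GSMer implementation. The function names and toy sequences are hypothetical, k is set to 4 instead of 50 to keep the example readable, reverse complements and any additional filtering are ignored, and the 10% threshold follows the figure quoted in the abstract.

```python
from collections import defaultdict


def kmers(seq, k):
    """Set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genome_specific_markers(genomes, k):
    """Map each genome name to the k-mers found in that genome only."""
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)
    markers = defaultdict(set)
    for kmer, names in owners.items():
        if len(names) == 1:
            markers[next(iter(names))].add(kmer)
    return markers


def detect(reads, markers, k, min_fraction=0.10):
    """Return genomes whose detected marker fraction reaches min_fraction."""
    seen = set()
    for read in reads:
        seen |= kmers(read, k)
    calls = {}
    for name, marker_set in markers.items():
        if not marker_set:
            continue
        fraction = len(marker_set & seen) / len(marker_set)
        if fraction >= min_fraction:
            calls[name] = fraction
    return calls


# Toy genomes (made up); real use would involve 50-mers and thousands of genomes.
genomes = {"strain_X": "ACGTACGGTCA", "strain_Y": "ACGTTTGGTCA"}
markers = genome_specific_markers(genomes, k=4)
print(detect(["GTACGG"], markers, k=4))
```

    Because detection is a set lookup on raw reads, no assembly or alignment step is needed, which matches the abstract's point about avoiding complex pre-processing.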

  9. Strain-Level Metagenomic Analysis of the Fermented Dairy Beverage Nunu Highlights Potential Food Safety Risks

    PubMed Central

    Walsh, Aaron M.; Crispie, Fiona; Daari, Kareem; O'Sullivan, Orla; Martin, Jennifer C.; Arthur, Cornelius T.; Claesson, Marcus J.; Scott, Karen P.

    2017-01-01

    ABSTRACT The rapid detection of pathogenic strains in food products is essential for the prevention of disease outbreaks. It has already been demonstrated that whole-metagenome shotgun sequencing can be used to detect pathogens in food but, until recently, strain-level detection of pathogens has relied on whole-metagenome assembly, which is a computationally demanding process. Here we demonstrated that three short-read-alignment-based methods, i.e., MetaMLST, PanPhlAn, and StrainPhlAn, could accurately and rapidly identify pathogenic strains in spinach metagenomes that had been intentionally spiked with Shiga toxin-producing Escherichia coli in a previous study. Subsequently, we employed the methods, in combination with other metagenomics approaches, to assess the safety of nunu, a traditional Ghanaian fermented milk product that is produced by the spontaneous fermentation of raw cow milk. We showed that nunu samples were frequently contaminated with bacteria associated with the bovine gut and, worryingly, we detected putatively pathogenic E. coli and Klebsiella pneumoniae strains in a subset of nunu samples. Ultimately, our work establishes that short-read-alignment-based bioinformatics approaches are suitable food safety tools, and we describe a real-life example of their utilization. IMPORTANCE Foodborne pathogens are responsible for millions of illnesses each year. Here we demonstrate that short-read-alignment-based bioinformatics tools can accurately and rapidly detect pathogenic strains in food products by using shotgun metagenomics data. The methods used here are considerably faster than both traditional culturing methods and alternative bioinformatics approaches that rely on metagenome assembly; therefore, they can potentially be used for more high-throughput food safety testing. Overall, our results suggest that whole-metagenome sequencing can be used as a practical food safety tool to prevent diseases or to link outbreaks to specific food products. PMID:28625983

  10. Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics.

    PubMed

    Weber, Marc; Teeling, Hanno; Huang, Sixing; Waldmann, Jost; Kassabgy, Mariette; Fuchs, Bernhard M; Klindworth, Anna; Klockow, Christine; Wichels, Antje; Gerdts, Gunnar; Amann, Rudolf; Glöckner, Frank Oliver

    2011-05-01

    Next-generation sequencing (NGS) technologies have enabled the application of broad-scale sequencing in microbial biodiversity and metagenome studies. Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated, as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions, but have not been applied to broad-scale NGS-based metagenomics yet. Here, we provide a novel implementation, demonstrate its potential and practicability, and provide a web-based service for public usage. Evaluation with published data sets mimicking varyingly complex habitats resulted in classification specificities and sensitivities of close to 100% to above 90% from phylum to genus level for assemblies exceeding 8 kb for low and medium complexity data. When applied to five real-world metagenomes of medium complexity from direct pyrosequencing of marine subsurface waters, classifications of assemblies above 2.5 kb were in good agreement with fluorescence in situ hybridizations, indicating that biodiversity was mostly retained within the metagenomes, and confirming high classification specificities. This was validated by two protein-based classification (PBC) methods. SOMs were able to retrieve the relevant taxa down to the genus level, while surpassing PBCs in resolution. In order to make the approach accessible to a broad audience, we implemented a feature-rich web-based SOM application named TaxSOM, which is freely available at http://www.megx.net/toolbox/taxsom. TaxSOM can classify reads or assemblies exceeding 2.5 kb with high accuracy and thus assists in linking biodiversity and functions in metagenome studies, which is a precondition to study microbial ecology in a holistic fashion.

  11. i-rDNA: alignment-free algorithm for rapid in silico detection of ribosomal gene fragments from metagenomic sequence data sets.

    PubMed

    Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Chadaram, Sudha; Mande, Sharmila S

    2011-11-30

    Obtaining accurate estimates of microbial diversity using rDNA profiling is the first step in most metagenomics projects. Consequently, most metagenomic projects spend considerable amounts of time, money and manpower for experimentally cloning, amplifying and sequencing the rDNA content in a metagenomic sample. In the second step, the entire genomic content of the metagenome is extracted, sequenced and analyzed. Since DNA sequences obtained in this second step also contain rDNA fragments, rapid in silico identification of these rDNA fragments would drastically reduce the cost, time and effort of current metagenomic projects by entirely bypassing the experimental steps of primer based rDNA amplification, cloning and sequencing. In this study, we present an algorithm called i-rDNA that can facilitate the rapid detection of 16S rDNA fragments from amongst millions of sequences in metagenomic data sets with high detection sensitivity. Performance evaluation with data sets/database variants simulating typical metagenomic scenarios indicates the significantly high detection sensitivity of i-rDNA. Moreover, i-rDNA can process a million sequences in less than an hour on a simple desktop with modest hardware specifications. In addition to the speed of execution, high sensitivity and low false positive rate, the utility of the algorithmic approach discussed in this paper is immense given that it would help in bypassing the entire experimental step of primer-based rDNA amplification, cloning and sequencing. Application of this algorithmic approach would thus drastically reduce the cost, time and human efforts invested in all metagenomic projects. A web-server for the i-rDNA algorithm is available at http://metagenomics.atc.tcs.com/i-rDNA/

  12. Environmental surveillance of viruses by tangential flow filtration and metagenomic reconstruction.

    PubMed

    Furtak, Vyacheslav; Roivainen, Merja; Mirochnichenko, Olga; Zagorodnyaya, Tatiana; Laassri, Majid; Zaidi, Sohail Z; Rehman, Lubna; Alam, Muhammad M; Chizhikov, Vladimir; Chumakov, Konstantin

    2016-04-14

    An approach is proposed for environmental surveillance of poliovirus by concentrating sewage samples with tangential flow filtration (TFF) followed by deep sequencing of viral RNA. Subsequent to testing the method with samples from Finland, samples from Pakistan, a country endemic for poliovirus, were investigated. Genomic sequencing was either performed directly, for unbiased identification of viruses regardless of their ability to grow in cell cultures, or after virus enrichment by cell culture or immunoprecipitation. Bioinformatics enabled separation and determination of individual consensus sequences. Overall, deep sequencing of the entire viral population identified polioviruses, non-polio enteroviruses, and other viruses. In Pakistani sewage samples, adeno-associated virus, unable to replicate autonomously in cell cultures, was the most abundant human virus. The presence of recombinants of wild polioviruses of serotype 1 (WPV1) was also inferred, whereby currently circulating WPV1 of south-Asian (SOAS) lineage comprised two sub-lineages depending on their non-capsid region origin. Complete genome analyses additionally identified point mutants and intertypic recombinants between attenuated Sabin strains in the Pakistani samples, and in one Finnish sample. The approach could allow rapid environmental surveillance of viruses causing human infections. It creates a permanent digital repository of the entire virome potentially useful for retrospective screening of future discovered viruses.

  13. VIP: an integrated pipeline for metagenomics of virus identification and discovery

    PubMed Central

    Li, Yang; Wang, Hao; Nie, Kai; Zhang, Chen; Zhang, Yi; Wang, Ji; Niu, Peihua; Ma, Xuejun

    2016-01-01

    Identification and discovery of viruses using next-generation sequencing technology is a fast-developing area with potential wide application in clinical diagnostics, public health monitoring and novel virus discovery. However, the tremendous volume of sequence data produced by NGS studies poses great challenges for both the accuracy and the speed of analysis. Here we describe VIP (“Virus Identification Pipeline”), a one-touch computational pipeline for virus identification and discovery from metagenomic NGS data. VIP performs the following steps to achieve its goal: (i) map and filter out background-related reads, (ii) extensive classification of reads on the basis of nucleotide and remote amino acid homology, (iii) multiple k-mer based de novo assembly and phylogenetic analysis to provide evolutionary insight. We validated the feasibility and veracity of this pipeline with sequencing results of various types of clinical samples and public datasets. VIP has also contributed to timely virus diagnosis (~10 min) in acutely ill patients, demonstrating its potential in the performance of unbiased NGS-based clinical studies that demand a short turnaround time. VIP is released under GPLv3 and is available for free download at: https://github.com/keylabivdc/VIP. PMID:27026381

  14. Activity-Based Screening of Metagenomic Libraries for Hydrogenase Enzymes.

    PubMed

    Adam, Nicole; Perner, Mirjam

    2017-01-01

    Here we outline how to identify hydrogenase enzymes from metagenomic libraries through an activity-based screening approach. A metagenomic fosmid library is constructed in E. coli and the fosmids are transferred into a hydrogenase deletion mutant of Shewanella oneidensis (ΔhyaB) via triparental mating. If a fosmid exhibits hydrogen uptake activity, S. oneidensis' phenotype is restored and hydrogenase activity is indicated by a color change of the medium from yellow to colorless. This new method enables screening of 48 metagenomic fosmid clones in parallel.

  15. The metagenomic approach and causality in virology

    PubMed Central

    Castrignano, Silvana Beres; Nagasse-Sugahara, Teresa Keico

    2015-01-01

    Nowadays, the metagenomic approach has been a very important tool in the discovery of new viruses in environmental and biological samples. Here we discuss how these discoveries may help to elucidate the etiology of diseases and the criteria necessary to establish a causal association between a virus and a disease. PMID:25902566

  16. Xander: employing a novel method for efficient gene-targeted metagenomic assembly

    DOE PAGES

    Wang, Qiong; Fish, Jordan A.; Gilman, Mariah; ...

    2015-08-05

    Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. In conclusion, Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines.
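
    The de Bruijn graph half of this combination can be sketched in a few lines: each k-mer in a read contributes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and a contig is obtained by walking the graph. The sketch below is illustrative only and is not Xander's code. The function names and reads are hypothetical, and the walk simply follows unambiguous edges, whereas Xander is described as guiding the walk with a profile HMM, which is not shown here.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; each k-mer adds an edge prefix -> suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start, max_steps=1000):
    """Extend a contig from `start` while each node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    for _ in range(max_steps):
        successors = graph.get(node, set())
        if len(successors) != 1:
            break  # dead end or branch: a guided assembler would choose here
        node = next(iter(successors))
        if node in visited:
            break  # stop on cycles
        visited.add(node)
        contig += node[-1]
    return contig


# Three overlapping toy reads (made up for illustration).
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
print(extend_unambiguous(graph, "ATG"))
```

    At branch points the simple walk stops. That is the place where a gene-targeted method would use extra information, such as an HMM score for the gene of interest, to pick the path to follow.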

  17. An application of statistics to comparative metagenomics

    PubMed Central

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-01-01

    Background: Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Results: Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. Conclusion: The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems. PMID:16549025
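
    As an illustration of what "significantly different" can mean for a subsystem, the sketch below compares the fraction of sequences assigned to one subsystem in two metagenomes. A two-proportion z-test is used here as a simple stand-in, because the abstract does not specify the statistical method. The function name and all counts are hypothetical.

```python
import math


def two_proportion_z(hits_a, total_a, hits_b, total_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value). Assumes both totals are positive and the counts are
    large enough for the normal approximation to be reasonable.
    """
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return z, p_value


# Hypothetical counts: sequences assigned to a subsystem out of all annotated sequences.
subsystem_counts = {
    "Photosynthesis": (120, 1000, 60, 1000),
    "Iron acquisition": (45, 1000, 45, 1000),
}
for name, counts in subsystem_counts.items():
    z, p = two_proportion_z(*counts)
    print(f"{name}: z = {z:.2f}, p = {p:.3g}")
```

    When many subsystems are tested at once, the resulting p-values would need a multiple-testing correction before any subsystem is called over- or under-represented.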

  18. An application of statistics to comparative metagenomics.

    PubMed

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-03-20

    Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems.

  19. Computational approaches to predict bacteriophage–host relationships

    PubMed Central

    Edwards, Robert A.; McNair, Katelyn; Faust, Karoline; Raes, Jeroen; Dutilh, Bas E.

    2015-01-01

    Metagenomics has changed the face of virus discovery by enabling the accurate identification of viral genome sequences without requiring isolation of the viruses. As a result, metagenomic virus discovery leaves the first and most fundamental question about any novel virus unanswered: What host does the virus infect? The diversity of the global virosphere and the volumes of data obtained in metagenomic sequencing projects demand computational tools for virus–host prediction. We focus on bacteriophages (phages, viruses that infect bacteria), the most abundant and diverse group of viruses found in environmental metagenomes. By analyzing 820 phages with annotated hosts, we review and assess the predictive power of in silico phage–host signals. Sequence homology approaches are the most effective at identifying known phage–host pairs. Compositional and abundance-based methods contain significant signal for phage–host classification, providing opportunities for analyzing the unknowns in viral metagenomes. Together, these computational approaches further our knowledge of the interactions between phages and their hosts. Importantly, we find that all reviewed signals significantly link phages to their hosts, illustrating how current knowledge and insights about the interaction mechanisms and ecology of coevolving phages and bacteria can be exploited to predict phage–host relationships, with potential relevance for medical and industrial applications. PMID:26657537
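
    A compositional signal of the kind reviewed here can be sketched by comparing k-mer frequency profiles: the candidate host whose profile lies closest to the phage profile is taken as the prediction. This is a minimal sketch under stated assumptions, not the authors' method. The function names, the dinucleotide profile, the Euclidean distance and the toy sequences are all illustrative choices.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Normalized k-mer frequency vector over the ACGT alphabet."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def predict_host(phage_seq, host_seqs, k=2):
    """Return the host whose k-mer profile is nearest to the phage profile."""
    phage = kmer_profile(phage_seq, k)
    return min(
        host_seqs,
        key=lambda name: euclidean(phage, kmer_profile(host_seqs[name], k)),
    )


# Toy sequences (made up): one GC-rich and one AT-rich candidate host.
hosts = {
    "host_GC_rich": "GCGCGGCCGCGGCGCC",
    "host_AT_rich": "ATATTAATATTTAATA",
}
print(predict_host("GCGGCCGCGC", hosts))
```

    Real genomes would call for longer k-mers and much longer sequences. The abstract also notes that homology-based signals identify known pairs most effectively, so a compositional score like this one is better treated as complementary evidence.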

  20. Bambus 2: scaffolding metagenomes.

    PubMed

    Koren, Sergey; Treangen, Todd J; Pop, Mihai

    2011-11-01

    Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources. We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly. Bambus 2 is open source and available from http://amos.sf.net. Contact: mpop@umiacs.umd.edu. Supplementary data are available at Bioinformatics online.

  1. Bambus 2: scaffolding metagenomes

    PubMed Central

    Koren, Sergey; Treangen, Todd J.; Pop, Mihai

    2011-01-01

    Motivation: Sequencing projects increasingly target samples from non-clonal sources. In particular, metagenomics has enabled scientists to begin to characterize the structure of microbial communities. The software tools developed for assembling and analyzing sequencing data for clonal organisms are, however, unable to adequately process data derived from non-clonal sources. Results: We present a new scaffolder, Bambus 2, to address some of the challenges encountered when analyzing metagenomes. Our approach relies on a combination of a novel method for detecting genomic repeats and algorithms that analyze assembly graphs to identify biologically meaningful genomic variants. We compare our software to current assemblers using simulated and real data. We demonstrate that the repeat detection algorithms have higher sensitivity than current approaches without sacrificing specificity. In metagenomic datasets, the scaffolder avoids false joins between distantly related organisms while obtaining long-range contiguity. Bambus 2 represents a first step toward automated metagenomic assembly. Availability: Bambus 2 is open source and available from http://amos.sf.net. Contact: mpop@umiacs.umd.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21926123

  2. Survey of (Meta)genomic Approaches for Understanding Microbial Community Dynamics.

    PubMed

    Sharma, Anukriti; Lal, Rup

    2017-03-01

    Advances in next-generation sequencing technologies have driven the rapid evolution of genomics and metagenomics in a short time and at nominal cost. While metagenomics and genomics can be used separately to reveal culture-independent and culture-based microbial evolution, respectively, (meta)genomics together can be used to demonstrate results at the population level, revealing in-depth complex community interactions for specific ecotypes. The field of metagenomics, which started by answering "who is out there?" based on the 16S rRNA gene, has evolved immensely, with precise organismal reconstruction at the species/strain level from deeply covered metagenome data outweighing the need to isolate bacteria, of which 99% are de facto non-cultivable. In this review we have underlined the appeal of metagenome-derived genomes in providing insights into evolutionary patterns, growth dynamics, genome/gene-specific sweeps, and durability of environmental pressures. We have demonstrated the use of culture-based genomics and environmental shotgun metagenome data together to elucidate environment-specific genome modulations via metagenomic recruitments in terms of gene loss/gain and accessory and core-genome extent. We further illustrate the benefit of (meta)genomics in the understanding of infectious diseases by deducing the relationship between human microbiota and clinical microbiology. This review summarizes the technological advances in (meta)genomic strategies that use genome and metagenome datasets together to increase the resolution of microbial population studies.

  3. Metagenomic assembly through the lens of validation: recent advances in assessing and improving the quality of genomes assembled from metagenomes.

    PubMed

    Olson, Nathan D; Treangen, Todd J; Hill, Christopher M; Cepeda-Espinoza, Victoria; Ghurye, Jay; Koren, Sergey; Pop, Mihai

    2017-08-07

    Metagenomic samples are snapshots of complex ecosystems at work. They comprise hundreds of known and unknown species, contain multiple strain variants and vary greatly within and across environments. Many microbes found in microbial communities are not easily grown in culture making their DNA sequence our only clue into their evolutionary history and biological function. Metagenomic assembly is a computational process aimed at reconstructing genes and genomes from metagenomic mixtures. Current methods have made significant strides in reconstructing DNA segments comprising operons, tandem gene arrays and syntenic blocks. Shorter, higher-throughput sequencing technologies have become the de facto standard in the field. Sequencers are now able to generate billions of short reads in only a few days. Multiple metagenomic assembly strategies, pipelines and assemblers have appeared in recent years. Owing to the inherent complexity of metagenome assembly, regardless of the assembly algorithm and sequencing method, metagenome assemblies contain errors. Recent developments in assembly validation tools have played a pivotal role in improving metagenomics assemblers. Here, we survey recent progress in the field of metagenomic assembly, provide an overview of key approaches for genomic and metagenomic assembly validation and demonstrate the insights that can be derived from assemblies through the use of assembly validation strategies. We also discuss the potential for impact of long-read technologies in metagenomics. We conclude with a discussion of future challenges and opportunities in the field of metagenomic assembly and validation. © The Author 2017. Published by Oxford University Press.

  4. Effect of Changing Treatment Disinfectants on the Microbiology of Distributed Water and Pipe Biofilm Communities using Conventional and Metagenomic Approaches

    EPA Science Inventory

    The purpose of this research was to add to our knowledge of chlorine and monochloramine disinfectants, with regards to effects on the microbial communities in distribution systems. A whole metagenome-based approach using sophisticated molecular tools (e.g., next generation sequen...

  5. Strain-Level Metagenomic Analysis of the Fermented Dairy Beverage Nunu Highlights Potential Food Safety Risks.

    PubMed

    Walsh, Aaron M; Crispie, Fiona; Daari, Kareem; O'Sullivan, Orla; Martin, Jennifer C; Arthur, Cornelius T; Claesson, Marcus J; Scott, Karen P; Cotter, Paul D

    2017-08-15

    The rapid detection of pathogenic strains in food products is essential for the prevention of disease outbreaks. It has already been demonstrated that whole-metagenome shotgun sequencing can be used to detect pathogens in food but, until recently, strain-level detection of pathogens has relied on whole-metagenome assembly, which is a computationally demanding process. Here we demonstrated that three short-read-alignment-based methods, i.e., MetaMLST, PanPhlAn, and StrainPhlAn, could accurately and rapidly identify pathogenic strains in spinach metagenomes that had been intentionally spiked with Shiga toxin-producing Escherichia coli in a previous study. Subsequently, we employed the methods, in combination with other metagenomics approaches, to assess the safety of nunu, a traditional Ghanaian fermented milk product that is produced by the spontaneous fermentation of raw cow milk. We showed that nunu samples were frequently contaminated with bacteria associated with the bovine gut and, worryingly, we detected putatively pathogenic E. coli and Klebsiella pneumoniae strains in a subset of nunu samples. Ultimately, our work establishes that short-read-alignment-based bioinformatics approaches are suitable food safety tools, and we describe a real-life example of their utilization. IMPORTANCE Foodborne pathogens are responsible for millions of illnesses each year. Here we demonstrate that short-read-alignment-based bioinformatics tools can accurately and rapidly detect pathogenic strains in food products by using shotgun metagenomics data. The methods used here are considerably faster than both traditional culturing methods and alternative bioinformatics approaches that rely on metagenome assembly; therefore, they can potentially be used for more high-throughput food safety testing. Overall, our results suggest that whole-metagenome sequencing can be used as a practical food safety tool to prevent diseases or to link outbreaks to specific food products. 
    Copyright © 2017 American Society for Microbiology.

  6. Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics

    PubMed Central

    Weber, Marc; Teeling, Hanno; Huang, Sixing; Waldmann, Jost; Kassabgy, Mariette; Fuchs, Bernhard M; Klindworth, Anna; Klockow, Christine; Wichels, Antje; Gerdts, Gunnar; Amann, Rudolf; Glöckner, Frank Oliver

    2011-01-01

    Next-generation sequencing (NGS) technologies have enabled the application of broad-scale sequencing in microbial biodiversity and metagenome studies. Biodiversity is usually targeted by classifying 16S ribosomal RNA genes, while metagenomic approaches target metabolic genes. However, both approaches remain isolated as long as the taxonomic and functional information cannot be interrelated. Techniques like self-organizing maps (SOMs) have been applied to cluster metagenomes into taxon-specific bins in order to link biodiversity with functions, but have not yet been applied to broad-scale NGS-based metagenomics. Here, we provide a novel implementation, demonstrate its potential and practicability, and provide a web-based service for public usage. Evaluation with published data sets mimicking varyingly complex habitats resulted in classification specificities and sensitivities of close to 100% to above 90% from phylum to genus level for assemblies exceeding 8 kb for low and medium complexity data. When applied to five real-world metagenomes of medium complexity from direct pyrosequencing of marine subsurface waters, classifications of assemblies above 2.5 kb were in good agreement with fluorescence in situ hybridizations, indicating that biodiversity was mostly retained within the metagenomes, and confirming high classification specificities. This was validated by two protein-based classification (PBC) methods. SOMs were able to retrieve the relevant taxa down to the genus level, while surpassing PBCs in resolution. In order to make the approach accessible to a broad audience, we implemented a feature-rich web-based SOM application named TaxSOM, which is freely available at http://www.megx.net/toolbox/taxsom. TaxSOM can classify reads or assemblies exceeding 2.5 kb with high accuracy and thus assists in linking biodiversity and functions in metagenome studies, which is a precondition to study microbial ecology in a holistic fashion. PMID:21160538

  7. Captured metagenomics: large-scale targeting of genes based on ‘sequence capture’ reveals functional diversity in soils

    PubMed Central

    Manoharan, Lokeshwaran; Kushwaha, Sandeep K.; Hedlund, Katarina; Ahrén, Dag

    2015-01-01

    Microbial enzyme diversity is key to understanding many ecosystem processes. Whole metagenome sequencing (WMG) obtains information on functional genes, but it is costly and inefficient due to the large amount of sequencing that is required. In this study, we have applied a captured metagenomics technique for functional genes in soil microorganisms, as an alternative to WMG. Large-scale targeting of functional genes, coding for enzymes related to organic matter degradation, was applied to two agricultural soil communities through captured metagenomics. Captured metagenomics uses custom-designed, hybridization-based oligonucleotide probes that enrich functional genes of interest in metagenomic libraries where only probe-bound DNA fragments are sequenced. The captured metagenomes were highly enriched with targeted genes while maintaining their target diversity, and their taxonomic distribution correlated well with the traditional ribosomal sequencing. The captured metagenomes were highly enriched with genes related to organic matter degradation; at least five times more than similar, publicly available soil WMG projects. This target enrichment technique also preserves the functional representation of the soils, thereby facilitating comparative metagenomics projects. Here, we present the first study that applies the captured metagenomics approach at large scale, and this novel method allows deep investigations of central ecosystem processes by studying functional gene abundances. PMID:26490729

  8. High definition for systems biology of microbial communities: metagenomics gets genome-centric and strain-resolved.

    PubMed

    Turaev, Dmitrij; Rattei, Thomas

    2016-06-01

    The systems biology of microbial communities, organismal communities inhabiting all ecological niches on earth, has in recent years been strongly facilitated by the rapid development of experimental, sequencing and data analysis methods. Novel experimental approaches and binning methods in metagenomics render the semi-automatic reconstructions of near-complete genomes of uncultivable bacteria possible, while advances in high-resolution amplicon analysis allow for efficient and less biased taxonomic community characterization. This will also facilitate predictive modeling approaches, hitherto limited by the low resolution of metagenomic data. In this review, we pinpoint the most promising current developments in metagenomics. They facilitate microbial systems biology towards a systemic understanding of mechanisms in microbial communities with scopes of application in many areas of our daily life. Copyright © 2016 Elsevier Ltd. All rights reserved.

  9. Beyond Biodiversity: Fish Metagenomes

    PubMed Central

    Ardura, Alba; Planes, Serge; Garcia-Vazquez, Eva

    2011-01-01

    Biodiversity and intra-specific genetic diversity are interrelated and determine the potential of a community to survive and evolve. Both are considered together in Prokaryote communities treated as metagenomes or ensembles of functional variants beyond species limits. Many factors alter biodiversity in higher Eukaryote communities, and human exploitation can be one of the most important for some groups of plants and animals. For example, fisheries can modify both biodiversity and genetic diversity (intra specific). Intra-specific diversity can be drastically altered by overfishing. Intense fishing pressure on one stock may imply extinction of some genetic variants and subsequent loss of intra-specific diversity. The objective of this study was to apply a metagenome approach to fish communities and explore its value for rapid evaluation of biodiversity and genetic diversity at community level. Here we have applied the metagenome approach employing the Barcoding target gene COI as a model sequence in catch from four very different fish assemblages exploited by fisheries: freshwater communities from the Amazon River and northern Spanish rivers, and marine communities from the Cantabric and Mediterranean seas. Treating all sequences obtained from each regional catch as a biological unit (exploited community) we found that metagenomic diversity indices of the Amazonian catch sample here examined were lower than expected. Reduced diversity could be explained, at least partially, by overexploitation of the fish community that had been independently estimated by other methods. We propose using a metagenome approach for estimating diversity in Eukaryote communities and early evaluating genetic variation losses at multi-species level. PMID:21829636
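
    The abstract refers to "metagenomic diversity indices" without naming them. As a minimal sketch, assuming the Shannon index is one such measure, the following computes diversity over all sequence variants pooled from a catch, treating the catch as a single unit. The variant labels are hypothetical placeholders, not data from the study.

```python
import math
from collections import Counter


def shannon_index(labels):
    """Shannon diversity H' = -sum(p_i * ln p_i) over the observed variant labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Hypothetical COI variant labels pooled from two catches.
even_catch = ["hapA", "hapA", "hapB", "hapB"]
depleted_catch = ["hapA", "hapA", "hapA", "hapA"]
print(shannon_index(even_catch))      # two equally common variants give ln(2)
print(shannon_index(depleted_catch))  # a single variant gives zero diversity
```

    In this sketch, a catch dominated by one variant scores lower than a catch with evenly represented variants, which is the kind of signal the abstract links to overexploitation.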

  10. Beyond biodiversity: fish metagenomes.

    PubMed

    Ardura, Alba; Planes, Serge; Garcia-Vazquez, Eva

    2011-01-01

    Biodiversity and intra-specific genetic diversity are interrelated and determine the potential of a community to survive and evolve. Both are considered together in Prokaryote communities treated as metagenomes or ensembles of functional variants beyond species limits. Many factors alter biodiversity in higher Eukaryote communities, and human exploitation can be one of the most important for some groups of plants and animals. For example, fisheries can modify both biodiversity and genetic diversity (intra specific). Intra-specific diversity can be drastically altered by overfishing. Intense fishing pressure on one stock may imply extinction of some genetic variants and subsequent loss of intra-specific diversity. The objective of this study was to apply a metagenome approach to fish communities and explore its value for rapid evaluation of biodiversity and genetic diversity at community level. Here we have applied the metagenome approach employing the barcoding target gene coi as a model sequence in catch from four very different fish assemblages exploited by fisheries: freshwater communities from the Amazon River and northern Spanish rivers, and marine communities from the Cantabric and Mediterranean seas. Treating all sequences obtained from each regional catch as a biological unit (exploited community) we found that metagenomic diversity indices of the Amazonian catch sample here examined were lower than expected. Reduced diversity could be explained, at least partially, by overexploitation of the fish community that had been independently estimated by other methods. We propose using a metagenome approach for estimating diversity in Eukaryote communities and early evaluating genetic variation losses at multi-species level.

  11. Xander: employing a novel method for efficient gene-targeted metagenomic assembly.

    PubMed

    Wang, Qiong; Fish, Jordan A; Gilman, Mariah; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R

    2015-01-01

    Metagenomics can provide important insight into microbial communities. However, assembling metagenomic datasets has proven to be computationally challenging. Current methods often assemble only fragmented partial genes. We present a novel method for targeting assembly of specific protein-coding genes. This method combines a de Bruijn graph, as used in standard assembly approaches, and a protein profile hidden Markov model (HMM) for the gene of interest, as used in standard annotation approaches. These are used to create a novel combined weighted assembly graph. Xander performs both assembly and annotation concomitantly using information incorporated in this graph. We demonstrate the utility of this approach by assembling contigs for one phylogenetic marker gene and for two functional marker genes, first on Human Microbiome Project (HMP)-defined community Illumina data and then on 21 rhizosphere soil metagenomic datasets from three different crops totaling over 800 Gbp of unassembled data. We compared our method to a recently published bulk metagenome assembly method and a recently published gene-targeted assembler and found our method produced more, longer, and higher quality gene sequences. Xander combines gene assignment with the rapid assembly of full-length or near full-length functional genes from metagenomic data without requiring bulk assembly or post-processing to find genes of interest. HMMs used for assembly can be tailored to the targeted genes, allowing flexibility to improve annotation over generic annotation pipelines. This method is implemented as open source software and is available at https://github.com/rdpstaff/Xander_assembler.
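
    The de Bruijn graph component mentioned above can be sketched in a few lines. This is not Xander's implementation: it only builds an unweighted graph from reads and follows unambiguous edges, whereas the method described in the abstract additionally weights paths with a protein profile HMM. The function names and toy reads are assumptions for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_path(graph, start):
    """Extend a contig from `start` while each node has exactly one successor."""
    path = start
    node = start
    seen = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:  # stop on a cycle instead of looping forever
            break
        path += nxt[-1]
        seen.add(nxt)
        node = nxt
    return path


# Two overlapping toy reads that share the 3-mer GGC.
toy_reads = ["ATGGC", "GGCGT"]
toy_graph = build_de_bruijn(toy_reads, k=3)
print(extend_path(toy_graph, "AT"))
```

    Where a node has several successors this sketch simply stops; choosing among such branches is the step a gene-targeted assembler would guide with additional information.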

  12. Methods for comparative metagenomics

    PubMed Central

    Huson, Daniel H; Richter, Daniel C; Mitra, Suparna; Auch, Alexander F; Schuster, Stephan C

    2009-01-01

    Background: Metagenomics is a rapidly growing field of research that aims at studying uncultured organisms to understand the true diversity of microbes, their functions, cooperation and evolution, in environments such as soil, water, ancient remains of animals, or the digestive system of animals and humans. The recent development of ultra-high throughput sequencing technologies, which do not require cloning or PCR amplification, and can produce huge numbers of DNA reads at an affordable cost, has boosted the number and scope of metagenomic sequencing projects. Increasingly, there is a need for new ways of comparing multiple metagenomics datasets, and for fast and user-friendly implementations of such approaches. Results: This paper introduces a number of new methods for interactively exploring, analyzing and comparing multiple metagenomic datasets, which will be made freely available in a new, comparative version 2.0 of the stand-alone metagenome analysis tool MEGAN. Conclusion: There is a great need for powerful and user-friendly tools for comparative analysis of metagenomic data and MEGAN 2.0 will help to fill this gap. PMID:19208111

  13. Soil Bacterial Community Shifts after Chitin Enrichment: An Integrative Metagenomic Approach

    PubMed Central

    Jacquiod, Samuel; Franqueville, Laure; Cécillon, Sébastien; Vogel, Timothy M.; Simonet, Pascal

    2013-01-01

    Chitin is the second most produced biopolymer on Earth after cellulose. Chitin-degrading enzymes are promising but untapped sources for developing novel industrial biocatalysts. Hidden amongst uncultivated micro-organisms, new bacterial enzymes can be discovered and exploited by metagenomic approaches through extensive cloning and screening. Enrichment is also a well-known strategy, as it allows selection of organisms adapted to feed on a specific compound. In this study, we investigated how the soil bacterial community responded to chitin enrichment in a microcosm experiment. An integrative metagenomic approach coupling phylochips and high-throughput shotgun pyrosequencing was established in order to assess the taxonomical and functional changes in the soil bacterial community. Results indicate that chitin enrichment leads to an increase of Actinobacteria, γ-proteobacteria and β-proteobacteria, suggesting specific selection of chitin-degrading bacteria belonging to these classes. Some of the enriched bacterial genera had not previously been reported to be involved in chitin degradation, such as members of the Micrococcineae sub-order (Actinobacteria). An increase of the observed bacterial diversity was noticed, with detection of specific genera only in chitin-treated conditions. The relative proportion of metagenomic sequences related to chitin degradation was significantly increased, even though it represents only a tiny fraction of the sequence diversity found in a soil metagenome. PMID:24278158

  14. Phylogeny-guided (meta)genome mining approach for the targeted discovery of new microbial natural products.

    PubMed

    Kang, Hahk-Soo

    2017-02-01

    Genomics-based methods are now commonplace in natural products research. A phylogeny-guided mining approach provides a means to quickly screen a large number of microbial genomes or metagenomes in search of new biosynthetic gene clusters of interest. In this approach, biosynthetic genes serve as molecular markers, and phylogenetic trees built with known and unknown marker gene sequences are used to quickly prioritize biosynthetic gene clusters for their metabolites characterization. An increase in the use of this approach has been observed for the last couple of years along with the emergence of low cost sequencing technologies. The aim of this review is to discuss the basic concept of a phylogeny-guided mining approach, and also to provide examples in which this approach was successfully applied to discover new natural products from microbial genomes and metagenomes. I believe that the phylogeny-guided mining approach will continue to play an important role in genomics-based natural products research.

  15. Metagenomics and Bioinformatics in Microbial Ecology: Current Status and Beyond.

    PubMed

    Hiraoka, Satoshi; Yang, Ching-Chia; Iwasaki, Wataru

    2016-09-29

    Metagenomic approaches are now commonly used in microbial ecology to study microbial communities in more detail, including many strains that cannot be cultivated in the laboratory. Bioinformatic analyses make it possible to mine huge metagenomic datasets and discover general patterns that govern microbial ecosystems. However, the findings of typical metagenomic and bioinformatic analyses still do not completely describe the ecology and evolution of microbes in their environments. Most analyses still depend on straightforward sequence similarity searches against reference databases. We herein review the current state of metagenomics and bioinformatics in microbial ecology and discuss future directions for the field. New techniques will allow us to go beyond routine analyses and broaden our knowledge of microbial ecosystems. We need to enrich reference databases, promote platforms that enable meta- or comprehensive analyses of diverse metagenomic datasets, devise methods that utilize long-read sequence information, and develop more powerful bioinformatic methods to analyze data from diverse perspectives.

  16. Reconfigurable generation and measurement of mutually unbiased bases for time-bin qudits

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lukens, Joseph M.; Islam, Nurul T.; Lim, Charles Ci Wen

    Here, we propose a method for implementing mutually unbiased generation and measurement of time-bin qudits using a cascade of electro-optic phase modulator–coded fiber Bragg grating pairs. Our approach requires only a single spatial mode and can switch rapidly between basis choices. We obtain explicit solutions for dimensions d = 2, 3, and 4 that realize all d + 1 possible mutually unbiased bases and analyze the performance of our approach in quantum key distribution. Given its practicality and compatibility with current technology, our approach provides a promising springboard for scalable processing of high-dimensional time-bin states.

  17. Reconfigurable generation and measurement of mutually unbiased bases for time-bin qudits

    DOE PAGES

    Lukens, Joseph M.; Islam, Nurul T.; Lim, Charles Ci Wen; ...

    2018-03-12

    Here, we propose a method for implementing mutually unbiased generation and measurement of time-bin qudits using a cascade of electro-optic phase modulator–coded fiber Bragg grating pairs. Our approach requires only a single spatial mode and can switch rapidly between basis choices. We obtain explicit solutions for dimensions d = 2, 3, and 4 that realize all d + 1 possible mutually unbiased bases and analyze the performance of our approach in quantum key distribution. Given its practicality and compatibility with current technology, our approach provides a promising springboard for scalable processing of high-dimensional time-bin states.

  18. Reconfigurable generation and measurement of mutually unbiased bases for time-bin qudits

    NASA Astrophysics Data System (ADS)

    Lukens, Joseph M.; Islam, Nurul T.; Lim, Charles Ci Wen; Gauthier, Daniel J.

    2018-03-01

    We propose a method for implementing mutually unbiased generation and measurement of time-bin qudits using a cascade of electro-optic phase modulator-coded fiber Bragg grating pairs. Our approach requires only a single spatial mode and can switch rapidly between basis choices. We obtain explicit solutions for dimensions d = 2, 3, and 4 that realize all d + 1 possible mutually unbiased bases and analyze the performance of our approach in quantum key distribution. Given its practicality and compatibility with current technology, our approach provides a promising springboard for scalable processing of high-dimensional time-bin states.
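
    Two orthonormal bases in dimension d are mutually unbiased when every pair of vectors drawn from different bases has squared overlap 1/d. As a minimal sketch, independent of the optical scheme in the abstract, the following checks that property for the familiar d = 2 case using the three Pauli eigenbases, which gives the d + 1 = 3 bases mentioned above. The helper names are hypothetical.

```python
import math
from itertools import combinations

S = 1 / math.sqrt(2)

# Eigenbases of the Pauli Z, X and Y operators for a single qubit (d = 2).
BASES = {
    "Z": [(1, 0), (0, 1)],
    "X": [(S, S), (S, -S)],
    "Y": [(S, S * 1j), (S, -S * 1j)],
}


def overlap_sq(u, v):
    """Squared magnitude of the inner product <u|v>."""
    inner = sum(complex(a).conjugate() * b for a, b in zip(u, v))
    return abs(inner) ** 2


def mutually_unbiased(bases, d, tol=1e-9):
    """True if vectors from every pair of distinct bases overlap with |<u|v>|^2 = 1/d."""
    for (_, first), (_, second) in combinations(bases.items(), 2):
        for u in first:
            for v in second:
                if abs(overlap_sq(u, v) - 1 / d) > tol:
                    return False
    return True


print(mutually_unbiased(BASES, d=2))
```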

  19. Novel Analysis of Oceanic Surface Water Metagenomes Suggests Importance of Polyphosphate Metabolism in Oligotrophic Environments

    PubMed Central

    Temperton, Ben; Gilbert, Jack A.; Quinn, John P.; McGrath, John W.

    2011-01-01

    Polyphosphate is a ubiquitous linear homopolymer of phosphate residues linked by high-energy bonds similar to those found in ATP. It has been associated with many processes including pathogenicity, DNA uptake and multiple stress responses across all domains. Bacteria have also been shown to use polyphosphate as a way to store phosphate when transferred from phosphate-limited to phosphate-rich media – a process exploited in wastewater treatment and other environmental contaminant remediation. Despite this, there has, to date, been little research into the role of polyphosphate in the survival of marine bacterioplankton in oligotrophic environments. The three main proteins involved in polyphosphate metabolism, Ppk1, Ppk2 and Ppx are multi-domain and have differential inter-domain and inter-gene conservation, making unbiased analysis of relative abundance in metagenomic datasets difficult. This paper describes the development of a novel Isofunctional Homolog Annotation Tool (IHAT) to detect homologs of genes with a broad range of conservation without bias of traditional expect-value cutoffs. IHAT analysis of the Global Ocean Sampling (GOS) dataset revealed that genes associated with polyphosphate metabolism are more abundant in environments where available phosphate is limited, suggesting an important role for polyphosphate metabolism in marine oligotrophs. PMID:21305044

  20. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures

    PubMed Central

    Pride, David T; Schoenfeld, Thomas

    2008-01-01

    Background: Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results: From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. 
As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. Conclusion: That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis. PMID:18798991
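
    The idea of comparing a fragment's oligonucleotide signature against a database of reference signatures can be illustrated with a minimal sketch. This is not the GSPC implementation, whose details the abstract does not give. The function names, the use of relative k-mer frequencies, and the Euclidean distance are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product
import math


def kmer_signature(seq, k=4):
    """Relative frequency of every A/C/G/T k-mer in seq (illustrative signature)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # windows with other letters are ignored
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(fragment, references, k=4):
    """Return the name of the reference whose signature is closest to the fragment's.

    references is a hypothetical dict mapping a label to a reference sequence.
    """
    sig = kmer_signature(fragment, k)
    return min(
        references,
        key=lambda name: euclidean(sig, kmer_signature(references[name], k)),
    )
```

    With a toy database of two sequences, a fragment is assigned to whichever reference has the most similar k-mer composition; no sequence alignment is involved, which is why a signature approach can still label fragments that have no BLAST homologs.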

  1. Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures.

    PubMed

    Pride, David T; Schoenfeld, Thomas

    2008-09-17

    Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. 
As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis.

  2. Bioinformatics tools for quantitative and functional metagenome and metatranscriptome data analysis in microbes.

    PubMed

    Niu, Sheng-Yong; Yang, Jinyu; McDermaid, Adam; Zhao, Jing; Kang, Yu; Ma, Qin

    2017-05-08

    Metagenomic and metatranscriptomic sequencing approaches are more frequently being used to link microbiota to important diseases and ecological changes. Many analyses have been used to compare the taxonomic and functional profiles of microbiota across habitats or individuals. While a large portion of metagenomic analyses focus on species-level profiling, some studies use strain-level metagenomic analyses to investigate the relationship between specific strains and certain circumstances. Metatranscriptomic analysis provides another important insight into activities of genes by examining gene expression levels of microbiota. Hence, combining metagenomic and metatranscriptomic analyses will help understand the activity or enrichment of a given gene set, such as drug-resistant genes among microbiome samples. Here, we summarize existing bioinformatics tools of metagenomic and metatranscriptomic data analysis, the purpose of which is to assist researchers in deciding the appropriate tools for their microbiome studies. Additionally, we propose an Integrated Meta-Function mapping pipeline to incorporate various reference databases and accelerate functional gene mapping procedures for both metagenomic and metatranscriptomic analyses.

  3. Screening Currency Notes for Microbial Pathogens and Antibiotic Resistance Genes Using a Shotgun Metagenomic Approach

    PubMed Central

    Jalali, Saakshi; Kohli, Samantha; Latka, Chitra; Bhatia, Sugandha; Vellarikal, Shamsudheen Karuthedath; Sivasubbu, Sridhar; Scaria, Vinod; Ramachandran, Srinivasan

    2015-01-01

    Fomites are a well-known source of microbial infections, and previous studies have provided insights into the sojourning microbiome of fomites from various sources. Paper currency notes are among the most commonly exchanged objects, and their potential to transmit pathogenic organisms has been well recognized. Approaches to identify the microbiome associated with paper currency notes have been largely limited to culture-dependent approaches. Subsequent studies used 16S ribosomal RNA-based approaches, which provided insights into the taxonomic distribution of the microbiome. However, recent techniques including shotgun sequencing provide resolution at the gene level and enable estimation of gene copy numbers in the metagenome. We investigated the microbiome of Indian paper currency notes using a shotgun metagenome sequencing approach. Metagenomic DNA isolated from samples of frequently circulated denominations of Indian currency notes was sequenced using an Illumina HiSeq sequencer. Analysis of the data revealed the presence of species belonging to both eukaryotic and prokaryotic genera. The taxonomic distribution at kingdom level revealed contigs mapping to eukaryota (70%), bacteria (9%), viruses and archaea (~1%). We identified 78 pathogens including Staphylococcus aureus, Corynebacterium glutamicum, Enterococcus faecalis, and 75 cellulose-degrading organisms including Acidothermus cellulolyticus, Cellulomonas flavigena and Ruminococcus albus. Additionally, 78 antibiotic resistance genes were identified and 18 of these were found in all the samples. Furthermore, six out of 78 pathogens harbored at least one of the 18 common antibiotic resistance genes. To the best of our knowledge, this is the first report of a shotgun metagenome sequence dataset of paper currency notes, which can be useful for future applications including bio-surveillance of exchangeable fomites for infectious agents. PMID:26035208

  4. Heterologous viral expression systems in fosmid vectors increase the functional analysis potential of metagenomic libraries.

    PubMed

    Terrón-González, L; Medina, C; Limón-Mortés, M C; Santero, E

    2013-01-01

    The extraordinary potential of metagenomic functional analyses to identify activities of interest present in uncultured microorganisms has been limited by reduced gene expression in surrogate hosts. We have developed vectors and specialized E. coli strains as improved metagenomic DNA heterologous expression systems, taking advantage of viral components that prevent transcription termination at metagenomic terminators. One of the systems uses the phage T7 RNA-polymerase to drive metagenomic gene expression, while the other approach uses the lambda phage transcription anti-termination protein N to limit transcription termination. A metagenomic library was constructed and functionally screened to identify genes conferring carbenicillin resistance to E. coli. The use of these enhanced expression systems resulted in a 6-fold increase in the frequency of carbenicillin resistant clones. Subcloning and sequence analysis showed that, besides β-lactamases, efflux pumps are not only able to contribute to carbenicillin resistance but may in fact be sufficient by themselves to convey carbenicillin resistance.

  5. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes.

    PubMed

    Nielsen, H Bjørn; Almeida, Mathieu; Juncker, Agnieszka Sierakowska; Rasmussen, Simon; Li, Junhua; Sunagawa, Shinichi; Plichta, Damian R; Gautier, Laurent; Pedersen, Anders G; Le Chatelier, Emmanuelle; Pelletier, Eric; Bonde, Ida; Nielsen, Trine; Manichanh, Chaysavanh; Arumugam, Manimozhiyan; Batto, Jean-Michel; Quintanilha Dos Santos, Marcelo B; Blom, Nikolaj; Borruel, Natalia; Burgdorf, Kristoffer S; Boumezbeur, Fouad; Casellas, Francesc; Doré, Joël; Dworzynski, Piotr; Guarner, Francisco; Hansen, Torben; Hildebrand, Falk; Kaas, Rolf S; Kennedy, Sean; Kristiansen, Karsten; Kultima, Jens Roat; Léonard, Pierre; Levenez, Florence; Lund, Ole; Moumen, Bouziane; Le Paslier, Denis; Pons, Nicolas; Pedersen, Oluf; Prifti, Edi; Qin, Junjie; Raes, Jeroen; Sørensen, Søren; Tap, Julien; Tims, Sebastian; Ussery, David W; Yamada, Takuji; Renault, Pierre; Sicheritz-Ponten, Thomas; Bork, Peer; Wang, Jun; Brunak, Søren; Ehrlich, S Dusko

    2014-08-01

    Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
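
    Binning genes by co-abundance across samples can be sketched as follows. This is a deliberately simple greedy grouping on Pearson correlation, offered only as an illustration; it is not the clustering procedure used by the authors, which the abstract does not describe. The function names, the 0.9 threshold, and the toy profiles are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-abundance signal
    return cov / math.sqrt(vx * vy)


def bin_coabundant(profiles, threshold=0.9):
    """Greedily group genes whose profiles correlate with a bin's seed gene.

    profiles maps a gene name to its abundance in each sample (same sample order).
    Returns a list of gene-name lists, one per bin, in order of creation.
    """
    bins = []  # each entry: (seed_profile, [member gene names])
    for gene, profile in profiles.items():
        for seed, members in bins:
            if pearson(seed, profile) >= threshold:
                members.append(gene)
                break
        else:
            bins.append((profile, [gene]))  # no bin matched: start a new one
    return [members for _, members in bins]
```

    Genes that belong to the same organism should rise and fall together across samples, so they land in the same bin without any reference genome being consulted. A greedy pass like this depends on input order; a production method would need something more robust.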

  6. Metagenomic Frameworks for Monitoring Antibiotic Resistance in Aquatic Environments

    PubMed Central

    Port, Jesse A.; Cullen, Alison C.; Wallace, James C.; Smith, Marissa N.

    2013-01-01

    Background: High-throughput genomic technologies offer new approaches for environmental health monitoring, including metagenomic surveillance of antibiotic resistance determinants (ARDs). Although natural environments serve as reservoirs for antibiotic resistance genes that can be transferred to pathogenic and human commensal bacteria, monitoring of these determinants has been infrequent and incomplete. Furthermore, surveillance efforts have not been integrated into public health decision making. Objectives: We used a metagenomic epidemiology–based approach to develop an ARD index that quantifies antibiotic resistance potential, and we analyzed this index for common modal patterns across environmental samples. We also explored how metagenomic data such as this index could be conceptually framed within an early risk management context. Methods: We analyzed 25 published data sets from shotgun pyrosequencing projects. The samples consisted of microbial community DNA collected from marine and freshwater environments across a gradient of human impact. We used principal component analysis to identify index patterns across samples. Results: We observed significant differences in the overall index and index subcategory levels when comparing ecosystems more proximal versus distal to human impact. The selection of different sequence similarity thresholds strongly influenced the index measurements. Unique index subcategory modes distinguished the different metagenomes. Conclusions: Broad-scale screening of ARD potential using this index revealed utility for framing environmental health monitoring and surveillance. This approach holds promise as a screening tool for establishing baseline ARD levels that can be used to inform and prioritize decision making regarding management of ARD sources and human exposure routes. Citation: Port JA, Cullen AC, Wallace JC, Smith MN, Faustman EM. 2014. Metagenomic frameworks for monitoring antibiotic resistance in aquatic environments. 
Environ Health Perspect 122:222–228; http://dx.doi.org/10.1289/ehp.1307009 PMID:24334622

  7. Metagenomic frameworks for monitoring antibiotic resistance in aquatic environments.

    PubMed

    Port, Jesse A; Cullen, Alison C; Wallace, James C; Smith, Marissa N; Faustman, Elaine M

    2014-03-01

    High-throughput genomic technologies offer new approaches for environmental health monitoring, including metagenomic surveillance of antibiotic resistance determinants (ARDs). Although natural environments serve as reservoirs for antibiotic resistance genes that can be transferred to pathogenic and human commensal bacteria, monitoring of these determinants has been infrequent and incomplete. Furthermore, surveillance efforts have not been integrated into public health decision making. We used a metagenomic epidemiology-based approach to develop an ARD index that quantifies antibiotic resistance potential, and we analyzed this index for common modal patterns across environmental samples. We also explored how metagenomic data such as this index could be conceptually framed within an early risk management context. We analyzed 25 published data sets from shotgun pyrosequencing projects. The samples consisted of microbial community DNA collected from marine and freshwater environments across a gradient of human impact. We used principal component analysis to identify index patterns across samples. We observed significant differences in the overall index and index subcategory levels when comparing ecosystems more proximal versus distal to human impact. The selection of different sequence similarity thresholds strongly influenced the index measurements. Unique index subcategory modes distinguished the different metagenomes. Broad-scale screening of ARD potential using this index revealed utility for framing environmental health monitoring and surveillance. This approach holds promise as a screening tool for establishing baseline ARD levels that can be used to inform and prioritize decision making regarding management of ARD sources and human exposure routes. Port JA, Cullen AC, Wallace JC, Smith MN, Faustman EM. 2014. Metagenomic frameworks for monitoring antibiotic resistance in aquatic environments. 
Environ Health Perspect 122:222–228; http://dx.doi.org/10.1289/ehp.1307009

  8. Functional Metagenomics: Construction and High-Throughput Screening of Fosmid Libraries for Discovery of Novel Carbohydrate-Active Enzymes.

    PubMed

    Ufarté, Lisa; Bozonnet, Sophie; Laville, Elisabeth; Cecchini, Davide A; Pizzut-Serin, Sandra; Jacquiod, Samuel; Demanèche, Sandrine; Simonet, Pascal; Franqueville, Laure; Veronese, Gabrielle Potocki

    2016-01-01

    Activity-based metagenomics is one of the most efficient approaches to boost the discovery of novel biocatalysts from the huge reservoir of uncultivated bacteria. In this chapter, we describe a highly generic procedure of metagenomic library construction and high-throughput screening for carbohydrate-active enzymes. Applicable to any bacterial ecosystem, it enables the swift identification of functional enzymes that are highly efficient, alone or acting in synergy, to break down polysaccharides and oligosaccharides.

  9. Whither or wither geomicrobiology in the era of 'community metagenomics'

    USGS Publications Warehouse

    Oremland, R.S.; Capone, D.G.; Stolz, J.F.; Fuhrman, J.

    2005-01-01

    Molecular techniques are valuable tools that can improve our understanding of the structure of microbial communities. They provide the ability to probe for life in all niches of the biosphere, perhaps even supplanting the need to cultivate microorganisms or to conduct ecophysiological investigations. However, an overemphasis and strict dependence on such large information-driven endeavours as environmental metagenomics could overwhelm the field, to the detriment of microbial ecology. We now call for more balanced, hypothesis-driven research efforts that couple metagenomics with classic approaches.

  10. Metagenomic Analyses of Drinking Water Receiving Different Disinfection Treatments

    EPA Science Inventory

    A metagenome-based approach was utilized for assessing the taxonomic affiliation and function potential of microbial populations in free chlorine (CHL) and monochloramine (CHM) treated drinking water (DW). A total of 1,024,242 (averaging 544 bp) and 849,349 (averaging 554 bp) ...

  11. Metagenomic systems biology and metabolic modeling of the human microbiome: from species composition to community assembly rules.

    PubMed

    Levy, Roie; Borenstein, Elhanan

    2014-01-01

    The human microbiome is a key contributor to health and development. Yet little is known about the ecological forces that are at play in defining the composition of such host-associated communities. Metagenomics-based studies have uncovered clear patterns of community structure but are often incapable of distinguishing alternative structuring paradigms. In a recent study, we integrated metagenomic analysis with a systems biology approach, using a reverse ecology framework to model numerous human microbiota species and to infer metabolic interactions between species. Comparing predicted interactions with species composition data revealed that the assembly of the human microbiome is dominated at the community level by habitat filtering. Furthermore, we demonstrated that this habitat filtering cannot be accounted for by known host phenotypes or by the metabolic versatility of the various species. Here we provide a summary of our findings and offer a brief perspective on related studies and on future approaches utilizing this metagenomic systems biology framework.

  12. An Agile Functional Analysis of Metagenomic Data Using SUPER-FOCUS.

    PubMed

    Silva, Genivaldo Gueiros Z; Lopes, Fabyano A C; Edwards, Robert A

    2017-01-01

    One of the main goals in metagenomics is to identify the functional profile of a microbial community from unannotated shotgun sequencing reads. Functional annotation is important in biological research because it enables researchers to identify the abundance of functional genes of the organisms present in the sample, answering the question, "What can the organisms in the sample do?" Most currently available approaches do not scale with increasing data volumes, which is important because both the number and lengths of the reads provided by sequencing platforms keep increasing. Here, we present SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with real metagenomes, and the results show that it accurately predicts the subsystems present in the profiled microbial communities, is computationally efficient, and up to 1000 times faster than other tools. SUPER-FOCUS is freely available at http://edwards.sdsu.edu/SUPERFOCUS .

  13. Metagenomics and novel gene discovery

    PubMed Central

    Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin

    2014-01-01

    Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics. PMID:24317337

  14. Preparation of fosmid libraries and functional metagenomic analysis of microbial community DNA.

    PubMed

    Martínez, Asunción; Osburne, Marcia S

    2013-01-01

    One of the most important challenges in contemporary microbial ecology is to assign a functional role to the large number of novel genes discovered through large-scale sequencing of natural microbial communities that lack similarity to genes of known function. Functional screening of metagenomic libraries, that is, screening environmental DNA clones for the ability to confer an activity of interest to a heterologous bacterial host, is a promising approach for bridging the gap between metagenomic DNA sequencing and functional characterization. Here, we describe methods for isolating environmental DNA and constructing metagenomic fosmid libraries, as well as methods for designing and implementing successful functional screens of such libraries.

  15. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    DOE PAGES

    McNair, Katelyn; Edwards, Robert A.

    2015-06-16

    As increases in prokaryotic sequencing take place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

  16. Mining virulence genes using metagenomics.

    PubMed

    Belda-Ferre, Pedro; Cabrera-Rubio, Raúl; Moya, Andrés; Mira, Alex

    2011-01-01

    When a bacterial genome is compared to the metagenome of an environment it inhabits, most genes recruit at high sequence identity. In free-living bacteria (for instance marine bacteria compared against the ocean metagenome) certain genomic regions are totally absent in recruitment plots, representing therefore genes unique to individual bacterial isolates. We show that these Metagenomic Islands (MIs) are also visible in bacteria living in human hosts when their genomes are compared to sequences from the human microbiome, despite the compartmentalized structure of human-related environments such as the gut. From an applied point of view, MIs of human pathogens (e.g. those identified in enterohaemorrhagic Escherichia coli against the gut metagenome or in pathogenic Neisseria meningitidis against the oral metagenome) include virulence genes that appear to be absent in related strains or species present in the microbiome of healthy individuals. We propose that this strategy (i.e. recruitment analysis of pathogenic bacteria against the metagenome of healthy subjects) can be used to detect pathogenicity regions in species where the genes involved in virulence are poorly characterized. Using this approach, we detect well-known pathogenicity islands and identify new potential virulence genes in several human pathogens.
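
    Finding regions of a genome that recruit no metagenomic reads reduces to locating gaps in a set of alignment intervals. The sketch below assumes the read alignments have already been computed and filtered by identity elsewhere; the function name, the half-open coordinate convention, and the 1,000 bp minimum length are illustrative assumptions and do not come from the paper.

```python
def find_islands(genome_length, hits, min_length=1000):
    """Return uncovered genome regions at least min_length long.

    hits is a list of (start, end) half-open intervals, in genome coordinates,
    for metagenomic reads recruited to the genome. Regions with no recruited
    reads are candidate metagenomic islands.
    """
    islands = []
    pos = 0  # end of the region covered so far
    for start, end in sorted(hits):
        if start - pos >= min_length:
            islands.append((pos, start))  # gap before this hit is long enough
        pos = max(pos, end)
    if genome_length - pos >= min_length:
        islands.append((pos, genome_length))  # trailing uncovered region
    return islands
```

    For a 10,000 bp genome with reads covering 0 to 3,000 and 6,000 to 9,500, the only gap of at least 1,000 bp is the stretch from 3,000 to 6,000; the 500 bp tail is too short to report.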

  17. Identification and initial characterization of a novel turkey-origin picobirnavirus using a metagenomic approach

    USDA-ARS's Scientific Manuscript database

    Using the Genome Sequencer FLX Titanium technology (Roche, 454 Life Sciences), a ribonucleic acid (RNA) virus-specific metagenome was prepared using the pooled intestinal contents collected from North Carolina turkey flocks experiencing enteric disease signs. This analysis produced 6526 contigs rang...

  18. IDENTIFICATION OF CHICKEN-SPECIFIC FECAL MICROBIAL SEQUENCES USING A METAGENOMIC APPROACH

    EPA Science Inventory

    In this study, we applied a genome fragment enrichment (GFE) method to select for genomic regions that differ between different fecal metagenomes. Competitive DNA hybridizations were performed between chicken fecal DNA and pig fecal DNA (C-P) and between chicken fecal DNA and an ...

  19. Some considerations for analyzing biodiversity using integrative metagenomics and gene networks.

    PubMed

    Bittner, Lucie; Halary, Sébastien; Payri, Claude; Cruaud, Corinne; de Reviers, Bruno; Lopez, Philippe; Bapteste, Eric

    2010-07-30

    Improving knowledge of biodiversity will benefit conservation biology, enhance bioremediation studies, and could lead to new medical treatments. However there is no standard approach to estimate and to compare the diversity of different environments, or to study its past, and possibly, future evolution. We argue that there are two conditions for significant progress in the identification and quantification of biodiversity. First, integrative metagenomic studies - aiming at the simultaneous examination (or even better at the integration) of observations about the elements, functions and evolutionary processes captured by the massive sequencing of multiple markers - should be preferred over DNA barcoding projects and over metagenomic projects based on a single marker. Second, such metagenomic data should be studied with novel inclusive network-based approaches, designed to draw inferences both on the many units and on the many processes present in the environments. We reached these conclusions through a comparison of the theoretical foundations of two molecular approaches seeking to assess biodiversity: metagenomics (mostly used on prokaryotes and protists) and DNA barcoding (mostly used on multicellular eukaryotes), and by pragmatic considerations of the issues caused by the 'species problem' in biodiversity studies. Evolutionary gene networks reduce the risk of producing biodiversity estimates with limited explanatory power, biased either by unequal rates of LGT, or difficult to interpret due to (practical) problems caused by type I and type II grey zones. Moreover, these networks would easily accommodate additional (meta)transcriptomic and (meta)proteomic data.

  20. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets

    NASA Astrophysics Data System (ADS)

    Liu, Bo; Pop, Mihai

    Enabled by rapid advances in sequencing technology, metagenomic studies aim to characterize entire communities of microbes, bypassing the need for culturing individual bacterial members. One major goal of such studies is to identify specific functional adaptations of microbial communities to their habitats. Here we describe a powerful analytical method (MetaPath) that can identify differentially abundant pathways in metagenomic datasets, relying on a combination of metagenomic sequence data and prior metabolic pathway knowledge. We show that MetaPath outperforms other common approaches when evaluated on simulated datasets. We also demonstrate the power of our methods in analyzing two publicly available metagenomic datasets: a comparison of the gut microbiome of obese and lean twins; and a comparison of the gut microbiome of infant and adult subjects. We demonstrate that the subpathways identified by our method provide valuable insights into the biological activities of the microbiome.

  1. Estimating Unbiased Treatment Effects in Education Using a Regression Discontinuity Design

    ERIC Educational Resources Information Center

    Smith, William C.

    2014-01-01

    The ability of regression discontinuity (RD) designs to provide an unbiased treatment effect while overcoming the ethical concerns that plague Random Control Trials (RCTs) makes them a valuable and useful approach in education evaluation. RD is the only explicitly recognized quasi-experimental approach identified by the Institute of Education…

  2. Metagenomic Analysis of Fever, Thrombocytopenia and Leukopenia Syndrome (FTLS) in Henan Province, China: Discovery of a New Bunyavirus

    PubMed Central

    Ma, Hong; Zhang, Yuan; Du, Yanhua; Wang, Pengzhi; Tang, Xiaoyan; Wang, Haifeng; Kang, Kai; Zhang, Shiqiang; Zhao, Guohua; Wu, Weili; Yang, Yinhui; Chen, Haomin; Mu, Feng; Chen, Weijun

    2011-01-01

    Since 2007, many cases of fever, thrombocytopenia and leukopenia syndrome (FTLS) have emerged in Henan Province, China. Patient reports of tick bites suggested that infection could contribute to FTLS. Many tick-transmitted microbial pathogens were tested for by PCR/RT-PCR and/or indirect immunofluorescence assay (IFA). However, only 8% (24/285) of samples collected from 2007 to 2010 tested positive for human granulocytic anaplasmosis (HGA), suggesting that other pathogens could be involved. Here, we used an unbiased metagenomic approach to screen and survey for microbes possibly associated with FTLS. BLASTx analysis of deduced protein sequences revealed that a novel bunyavirus (36% identity to Tehran virus, accession: HQ412604) was present only in sera from FTLS patients. A phylogenetic analysis further showed that, although closely related to Uukuniemi virus of the Phlebovirus genus, this virus was distinct. The candidate virus was examined for association with FTLS among samples collected from Henan province during 2007–2010. RT-PCR, viral cultures, and a seroepidemiologic survey were undertaken. RT-PCR results showed that 223 of 285 (78.24%) acute-phase serum samples contained viral RNA. Of 95 patients for whom paired acute and convalescent sera were available, 73 had serologic evidence of infection, with 52 seroconversions and 21 exhibiting a 4-fold increase in antibody titer to the virus. The new virus was isolated from patient acute-phase serum samples and named Henan Fever Virus (HNF virus). Whole-genome sequencing confirmed that the virus was a novel bunyavirus with genetic similarity to known bunyaviruses, and was most closely related to the Uukuniemi virus (34%, 24%, and 29% of maximum identity, respectively, for segment L, M, S at maximum query coverage). After the release of the GenBank sequences of SFTSV, we found that they were nearly identical (>99% identity). 
These results show that the novel bunyavirus (HNF virus) is strongly correlated with FTLS. PMID:22114553

  3. Metagenomic Approaches to Investigate the Contribution of the Vineyard Environment to the Quality of Wine Fermentation: Potentials and Difficulties

    PubMed Central

    Stefanini, Irene; Cavalieri, Duccio

    2018-01-01

    Winemaking is a complex process that begins in the vineyard and ends at the moment of consumption. Recent reports have shown the relevance of microbial populations in the definition of the regional organoleptic and sensory characteristics of a wine. Metagenomic approaches, allowing the exhaustive identification of microorganisms present in complex samples, have recently played a fundamental role in the dissection of the contribution of the vineyard environment to wine fermentation. Systematic approaches have explored the impact of agronomical techniques, vineyard topologies, and climatic changes on bacterial and fungal populations found in the vineyard and in fermentations, also trying to predict or extrapolate the effects on the sensorial characteristics of the resulting wine. This review is aimed at highlighting the major technical and experimental challenges in dissecting the contribution of the microbiota of the vineyard and native environments to the wine fermentation process, and how metagenomic approaches can help in understanding microbial fluxes and selections across the environments and specimens related to wine fermentation. PMID:29867889

  4. DEVELOPMENT OF HOST-SPECIFIC METAGENOMIC MARKERS FOR MICROBIAL SOURCE TRACKING USING A NOVEL METAGENOMIC APPROACH

    EPA Science Inventory

    Fecal contamination of source waters is an important issue to the drinking water industry. Improper disposal of animal waste, leaky septic tanks, storm runoff, and wildlife can all be responsible for spreading enteric pathogens into source waters. As a result, methods that can pi...

  5. PathoScope 2.0: a complete computational framework for strain identification in environmental or clinical sequencing samples

    PubMed Central

    2014-01-01

    Background Recent innovations in sequencing technologies have provided researchers with the ability to rapidly characterize the microbial content of an environmental or clinical sample with unprecedented resolution. These approaches are producing a wealth of information that is providing novel insights into the microbial ecology of the environment and human health. However, these sequencing-based approaches produce large and complex datasets that require efficient and sensitive computational analysis workflows. Many recent tools for analyzing metagenomic-sequencing data have emerged; however, these approaches often suffer from issues of specificity and efficiency, and typically do not include a complete metagenomic analysis framework. Results We present PathoScope 2.0, a complete bioinformatics framework for rapidly and accurately quantifying the proportions of reads from individual microbial strains present in metagenomic sequencing data from environmental or clinical samples. The pipeline performs all necessary computational analysis steps, including reference genome library extraction and indexing, read quality control and alignment, strain identification, and summarization and annotation of results. We rigorously evaluated PathoScope 2.0 using simulated data and data from the 2011 outbreak of Shiga-toxigenic Escherichia coli O104:H4. Conclusions The results show that PathoScope 2.0 is a complete, highly sensitive, and efficient approach for metagenomic analysis that outperforms alternative approaches in scope, speed, and accuracy. The PathoScope 2.0 pipeline software is freely available for download at: http://sourceforge.net/projects/pathoscope/. PMID:25225611

  6. The Fecal Viral Flora of California Sea Lions

    PubMed Central

    Li, Linlin; Shan, Tongling; Wang, Chunlin; Côté, Colette; Kolman, John; Onions, David; Gulland, Frances M. D.; Delwart, Eric

    2011-01-01

    California sea lions are one of the major marine mammal species along the Pacific coast of North America. Sea lions are susceptible to a wide variety of viruses, some of which can be transmitted to or from terrestrial mammals. Using an unbiased viral metagenomic approach, we surveyed the fecal virome in California sea lions of different ages and health statuses. Averages of 1.6 and 2.5 distinct mammalian viral species were shed by pups and juvenile sea lions, respectively. Previously undescribed mammalian viruses from four RNA virus families (Astroviridae, Picornaviridae, Caliciviridae, and Reoviridae) and one DNA virus family (Parvoviridae) were characterized. The first complete or partial genomes of sapeloviruses, sapoviruses, noroviruses, and bocavirus in marine mammals are reported. Astroviruses and bocaviruses showed the highest prevalence and abundance in California sea lion feces. The diversity of bacteriophages was higher in unweaned sea lion pups than in juveniles and animals in rehabilitation, where the phage community consisted largely of phages related to the family Microviridae. This study increases our understanding of the viral diversity in marine mammals, highlights the high rate of enteric viral infections in these highly social carnivores, and may be used as a baseline viral survey for comparison with samples from California sea lions during unexplained disease outbreaks. PMID:21795334

  7. Using a metagenomic approach to improve our understanding of Armillaria root disease

    Treesearch

    Amy Ross-Davis; Matt Settles; John W. Hanna; John D. Shaw; Andrew T. Hudak; Deborah S. Page-Dumroese; Ned B. Klopfenstein

    2015-01-01

    Metagenomics has illuminated our understanding of how microbial communities influence health and disease. Researchers are beginning to characterize what constitutes healthy microbiota in terms of structure, function, and diversity in a variety of environments. Although investigation lags behind the more well-studied human microbiome, a growing body of research is using...

  8. DEVELOPMENT OF HOST-SPECIFIC METAGENOMIC MARKERS FOR MICROBIAL SOURCE TRACKING USING A NOVEL METAGENOMIC APPROACH

    EPA Science Inventory

    Fecal contamination of source water has always been an important issue to the drinking water industry. Improper disposal of animal waste, leaky septic tanks, storm runoff, and the abundance of wildlife in natural water systems can all be responsible for the spread of enteric path...

  9. Improving microbial fitness in the mammalian gut by in vivo temporal functional metagenomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yaung, Stephanie J.; Deng, Luxue; Li, Ning

    Elucidating functions of commensal microbial genes in the mammalian gut is challenging because many commensals are recalcitrant to laboratory cultivation and genetic manipulation. We present Temporal FUnctional Metagenomics sequencing (TFUMseq), a platform to functionally mine bacterial genomes for genes that contribute to fitness of commensal bacteria in vivo. Our approach uses metagenomic DNA to construct large-scale heterologous expression libraries that are tracked over time in vivo by deep sequencing and computational methods. To demonstrate our approach, we built a TFUMseq plasmid library using the gut commensal Bacteroides thetaiotaomicron (Bt) and introduced Escherichia coli carrying this library into germfree mice. Population dynamics of library clones revealed Bt genes conferring significant fitness advantages in E. coli over time, including carbohydrate utilization genes, with a Bt galactokinase central to early colonization, and subsequent dominance by a Bt glycoside hydrolase enabling sucrose metabolism coupled with co-evolution of the plasmid library and E. coli genome driving increased galactose utilization. Here, our findings highlight the utility of functional metagenomics for engineering commensal bacteria with improved properties, including expanded colonization capabilities in vivo.

  10. Improving microbial fitness in the mammalian gut by in vivo temporal functional metagenomics

    DOE PAGES

    Yaung, Stephanie J.; Deng, Luxue; Li, Ning; ...

    2015-03-11

    Elucidating functions of commensal microbial genes in the mammalian gut is challenging because many commensals are recalcitrant to laboratory cultivation and genetic manipulation. We present Temporal FUnctional Metagenomics sequencing (TFUMseq), a platform to functionally mine bacterial genomes for genes that contribute to fitness of commensal bacteria in vivo. Our approach uses metagenomic DNA to construct large-scale heterologous expression libraries that are tracked over time in vivo by deep sequencing and computational methods. To demonstrate our approach, we built a TFUMseq plasmid library using the gut commensal Bacteroides thetaiotaomicron (Bt) and introduced Escherichia coli carrying this library into germfree mice. Population dynamics of library clones revealed Bt genes conferring significant fitness advantages in E. coli over time, including carbohydrate utilization genes, with a Bt galactokinase central to early colonization, and subsequent dominance by a Bt glycoside hydrolase enabling sucrose metabolism coupled with co-evolution of the plasmid library and E. coli genome driving increased galactose utilization. Here, our findings highlight the utility of functional metagenomics for engineering commensal bacteria with improved properties, including expanded colonization capabilities in vivo.

  11. Under-detection of endospore-forming Firmicutes in metagenomic data

    DOE PAGES

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina; ...

    2015-04-25

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases that can contribute to improve the universality of metagenomic approaches.

  12. Under-detection of endospore-forming Firmicutes in metagenomic data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Filippidou, Sevasti; Junier, Thomas; Wunderlin, Tina

    Microbial diversity studies based on metagenomic sequencing have greatly enhanced our knowledge of the microbial world. However, one caveat is the fact that not all microorganisms are equally well detected, questioning the universality of this approach. Firmicutes are known to be a dominant bacterial group. Several Firmicutes species are endospore formers and this property makes them hardy in potentially harsh conditions, and thus likely to be present in a wide variety of environments, even as residents and not functional players. While metagenomic libraries can be expected to contain endospore formers, endospores are known to be resilient to many traditional methods of DNA isolation and thus potentially undetectable. In this study we evaluated the representation of endospore-forming Firmicutes in 73 published metagenomic datasets using two molecular markers unique to this bacterial group (spo0A and gpr). Both markers were notably absent in well-known habitats of Firmicutes such as soil, with spo0A found only in three mammalian gut microbiomes. A tailored DNA extraction method resulted in the detection of a large diversity of endospore-formers in amplicon sequencing of the 16S rRNA and spo0A genes. However, shotgun classification was still poor with only a minor fraction of the community assigned to Firmicutes. Thus, removing a specific bias in a molecular workflow improves detection in amplicon sequencing, but it was insufficient to overcome the limitations for detecting endospore-forming Firmicutes in whole-genome metagenomics. In conclusion, this study highlights the importance of understanding the specific methodological biases that can contribute to improve the universality of metagenomic approaches.

  13. Mutually unbiased bases and semi-definite programming

    NASA Astrophysics Data System (ADS)

    Brierley, Stephen; Weigert, Stefan

    2010-11-01

    A complex Hilbert space of dimension six supports at least three but not more than seven mutually unbiased bases. Two computer-aided analytical methods to tighten these bounds are reviewed, based on a discretization of parameter space and on Gröbner bases. A third algorithmic approach is presented: the non-existence of more than three mutually unbiased bases in composite dimensions can be decided by a global optimization method known as semidefinite programming. The method is used to confirm that the spectral matrix cannot be part of a complete set of seven mutually unbiased bases in dimension six.

  14. Reconstructing the Genomic Content of Microbiome Taxa through Shotgun Metagenomic Deconvolution

    PubMed Central

    Carr, Rogan; Shen-Orr, Shai S.; Borenstein, Elhanan

    2013-01-01

    Metagenomics has transformed our understanding of the microbial world, allowing researchers to bypass the need to isolate and culture individual taxa and to directly characterize both the taxonomic and gene compositions of environmental samples. However, associating the genes found in a metagenomic sample with the specific taxa of origin remains a critical challenge. Existing binning methods, based on nucleotide composition or alignment to reference genomes allow only a coarse-grained classification and rely heavily on the availability of sequenced genomes from closely related taxa. Here, we introduce a novel computational framework, integrating variation in gene abundances across multiple samples with taxonomic abundance data to deconvolve metagenomic samples into taxa-specific gene profiles and to reconstruct the genomic content of community members. This assembly-free method is not bounded by various factors limiting previously described methods of metagenomic binning or metagenomic assembly and represents a fundamentally different approach to metagenomic-based genome reconstruction. An implementation of this framework is available at http://elbo.gs.washington.edu/software.html. We first describe the mathematical foundations of our framework and discuss considerations for implementing its various components. We demonstrate the ability of this framework to accurately deconvolve a set of metagenomic samples and to recover the gene content of individual taxa using synthetic metagenomic samples. We specifically characterize determinants of prediction accuracy and examine the impact of annotation errors on the reconstructed genomes. We finally apply metagenomic deconvolution to samples from the Human Microbiome Project, successfully reconstructing genus-level genomic content of various microbial genera, based solely on variation in gene count. These reconstructed genera are shown to correctly capture genus-specific properties. 
With the accumulation of metagenomic data, this deconvolution framework provides an essential tool for characterizing microbial taxa never before seen, laying the foundation for addressing fundamental questions concerning the taxa comprising diverse microbial communities. PMID:24146609
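
The deconvolution idea in the abstract above can be illustrated with a small sketch. It assumes a simple linear model in which the abundance of a gene in each sample equals the sum, over taxa, of taxon abundance times that taxon's copy number of the gene, and it fits the copy numbers by ordinary least squares across samples. The function names and all input values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def deconvolve_gene(taxon_abundance, gene_abundance):
    """Estimate per-taxon copy numbers of one gene by least squares.

    taxon_abundance: one row per sample, one column per taxon.
    gene_abundance:  measured abundance of the gene in each sample.
    Assumed model (hypothetical): gene_abundance[s] = sum_k taxon_abundance[s][k] * copies[k].
    """
    k = len(taxon_abundance[0])
    # Normal equations: (A^T A) copies = A^T g.
    ata = [[sum(row[i] * row[j] for row in taxon_abundance) for j in range(k)]
           for i in range(k)]
    atg = [sum(row[i] * g for row, g in zip(taxon_abundance, gene_abundance))
           for i in range(k)]
    return solve_linear(ata, atg)


# Invented example: two taxa, three samples, gene present once in taxon 0 only.
abundances = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
gene = [0.8, 0.5, 0.1]
copies = deconvolve_gene(abundances, gene)
```

The fit needs at least as many samples as taxa, and taxon abundances that actually vary between samples; otherwise the normal equations are singular and the elimination step divides by zero.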

  15. Metagenomics: A new horizon in cancer research

    PubMed Central

    Banerjee, Joyita; Mishra, Neetu; Dhas, Yogita

    2015-01-01

    Metagenomics has broadened the scope of targeting microbes responsible for inducing various types of cancers. About 16.1% of cancers are associated with microbial infection. Metagenomics is an equitable way of identifying and studying micro-organisms within their habitat. In cancer research, this approach has revolutionized the way of identifying, analyzing and targeting the microbial diversity present in the tissue specimens of cancer patients. The genomic analyses of these micro-organisms through next generation sequencing techniques invariably facilitate in recognizing the microbial population in biopsies and their evolutionary relationships with each other. In this review an attempt has been made to generate current metagenomic view on cancer microbiota. Different types of micro-organisms have been found to be linked to various types of cancers, thus, contributing significantly in understanding the disease at molecular level. PMID:26110115

  16. Metagenomics of Thermophiles with a Focus on Discovery of Novel Thermozymes

    PubMed Central

    DeCastro, María-Eugenia; Rodríguez-Belmonte, Esther; González-Siso, María-Isabel

    2016-01-01

    Microbial populations living in environments with temperatures above 50°C (thermophiles) have been widely studied, increasing our knowledge in the composition and function of these ecological communities. Since these populations express a broad number of heat-resistant enzymes (thermozymes), they also represent an important source for novel biocatalysts that can be potentially used in industrial processes. The integrated study of the whole-community DNA from an environment, known as metagenomics, coupled with the development of next generation sequencing (NGS) technologies, has allowed the generation of large amounts of data from thermophiles. In this review, we summarize the main approaches commonly utilized for assessing the taxonomic and functional diversity of thermophiles through metagenomics, including several bioinformatics tools and some metagenome-derived methods to isolate their thermozymes. PMID:27729905

  17. An Enrichment of CRISPR and Other Defense-Related Features in Marine Sponge-Associated Microbial Metagenomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Horn, Hannes; Slaby, Beate M.; Jahn, Martin T.

    Many marine sponges are populated by dense and taxonomically diverse microbial consortia. We employed a metagenomics approach to unravel the differences in the functional gene repertoire among three Mediterranean sponge species, Petrosia ficiformis, Sarcotragus foetidus, Aplysina aerophoba and seawater. Different signatures were observed between sponge and seawater metagenomes with regard to microbial community composition, GC content, and estimated bacterial genome size. Our analysis showed further a pronounced repertoire for defense systems in sponge metagenomes. Specifically, clustered regularly interspaced short palindromic repeats, restriction modification, DNA phosphorothioation and phage growth limitation systems were enriched in sponge metagenomes. These data suggest that defense is an important functional trait for an existence within sponges that requires mechanisms to defend against foreign DNA from microorganisms and viruses. Furthermore, this study contributes to an understanding of the evolutionary arms race between viruses/phages and bacterial genomes and it sheds light on the bacterial defenses that have evolved in the context of the sponge holobiont.

  18. An Enrichment of CRISPR and Other Defense-Related Features in Marine Sponge-Associated Microbial Metagenomes

    DOE PAGES

    Horn, Hannes; Slaby, Beate M.; Jahn, Martin T.; ...

    2016-11-08

    Many marine sponges are populated by dense and taxonomically diverse microbial consortia. We employed a metagenomics approach to unravel the differences in the functional gene repertoire among three Mediterranean sponge species, Petrosia ficiformis, Sarcotragus foetidus, Aplysina aerophoba and seawater. Different signatures were observed between sponge and seawater metagenomes with regard to microbial community composition, GC content, and estimated bacterial genome size. Our analysis showed further a pronounced repertoire for defense systems in sponge metagenomes. Specifically, clustered regularly interspaced short palindromic repeats, restriction modification, DNA phosphorothioation and phage growth limitation systems were enriched in sponge metagenomes. These data suggest that defense is an important functional trait for an existence within sponges that requires mechanisms to defend against foreign DNA from microorganisms and viruses. Furthermore, this study contributes to an understanding of the evolutionary arms race between viruses/phages and bacterial genomes and it sheds light on the bacterial defenses that have evolved in the context of the sponge holobiont.

  19. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Yu-Wei; Simmons, Blake A.; Singer, Steven W.

    The recovery of genomes from metagenomic datasets is a critical step to defining the functional roles of the underlying uncultivated populations. We previously developed MaxBin, an automated binning approach for high-throughput recovery of microbial genomes from metagenomes. Here, we present an expanded binning algorithm, MaxBin 2.0, which recovers genomes from co-assembly of a collection of metagenomic datasets. Tests on simulated datasets revealed that MaxBin 2.0 is highly accurate in recovering individual genomes, and the application of MaxBin 2.0 to several metagenomes from environmental samples demonstrated that it could achieve two complementary goals: recovering more bacterial genomes compared to binning a single sample as well as comparing the microbial community composition between different sampling environments. Availability and implementation: MaxBin 2.0 is freely available at http://sourceforge.net/projects/maxbin/ under BSD license. Supplementary information: Supplementary data are available at Bioinformatics online.

  20. Prospecting Metagenomic Enzyme Subfamily Genes for DNA Family Shuffling by a Novel PCR-based Approach

    PubMed Central

    Wang, Qiuyan; Wu, Huili; Wang, Anming; Du, Pengfei; Pei, Xiaolin; Li, Haifeng; Yin, Xiaopu; Huang, Lifeng; Xiong, Xiaolong

    2010-01-01

    DNA family shuffling is a powerful method for enzyme engineering, which utilizes recombination of naturally occurring functional diversity to accelerate laboratory-directed evolution. However, the use of this technique has been hindered by the scarcity of family genes with the required level of sequence identity in the genome database. We describe here a strategy for collecting metagenomic homologous genes for DNA shuffling from environmental samples by truncated metagenomic gene-specific PCR (TMGS-PCR). Using identified metagenomic gene-specific primers, twenty-three 921-bp truncated lipase gene fragments, which shared 64–99% identity with each other and formed a distinct subfamily of lipases, were retrieved from 60 metagenomic samples. These lipase genes were shuffled, and selected active clones were characterized. The chimeric clones show extensive functional and genetic diversity, as demonstrated by functional characterization and sequence analysis. Our results indicate that homologous sequences of genes captured by TMGS-PCR can be used as suitable genetic material for DNA family shuffling with broad applications in enzyme engineering. PMID:20962349

  1. Chronic Meningitis Investigated via Metagenomic Next-Generation Sequencing

    PubMed Central

    Wilson, Michael R.; O’Donovan, Brian D.; Gelfand, Jeffrey M.; Sample, Hannah A.; Chow, Felicia C.; Betjemann, John P.; Shah, Maulik P.; Richie, Megan B.; Gorman, Mark P.; Hajj-Ali, Rula A.; Calabrese, Leonard H.; Zorn, Kelsey C.; Chow, Eric D.; Greenlee, John E.; Blum, Jonathan H.; Green, Gary; Khan, Lillian M.; Banerji, Debarko; Langelier, Charles; Bryson-Cahn, Chloe; Harrington, Whitney; Lingappa, Jairam R.; Shanbhag, Niraj M.; Green, Ari J.; Brew, Bruce J.; Soldatos, Ariane; Strnad, Luke; Doernberg, Sarah B.; Jay, Cheryl A.; Douglas, Vanja; Josephson, S. Andrew; DeRisi, Joseph L.

    2018-01-01

    Importance Identifying infectious causes of subacute or chronic meningitis can be challenging. Enhanced, unbiased diagnostic approaches are needed. Objective To present a case series of patients with diagnostically challenging subacute or chronic meningitis using metagenomic next-generation sequencing (mNGS) of cerebrospinal fluid (CSF) supported by a statistical framework generated from mNGS of control samples from the environment and from patients who were noninfectious. Design, Setting, and Participants In this case series, mNGS data obtained from the CSF of 94 patients with noninfectious neuroinflammatory disorders and from 24 water and reagent control samples were used to develop and implement a weighted scoring metric based on z scores at the species and genus levels for both nucleotide and protein alignments to prioritize and rank the mNGS results. Total RNA was extracted for mNGS from the CSF of 7 participants with subacute or chronic meningitis who were recruited between September 2013 and March 2017 as part of a multicenter study of mNGS pathogen discovery among patients with suspected neuroinflammatory conditions. The neurologic infections identified by mNGS in these 7 participants represented a diverse array of pathogens. The patients were referred from the University of California, San Francisco Medical Center (n = 2), Zuckerberg San Francisco General Hospital and Trauma Center (n = 2), Cleveland Clinic (n = 1), University of Washington (n = 1), and Kaiser Permanente (n = 1). A weighted z score was used to filter out environmental contaminants and facilitate efficient data triage and analysis. Main Outcomes and Measures Pathogens identified by mNGS and the ability of a statistical model to prioritize, rank, and simplify mNGS results. Results The 7 participants ranged in age from 10 to 55 years, and 3 (43%) were female. 
A parasitic worm (Taenia solium, in 2 participants), a virus (HIV-1), and 4 fungi (Cryptococcus neoformans, Aspergillus oryzae, Histoplasma capsulatum, and Candida dubliniensis) were identified among the 7 participants by using mNGS. Evaluating mNGS data with a weighted z score–based scoring algorithm reduced the reported microbial taxa by a mean of 87% (range, 41%-99%) when taxa with a combined score of 0 or less were removed, effectively separating bona fide pathogen sequences from spurious environmental sequences so that, in each case, the causative pathogen was found within the top 2 scoring microbes identified using the algorithm. Conclusions and Relevance Diverse microbial pathogens were identified by mNGS in the CSF of patients with diagnostically challenging subacute or chronic meningitis, including a case of subarachnoid neurocysticercosis that defied diagnosis for 1 year, the first reported case of CNS vasculitis caused by Aspergillus oryzae, and the fourth reported case of C dubliniensis meningitis. Prioritizing metagenomic data with a scoring algorithm greatly clarified data interpretation and highlighted the problem of attributing biological significance to organisms present in control samples used for metagenomic sequencing studies. PMID:29710329
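
The control-based filtering described in the abstract above can be sketched in a few lines. This is a simplified illustration, not the study's scoring metric: it computes a single z score per taxon against control samples, whereas the abstract describes a weighted combination of z scores at the species and genus levels for nucleotide and protein alignments. The function names, the floor on the standard deviation, and the input values are assumptions made for the example.

```python
from statistics import mean, stdev


def z_scores(sample_counts, control_counts, sigma_floor=1.0):
    """Z score of each taxon in a sample relative to control samples.

    sample_counts:  {taxon: abundance} for the sample of interest.
    control_counts: list of {taxon: abundance} dicts, one per control (at least two).
    sigma_floor:    assumed lower bound on the standard deviation, so taxa that
                    never appear in controls do not cause division by zero.
    """
    scores = {}
    for taxon, value in sample_counts.items():
        background = [control.get(taxon, 0.0) for control in control_counts]
        mu = mean(background)
        sigma = max(stdev(background), sigma_floor)
        scores[taxon] = (value - mu) / sigma
    return scores


def rank_taxa(sample_counts, control_counts):
    """Drop taxa scoring 0 or less, then rank the rest from highest score down."""
    scores = z_scores(sample_counts, control_counts)
    kept = {taxon: s for taxon, s in scores.items() if s > 0}
    return sorted(kept, key=kept.get, reverse=True)


# Invented abundances: taxon_b is a typical reagent contaminant seen in every control.
sample = {"taxon_a": 120.0, "taxon_b": 50.0}
controls = [{"taxon_b": 48.0}, {"taxon_b": 55.0}, {"taxon_b": 52.0}]
ranked = rank_taxa(sample, controls)
```

In this toy input, `taxon_b` is no more abundant in the sample than in the controls, so it is filtered out, while `taxon_a`, absent from every control, is ranked first.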

  2. Chronic Meningitis Investigated via Metagenomic Next-Generation Sequencing.

    PubMed

    Wilson, Michael R; O'Donovan, Brian D; Gelfand, Jeffrey M; Sample, Hannah A; Chow, Felicia C; Betjemann, John P; Shah, Maulik P; Richie, Megan B; Gorman, Mark P; Hajj-Ali, Rula A; Calabrese, Leonard H; Zorn, Kelsey C; Chow, Eric D; Greenlee, John E; Blum, Jonathan H; Green, Gary; Khan, Lillian M; Banerji, Debarko; Langelier, Charles; Bryson-Cahn, Chloe; Harrington, Whitney; Lingappa, Jairam R; Shanbhag, Niraj M; Green, Ari J; Brew, Bruce J; Soldatos, Ariane; Strnad, Luke; Doernberg, Sarah B; Jay, Cheryl A; Douglas, Vanja; Josephson, S Andrew; DeRisi, Joseph L

    2018-04-16

    Identifying infectious causes of subacute or chronic meningitis can be challenging. Enhanced, unbiased diagnostic approaches are needed. To present a case series of patients with diagnostically challenging subacute or chronic meningitis using metagenomic next-generation sequencing (mNGS) of cerebrospinal fluid (CSF) supported by a statistical framework generated from mNGS of control samples from the environment and from patients who were noninfectious. In this case series, mNGS data obtained from the CSF of 94 patients with noninfectious neuroinflammatory disorders and from 24 water and reagent control samples were used to develop and implement a weighted scoring metric based on z scores at the species and genus levels for both nucleotide and protein alignments to prioritize and rank the mNGS results. Total RNA was extracted for mNGS from the CSF of 7 participants with subacute or chronic meningitis who were recruited between September 2013 and March 2017 as part of a multicenter study of mNGS pathogen discovery among patients with suspected neuroinflammatory conditions. The neurologic infections identified by mNGS in these 7 participants represented a diverse array of pathogens. The patients were referred from the University of California, San Francisco Medical Center (n = 2), Zuckerberg San Francisco General Hospital and Trauma Center (n = 2), Cleveland Clinic (n = 1), University of Washington (n = 1), and Kaiser Permanente (n = 1). A weighted z score was used to filter out environmental contaminants and facilitate efficient data triage and analysis. Pathogens identified by mNGS and the ability of a statistical model to prioritize, rank, and simplify mNGS results. The 7 participants ranged in age from 10 to 55 years, and 3 (43%) were female. 
A parasitic worm (Taenia solium, in 2 participants), a virus (HIV-1), and 4 fungi (Cryptococcus neoformans, Aspergillus oryzae, Histoplasma capsulatum, and Candida dubliniensis) were identified among the 7 participants by using mNGS. Evaluating mNGS data with a weighted z score-based scoring algorithm reduced the reported microbial taxa by a mean of 87% (range, 41%-99%) when taxa with a combined score of 0 or less were removed, effectively separating bona fide pathogen sequences from spurious environmental sequences so that, in each case, the causative pathogen was found within the top 2 scoring microbes identified using the algorithm. Diverse microbial pathogens were identified by mNGS in the CSF of patients with diagnostically challenging subacute or chronic meningitis, including a case of subarachnoid neurocysticercosis that defied diagnosis for 1 year, the first reported case of CNS vasculitis caused by Aspergillus oryzae, and the fourth reported case of C dubliniensis meningitis. Prioritizing metagenomic data with a scoring algorithm greatly clarified data interpretation and highlighted the problem of attributing biological significance to organisms present in control samples used for metagenomic sequencing studies.

  3. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers.

    PubMed

    McIntyre, Alexa B R; Ounit, Rachid; Afshinnekoo, Ebrahim; Prill, Robert J; Hénaff, Elizabeth; Alexander, Noah; Minot, Samuel S; Danko, David; Foox, Jonathan; Ahsanuddin, Sofia; Tighe, Scott; Hasan, Nur A; Subramanian, Poorani; Moffat, Kelly; Levy, Shawn; Lonardi, Stefano; Greenfield, Nick; Colwell, Rita R; Rosen, Gail L; Mason, Christopher E

    2017-09-21

    One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited. In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages. This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
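
Two of the strategies named in the abstract above, abundance filtering and tool intersection, can be combined in a short sketch. The thresholds, function names, and example profiles are hypothetical, and the voting rule is one simple way to build an ensemble rather than the procedure used in the study.

```python
from collections import Counter


def abundance_filter(profile, min_fraction):
    """Keep taxa whose relative abundance is at least min_fraction."""
    return {taxon for taxon, fraction in profile.items() if fraction >= min_fraction}


def ensemble_call(profiles, min_fraction=0.001, min_tools=2):
    """Report taxa that pass the abundance filter in at least min_tools classifiers.

    profiles: list of {taxon: relative abundance} dicts, one per classifier.
    Setting min_tools to len(profiles) gives a strict intersection of the tools.
    """
    votes = Counter()
    for profile in profiles:
        votes.update(abundance_filter(profile, min_fraction))
    return {taxon for taxon, n in votes.items() if n >= min_tools}


# Invented outputs from three classifiers run on the same sample.
tool_outputs = [
    {"A": 0.6, "B": 0.3999, "X": 0.0001},
    {"A": 0.7, "B": 0.3},
    {"A": 0.5, "C": 0.5},
]
majority = ensemble_call(tool_outputs, min_tools=2)
strict = ensemble_call(tool_outputs, min_tools=3)
```

Raising `min_tools` trades recall for precision: the strict intersection keeps only taxon `A`, while the two-tool vote also keeps `B`.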

  4. BusyBee Web: metagenomic data analysis by bootstrapped supervised binning and annotation

    PubMed Central

    Kiefer, Christina; Fehlmann, Tobias; Backes, Christina

    2017-01-01

    Metagenomics-based studies of mixed microbial communities are impacting biotechnology, life sciences and medicine. Computational binning of metagenomic data is a powerful approach for the culture-independent recovery of population-resolved genomic sequences, i.e. from individual or closely related, constituent microorganisms. Existing binning solutions often require a priori characterized reference genomes and/or dedicated compute resources. Extending currently available reference-independent binning tools, we developed the BusyBee Web server for the automated deconvolution of metagenomic data into population-level genomic bins using assembled contigs (Illumina) or long reads (Pacific Biosciences, Oxford Nanopore Technologies). A reversible compression step as well as bootstrapped supervised binning enable quick turnaround times. The binning results are represented in interactive 2D scatterplots. Moreover, bin quality estimates, taxonomic annotations and annotations of antibiotic resistance genes are computed and visualized. Ground truth-based benchmarks of BusyBee Web demonstrate comparably high performance to state-of-the-art binning solutions for assembled contigs and markedly improved performance for long reads (median F1 scores: 70.02–95.21%). Furthermore, the applicability to real-world metagenomic datasets is shown. In conclusion, our reference-independent approach automatically bins assembled contigs or long reads, exhibits high sensitivity and precision, enables intuitive inspection of the results, and only requires FASTA-formatted input. The web-based application is freely accessible at: https://ccb-microbe.cs.uni-saarland.de/busybee. PMID:28472498

  5. Longitudinal Metagenomic Analysis of Hospital Air Identifies Clinically Relevant Microbes.

    PubMed

    King, Paula; Pham, Long K; Waltz, Shannon; Sphar, Dan; Yamamoto, Robert T; Conrad, Douglas; Taplitz, Randy; Torriani, Francesca; Forsyth, R Allyn

    2016-01-01

    We describe the sampling of sixty-three uncultured hospital air samples collected over a six-month period and analysis using shotgun metagenomic sequencing. Our primary goals were to determine the longitudinal metagenomic variability of this environment, identify and characterize genomes of potential pathogens and determine whether they are atypical to the hospital airborne metagenome. Air samples were collected from eight locations which included patient wards, the main lobby and outside. The resulting DNA libraries produced 972 million sequences representing 51 gigabases. Hierarchical clustering of samples by the most abundant 50 microbial orders generated three major nodes which primarily clustered by type of location. Because the indoor locations were longitudinally consistent, episodic relative increases in microbial genomic signatures related to the opportunistic pathogens Aspergillus, Penicillium and Stenotrophomonas were identified as outliers at specific locations. Further analysis of microbial reads specific for Stenotrophomonas maltophilia indicated homology to a sequenced multi-drug resistant clinical strain and we observed broad sequence coverage of resistance genes. We demonstrate that a shotgun metagenomic sequencing approach can be used to characterize the resistance determinants of pathogen genomes that are uncharacteristic for an otherwise consistent hospital air microbial metagenomic profile.

  6. CoMet: a workflow using contig coverage and composition for binning a metagenomic sample with high precision.

    PubMed

    Herath, Damayanthi; Tang, Sen-Lin; Tandon, Kshitij; Ackland, David; Halgamuge, Saman Kumara

    2017-12-28

    In metagenomics, the separation of nucleotide sequences belonging to an individual or closely matched populations is termed binning. Binning helps the evaluation of underlying microbial population structure as well as the recovery of individual genomes from a sample of uncultivable microbial organisms. Both supervised and unsupervised learning methods have been employed in binning; however, characterizing a metagenomic sample containing multiple strains remains a significant challenge. In this study, we designed and implemented a new workflow, Coverage and composition based binning of Metagenomes (CoMet), for binning contigs in a single metagenomic sample. CoMet utilizes coverage values and the compositional features of metagenomic contigs. The binning strategy in CoMet includes the initial grouping of contigs in guanine-cytosine (GC) content-coverage space and refinement of bins in tetranucleotide frequency space in a purely unsupervised manner. With CoMet, the clustering algorithm DBSCAN is employed for binning contigs. The performance of CoMet was compared against four existing approaches for binning a single metagenomic sample, namely MaxBin, Metawatt, MyCC (default) and MyCC (coverage), using multiple datasets including a sample comprised of multiple strains. Binning methods based on both compositional features and coverage of contigs performed better than the method based only on compositional features of contigs. CoMet yielded higher or comparable precision in comparison to the existing binning methods on benchmark datasets of varying complexities. MyCC (coverage) ranked highest in F1-score. However, CoMet outperformed MyCC (coverage) on the dataset containing multiple strains. Furthermore, CoMet recovered contigs of more species and was 18-39% higher in precision than the compared existing methods in discriminating species from the sample of multiple strains. CoMet resulted in higher precision than MyCC (default) and MyCC (coverage) on a real metagenome. The approach proposed with CoMet for binning contigs improves the precision of binning while characterizing more species in a single metagenomic sample and in a sample containing multiple strains. The F1-scores obtained from different binning strategies vary with different datasets; however, CoMet yields the highest F1-score with a sample comprised of multiple strains.
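
    The two compositional features named in this abstract, GC content and tetranucleotide frequencies, can be computed as in the following minimal sketch. This is not CoMet's code: the function names and the example contig are hypothetical, reverse-complement pairs are not merged, and the clustering step (DBSCAN, per the abstract) and the coverage feature are left out.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

# All 256 possible tetranucleotides over the DNA alphabet.
TETRAMERS: List[str] = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def tetranucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Relative frequency of each tetramer, using overlapping windows.

    Windows containing ambiguous bases (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = {tetramer: counts.get(tetramer, 0) for tetramer in TETRAMERS}
    total = sum(valid.values())
    return {t: (c / total if total else 0.0) for t, c in valid.items()}


# Hypothetical contig, far shorter than anything a real binner would accept.
contig = "ACGTACGTGGCC"
print(round(gc_content(contig), 3))
print(round(tetranucleotide_frequencies(contig)["ACGT"], 3))
```

    Each contig would then be represented by one point in GC-coverage space and one 256-dimensional frequency vector, which is the kind of input a density-based clusterer operates on.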

  7. Aerially transmitted human fungal pathogens: what can we learn from metagenomics and comparative genomics?

    PubMed

    Aliouat-Denis, Cécile-Marie; Chabé, Magali; Delhaes, Laurence; Dei-Cas, Eduardo

    2014-01-01

    In the last few decades, aerially transmitted human fungal pathogens have been increasingly recognized to impact the clinical course of chronic pulmonary diseases, such as asthma, cystic fibrosis or chronic obstructive pulmonary disease. Thanks to recent development of culture-free high-throughput sequencing methods, the metagenomic approaches are now appropriate to detect, identify and even quantify prokaryotic or eukaryotic microorganism communities inhabiting human respiratory tract and to access the complexity of even low-burden microbe communities that are likely to play a role in chronic pulmonary diseases. In this review, we explore how metagenomics and comparative genomics studies can alleviate fungal culture bottlenecks, improve our knowledge about fungal biology, lift the veil on cross-talks between host lung and fungal microbiota, and gain insights into the pathogenic impact of these aerially transmitted fungi that affect human beings. We reviewed metagenomic studies and comparative genomic analyses of carefully chosen microorganisms, and confirmed the usefulness of such approaches to better delineate biology and pathogenesis of aerially transmitted human fungal pathogens. Efforts to generate and efficiently analyze the enormous amount of data produced by such novel approaches have to be pursued, and will potentially provide the patients suffering from chronic pulmonary diseases with a better management. This manuscript is part of the series of works presented at the "V International Workshop: Molecular genetic approaches to the study of human pathogenic fungi" (Oaxaca, Mexico, 2012). Copyright © 2013 Revista Iberoamericana de Micología. Published by Elsevier Espana. All rights reserved.

  8. Molecular analysis of the bacterial microbiome in the forestomach fluid from the dromedary camel (Camelus dromedarius).

    PubMed

    Bhatt, Vaibhav D; Dande, Suchitra S; Patil, Nitin V; Joshi, Chaitanya G

    2013-04-01

    Rumen microorganisms play an important role in ruminant digestion and absorption of nutrients and have great potential applications in the fields of rumen adjustment, food fermentation and biomass utilization. To investigate the composition of microorganisms in the rumen of the camel (Camelus dromedarius), this study examines microbial diversity using a culture-independent approach. It includes a comparison of the rumen samples investigated in the present study to other currently available metagenomes to reveal potential differences in rumen microbial systems. Pyrosequencing-based metagenomics was applied to analyze phylogenetic and metabolic profiles with MG-RAST, a web-based tool. Pyrosequencing of the camel rumen sample yielded 8,979,755 nucleotides assembled to 41,905 sequence reads with an average read length of 214 nucleotides. Taxonomic analysis of metagenomic reads indicated Bacteroidetes (55.5 %), Firmicutes (22.7 %) and Proteobacteria (9.2 %) phyla as the predominant camel rumen taxa. At a finer phylogenetic resolution, Bacteroides species dominated the camel rumen metagenome. Functional analysis revealed that clustering-based subsystems and carbohydrate metabolism were the most abundant SEED subsystems, representing 17 % and 13 % of the camel metagenome, respectively. A high taxonomic and functional similarity was found between the camel rumen and the cow metagenome, which is not surprising given that both are mammalian herbivores with similar digestive tract structures and functions. The combination of pyrosequencing and the subsystems-based annotations available in the SEED database allowed us to understand the metabolic potential of these microbiomes. Altogether, these data suggest that agricultural and animal husbandry practices can impose significant selective pressures on the rumen microbiota regardless of rumen type. The present study provides a baseline for understanding the complexity of camel rumen microbial ecology while also highlighting striking similarities and differences when compared to other animal gastrointestinal environments.

  9. Comparison of reduced metagenome and 16S rRNA gene sequencing for determination of genetic diversity and mother-child overlap of the gut associated microbiota.

    PubMed

    Ravi, Anuradha; Avershina, Ekaterina; Angell, Inga Leena; Ludvigsen, Jane; Manohar, Prasanth; Padmanaban, Sumathi; Nachimuthu, Ramesh; Snipen, Lars; Rudi, Knut

    2018-06-01

    Use of the 16S rRNA gene in microbiota studies is limited by the lack of taxonomic and functional resolution. High resolution analyses are particularly important for understanding transmission and persistence of bacteria. The aim of our work was therefore to compare a novel reduced metagenome sequencing (RMS) approach with 16S rRNA gene sequencing to determine both the metagenome genetic diversity and the mother-to-child sharing of the microbiota in a cohort of 17 mother-child pairs. We found that although both approaches gave comparable results with respect to sample separation and taxonomy, RMS gave higher resolution and the potential for genomic-/functional assignment. Using RMS we estimated that the metagenome size increased from about 60 Mbp for 4-day-old children to about 225 Mbp for mothers. The 4-day-old children shared 7% of the metagenome sequences with the mothers, while the metagenome sequence sharing was >30% among the mothers. We found 15 genomes shared across >50% of the mothers, of which 10 belonged to Clostridia. Only Bacteroides showed a direct mother-child association, with B. vulgatus being abundant in both 4-day-old children and mothers. For the functional assignments, we identified a significant association between antibiotic usage during labor, and quantity of Fosfomycin resistance genes. In conclusion, our results show a higher functional and taxonomic resolution for RMS compared to 16S rRNA gene sequencing, where RMS enabled a detailed description of mother to child gut microbiota transmission - supporting a late recruitment of most gut bacteria and an effect of antibiotic treatment during labor on infant antibiotic resistance gene patterns. Copyright © 2018. Published by Elsevier B.V.

  10. Comparative modeling and benchmarking data sets for human histone deacetylases and sirtuin families.

    PubMed

    Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-02-23

    Histone deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases, and other types of diseases. Virtual screening (VS) has become a fairly effective approach for drug discovery of novel and highly selective histone deacetylase inhibitors (HDACIs). To facilitate the process, we constructed maximal unbiased benchmarking data sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs cover all four classes including Class III (Sirtuins family) and 14 HDAC isoforms, composed of 631 inhibitors and 24609 unbiased decoys. Its ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matching with ligands and maximal unbiased in terms of "artificial enrichment" and "analogue bias". We also conducted comparative studies with DUD-E and DEKOIS 2.0 sets against HDAC2 and HDAC8 targets and demonstrated that our MUBD-HDACs are unique in that they can be applied unbiasedly to both LBVS and SBVS approaches. In addition, we defined a novel metric, i.e. NLBScore, to detect the "2D bias" and "LBVS favorable" effect within the benchmarking sets. In summary, MUBD-HDACs are the only comprehensive and maximal-unbiased benchmark data sets for HDACs (including Sirtuins) that are available so far. MUBD-HDACs are freely available at http://www.xswlab.org/ .

  11. Human milk metagenome: a functional capacity analysis

    PubMed Central

    2013-01-01

    Background: Human milk contains a diverse population of bacteria that likely influences colonization of the infant gastrointestinal tract. Recent studies, however, have been limited to characterization of this microbial community by 16S rRNA analysis. In the present study, a metagenomic approach using Illumina sequencing of a pooled milk sample (ten donors) was employed to determine the genera of bacteria and the types of bacterial open reading frames in human milk that may influence bacterial establishment and stability in this primal food matrix. The human milk metagenome was also compared to that of breast-fed and formula-fed infants’ feces (n = 5, each) and mothers’ feces (n = 3) at the phylum level and at a functional level using open reading frame abundance. Additionally, immune-modulatory bacterial-DNA motifs were searched for within human milk. Results: The bacterial community in human milk contained over 360 prokaryotic genera, with sequences aligning predominantly to the phyla of Proteobacteria (65%) and Firmicutes (34%), and the genera of Pseudomonas (61.1%), Staphylococcus (33.4%) and Streptococcus (0.5%). From assembled human milk-derived contigs, 30,128 open reading frames were annotated and assigned to functional categories. When compared to the metagenome of infants’ and mothers’ feces, the human milk metagenome was less diverse at the phylum level, and contained more open reading frames associated with nitrogen metabolism, membrane transport and stress response (P < 0.05). The human milk metagenome also contained a similar occurrence of immune-modulatory DNA motifs to that of infants’ and mothers’ fecal metagenomes. Conclusions: Our results further expand the complexity of the human milk metagenome and reinforce the benefits of human milk ingestion on the microbial colonization of the infant gut and immunity. Discovery of immune-modulatory motifs in the metagenome of human milk indicates that more exhaustive analyses of the functionality of the human milk metagenome are warranted. PMID:23705844

  12. Metagenomics approach to the study of the gut microbiome structure and function in zebrafish Danio rerio fed with gluten formulated diet.

    PubMed

    Koo, Hyunmin; Hakim, Joseph A; Powell, Mickie L; Kumar, Ranjit; Eipers, Peter G; Morrow, Casey D; Crowley, Michael; Lefkowitz, Elliot J; Watts, Stephen A; Bej, Asim K

    2017-04-01

    In this study, we report the gut microbial composition and predictive functional profiles of zebrafish, Danio rerio, fed with a control formulated diet (CFD), and a gluten formulated diet (GFD) using a metagenomics approach and bioinformatics tools. The microbial communities of the GFD-fed D. rerio displayed heightened abundances of Legionellales, Rhizobiaceae, and Rhodobacter, as compared to the CFD-fed counterparts. Predicted metagenomics of microbial communities (PICRUSt) in GFD-fed D. rerio showed KEGG functional categories corresponding to bile secretion, secondary bile acid biosynthesis, and the metabolism of glycine, serine, and threonine. The CFD-fed D. rerio exhibited KEGG functional categories of bacteria-mediated cobalamin biosynthesis, which was supported by the presence of cobalamin synthesizers such as Bacteroides and Lactobacillus. Though these bacteria were absent in GFD-fed D. rerio, a comparable level of the cobalamin biosynthesis KEGG functional category was observed, which could be contributed by the compensatory enrichment of Cetobacterium. Based on these results, we conclude D. rerio to be a suitable alternative animal model for the use of a targeted metagenomics approach along with bioinformatics tools to further investigate the relationship between the gluten diet and microbiome profile in the gut ecosystem leading to gastrointestinal diseases and other undesired adverse health effects. Copyright © 2017. Published by Elsevier B.V.

  13. Metasecretome-selective phage display approach for mining the functional potential of a rumen microbial community.

    PubMed

    Ciric, Milica; Moon, Christina D; Leahy, Sinead C; Creevey, Christopher J; Altermann, Eric; Attwood, Graeme T; Rakonjac, Jasna; Gagic, Dragana

    2014-05-12

    In silico, secretome proteins can be predicted from completely sequenced genomes using various available algorithms that identify membrane-targeting sequences. For the metasecretome (the collection of surface, secreted and transmembrane proteins from environmental microbial communities) this approach is impractical, considering that the metasecretome open reading frames (ORFs) comprise only 10% to 30% of the total metagenome, and are poorly represented in the dataset due to overall low coverage of the metagenomic gene pool, even in large-scale projects. By combining secretome-selective phage display and next-generation sequencing, we focused the sequence analysis of a complex rumen microbial community on the metasecretome component of the metagenome. This approach achieved high enrichment (29 fold) of secreted fibrolytic enzymes from the plant-adherent microbial community of the bovine rumen. In particular, we identified hundreds of heretofore rare modules belonging to cellulosomes, cell-surface complexes specialised for recognition and degradation of the plant fibre. As a method, metasecretome phage display combined with next-generation sequencing has the power to sample the diversity of low-abundance surface and secreted proteins that would otherwise require exceptionally large metagenomic sequencing projects. As a resource, the metasecretome display library backed by the dataset obtained by next-generation sequencing is ready for i) affinity selection by standard phage display methodology and ii) easy purification of displayed proteins as part of the virion for individual functional analysis.

  14. Maximal Unbiased Benchmarking Data Sets for Human Chemokine Receptors and Comparative Analysis.

    PubMed

    Xia, Jie; Reid, Terry-Elinor; Wu, Song; Zhang, Liangren; Wang, Xiang Simon

    2018-05-29

    Chemokine receptors (CRs) have long been druggable targets for the treatment of inflammatory diseases and HIV-1 infection. As a powerful technique, virtual screening (VS) has been widely applied to identifying small molecule leads for modern drug targets including CRs. For rational selection of a wide variety of VS approaches, ligand enrichment assessment based on a benchmarking data set has become an indispensable practice. However, the lack of versatile benchmarking sets for the whole CR family that are able to unbiasedly evaluate every single approach including both structure- and ligand-based VS somewhat hinders modern drug discovery efforts. To address this issue, we constructed Maximal Unbiased Benchmarking Data sets for human Chemokine Receptors (MUBD-hCRs) using our recently developed tools of MUBD-DecoyMaker. The MUBD-hCRs encompasses 13 subtypes out of 20 chemokine receptors, composed of 404 ligands and 15756 decoys so far and is readily expandable in the future. It has been thoroughly validated that MUBD-hCRs ligands are chemically diverse while its decoys are maximal unbiased in terms of "artificial enrichment" and "analogue bias". In addition, we studied the performance of MUBD-hCRs, in particular CXCR4 and CCR5 data sets, in ligand enrichment assessments of both structure- and ligand-based VS approaches in comparison with other benchmarking data sets available in the public domain and demonstrated that MUBD-hCRs is very capable of designating the optimal VS approach. MUBD-hCRs is a unique and maximal unbiased benchmarking set that covers major CRs subtypes so far.

  15. Ensemble-based classification approach for micro-RNA mining applied on diverse metagenomic sequences.

    PubMed

    ElGokhy, Sherin M; ElHefnawi, Mahmoud; Shoukry, Amin

    2014-05-06

    MicroRNAs (miRNAs) are endogenous ∼22 nt RNAs that are identified in many species as powerful regulators of gene expression. Experimental identification of miRNAs is still slow since miRNAs are difficult to isolate by cloning due to their low expression, low stability, tissue specificity and the high cost of the cloning procedure. Thus, computational identification of miRNAs from genomic sequences provides a valuable complement to cloning. Different approaches for identification of miRNAs have been proposed based on homology, thermodynamic parameters, and cross-species comparisons. The present paper focuses on the integration of miRNA classifiers in a meta-classifier and the identification of miRNAs from metagenomic sequences collected from different environments. An ensemble of classifiers is proposed for miRNA hairpin prediction based on four well-known classifiers (Triplet-SVM, MiPred, Virgo and EumiR), with non-identical features, and which have been trained on different data. Their decisions are combined using a single hidden layer neural network to increase the accuracy of the predictions. Our ensemble classifier achieved 89.3% accuracy, 82.2% f-measure, 74% sensitivity, 97% specificity, 92.5% precision and 88.2% negative predictive value when tested on real miRNA and pseudo sequence data. The area under the receiver operating characteristic curve of our classifier is 0.9, which represents a high performance index. The proposed classifier yields a significant performance improvement relative to Triplet-SVM, Virgo and EumiR and a minor refinement over MiPred. The developed ensemble classifier is used for miRNA prediction in mine drainage, groundwater and marine metagenomic sequences downloaded from the NCBI Sequence Read Archive. By consulting the miRBase repository, 179 miRNAs have been identified as highly probable miRNAs. Our new approach could thus be used for mining metagenomic sequences and finding new and homologous miRNAs. The paper investigates a computational tool for miRNA prediction in genomic or metagenomic data. It has been applied to three metagenomic samples from different environments (mine drainage, groundwater and marine metagenomic sequences). The prediction results provide a set of highly promising miRNA hairpin candidates for cloning prediction methods. Among the results obtained by ensemble prediction are pre-miRNA candidates that have been validated using miRBase although they were not recognized by some of the base classifiers.
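
    The six performance figures quoted in this abstract all derive from one confusion matrix, and the sketch below shows the standard formulas. The counts are hypothetical and were chosen only for illustration (100 real hairpins and 200 pseudo hairpins, an assumed 1:2 ratio); the abstract does not state the actual test-set composition, so this is an arithmetic consistency check rather than a reconstruction of the authors' evaluation.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "npv": tn / (tn + fn),  # negative predictive value
        # F-measure: harmonic mean of precision and sensitivity.
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Hypothetical counts, NOT taken from the paper.
metrics = confusion_metrics(tp=74, fp=6, tn=194, fn=26)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

    With these assumed counts the formulas give 89.3% accuracy, 74.0% sensitivity, 97.0% specificity, 92.5% precision, 88.2% negative predictive value and an 82.2% f-measure, so the reported figures are mutually consistent under that assumption.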

  16. Viral metagenomics, protein structure, and reverse genetics: Key strategies for investigating coronaviruses.

    PubMed

    Johnson, Bryan A; Graham, Rachel L; Menachery, Vineet D

    2018-04-01

    Viral metagenomics, modeling of protein structure, and manipulation of viral genetics are key approaches that have laid the foundations of our understanding of coronavirus biology. In this review, we discuss the major advances each method has provided and discuss how future studies should leverage these strategies synergistically to answer novel questions. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Genomic and metagenomic challenges and opportunities for bioleaching: a mini-review.

    PubMed

    Cárdenas, Juan Pablo; Quatrini, Raquel; Holmes, David S

    2016-09-01

    High-throughput genomic technologies are accelerating progress in understanding the diversity of microbial life in many environments. Here we highlight advances in genomics and metagenomics of microorganisms from bioleaching heaps and related acidic mining environments. Bioleaching heaps used for copper recovery provide significant opportunities to study the processes and mechanisms underlying microbial successions and the influence of community composition on ecosystem functioning. Obtaining quantitative and process-level knowledge of these dynamics is pivotal for understanding how microorganisms contribute to the solubilization of copper for industrial recovery. Advances in DNA sequencing technology provide unprecedented opportunities to obtain information about the genomes of bioleaching microorganisms, allowing predictive models of metabolic potential and ecosystem-level interactions to be constructed. These approaches are enabling predictive phenotyping of organisms, many of which are recalcitrant to genetic approaches or are unculturable. This mini-review describes current bioleaching genomic and metagenomic projects and addresses the use of genome information to: (i) build metabolic models; (ii) predict microbial interactions; (iii) estimate genetic diversity; and (iv) study microbial evolution. Key challenges and perspectives of bioleaching genomics/metagenomics are addressed. Copyright © 2016 The Author(s). Published by Elsevier Masson SAS. All rights reserved.

  18. Metagenomic Approaches to Assess Bacteriophages in Various Environmental Niches

    PubMed Central

    Hayes, Stephen; Mahony, Jennifer; Nauta, Arjen; van Sinderen, Douwe

    2017-01-01

    Bacteriophages are ubiquitous and numerous parasites of bacteria and play a critical evolutionary role in virtually every ecosystem, yet our understanding of the extent of the diversity and role of phages remains inadequate for many ecological niches, particularly in cases in which the host is unculturable. During the past 15 years, the emergence of the field of viral metagenomics has drastically enhanced our ability to analyse the so-called viral ‘dark matter’ of the biosphere. Here, we review the evolution of viral metagenomic methodologies, as well as providing an overview of some of the most significant applications and findings in this field of research. PMID:28538703

  19. Metagenomic gene annotation by a homology-independent approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Froula, Jeff; Zhang, Tao; Salmeen, Annette

    2011-06-02

    Fully understanding the genetic potential of a microbial community requires functional annotation of all the genes it encodes. The recently developed deep metagenome sequencing approach has enabled rapid identification of millions of genes from a complex microbial community without cultivation. Current homology-based gene annotation fails to detect distantly-related or structural homologs. Furthermore, homology searches with millions of genes are very computationally intensive. To overcome these limitations, we developed rhModeller, a homology-independent software pipeline to efficiently annotate genes from metagenomic sequencing projects. Using cellulases and carbonic anhydrases as two independent test cases, we demonstrated that rhModeller is much faster than HMMER but with comparable accuracy, at 94.5% and 99.9% accuracy, respectively. More importantly, rhModeller has the ability to detect novel proteins that do not share significant homology to any known protein families. As approximately 50% of the 2 million genes derived from the cow rumen metagenome failed to be annotated based on sequence homology, we tested whether rhModeller could be used to annotate these genes. Preliminary results suggest that rhModeller is robust in the presence of missense and frameshift mutations, two common errors in metagenomic genes. Applying the pipeline to the cow rumen genes identified 4,990 novel cellulase candidates and 8,196 novel carbonic anhydrase candidates. In summary, we expect rhModeller to dramatically increase the speed and quality of metagenomic gene annotation.

  20. Tentacle: distributed quantification of genes in metagenomes.

    PubMed

    Boulund, Fredrik; Sjögren, Anders; Kristiansson, Erik

    2015-01-01

    In metagenomics, microbial communities are sequenced at increasingly high resolution, generating datasets with billions of DNA fragments. Novel methods that can efficiently process the growing volumes of sequence data are necessary for the accurate analysis and interpretation of existing and upcoming metagenomes. Here we present Tentacle, which is a novel framework that uses distributed computational resources for gene quantification in metagenomes. Tentacle is implemented using a dynamic master-worker approach in which DNA fragments are streamed via a network and processed in parallel on worker nodes. Tentacle is modular, extensible, and comes with support for six commonly used sequence aligners. It is easy to adapt Tentacle to different applications in metagenomics and easy to integrate into existing workflows. Evaluations show that Tentacle scales very well with increasing computing resources. We illustrate the versatility of Tentacle on three different use cases. Tentacle is written for Linux in Python 2.7 and is published as open source under the GNU General Public License (v3). Documentation, tutorials, installation instructions, and the source code are freely available online at: http://bioinformatics.math.chalmers.se/tentacle.
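
    The master-worker pattern described in this abstract can be sketched in a few lines. This is not Tentacle's code or API (Tentacle itself targets Python 2.7 and real sequence aligners): the sketch is Python 3, runs workers as local threads rather than networked nodes, and uses exact substring matching as a hypothetical stand-in for an aligner. All function names, gene sequences and reads are invented for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, Iterator, List


def chunked(reads: Iterable[str], size: int) -> Iterator[List[str]]:
    """Master side: split the read stream into fixed-size work units."""
    chunk: List[str] = []
    for read in reads:
        chunk.append(read)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def count_hits(chunk: List[str], genes: Dict[str, str]) -> Counter:
    """Worker side: count reads found as exact substrings of each gene.

    Exact matching is a toy stand-in for a real sequence aligner.
    """
    hits: Counter = Counter()
    for read in chunk:
        for name, sequence in genes.items():
            if read in sequence:
                hits[name] += 1
    return hits


def quantify(reads: Iterable[str], genes: Dict[str, str],
             workers: int = 4, chunk_size: int = 2) -> Counter:
    """Master: dispatch chunks to workers and merge their partial counts."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda c: count_hits(c, genes), chunked(reads, chunk_size)):
            total.update(partial)  # Counter.update adds counts together
    return total


# Hypothetical reference genes and reads.
genes = {"geneA": "ACGTACGTTT", "geneB": "GGGCCCAAAT"}
reads = ["ACGT", "CGTT", "GGCC", "TTTT", "CCAA"]
print(quantify(reads, genes))
```

    Because each chunk is processed independently and only small count tables are sent back, adding workers does not change the result, which is the property that lets this kind of design scale with computing resources.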

  1. Marine Metagenome as A Resource for Novel Enzymes.

    PubMed

    Alma'abadi, Amani D; Gojobori, Takashi; Mineta, Katsuhiko

    2015-10-01

    More than 99% of identified prokaryotes, including many from the marine environment, cannot be cultured in the laboratory. This lack of capability restricts our knowledge of microbial genetics and community ecology. Metagenomics, the culture-independent cloning of environmental DNAs that are isolated directly from an environmental sample, has already provided a wealth of information about the uncultured microbial world. It has also facilitated the discovery of novel biocatalysts by allowing researchers to probe directly into a huge diversity of enzymes within natural microbial communities. Recent advances in these studies have led to a great interest in recruiting microbial enzymes for the development of environmentally-friendly industry. Although the metagenomics approach has many limitations, it is expected to provide not only scientific insights but also economic benefits, especially in industry. This review highlights the importance of metagenomics in mining microbial lipases, as an example, by using high-throughput techniques. In addition, we discuss challenges in metagenomics as an important part of bioinformatics analysis in big data. Copyright © 2015 The Authors. Production and hosting by Elsevier Ltd. All rights reserved.

  2. An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography

    PubMed Central

    Nayfach, Stephen; Rodriguez-Mueller, Beltran; Garud, Nandita

    2016-01-01

    We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), which is an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single-nucleotide polymorphisms (SNPs), from shotgun metagenomes. Our method leverages a database of more than 30,000 bacterial reference genomes that we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare SNPs to track strains between hosts. Using this approach, we found that although species compositions of mothers and infants converged over time, strain-level similarity diverged. Specifically, early colonizing bacteria were often transmitted from an infant’s mother, while late colonizing bacteria were often transmitted from other sources in the environment and were enriched for spore-formation genes. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data are analyzed at a coarser taxonomic resolution. PMID:27803195

  3. An evaluation of the accuracy and speed of metagenome analysis tools

    PubMed Central

    Lindgreen, Stinus; Adair, Karen L.; Gardner, Paul P.

    2016-01-01

    Metagenome studies are becoming increasingly widespread, yielding important insights into microbial communities covering diverse environments from terrestrial and aquatic ecosystems to human skin and gut. With the advent of high-throughput sequencing platforms, the use of large scale shotgun sequencing approaches is now commonplace. However, a thorough independent benchmark comparing state-of-the-art metagenome analysis tools is lacking. Here, we present a benchmark where the most widely used tools are tested on complex, realistic data sets. Our results clearly show that the most widely used tools are not necessarily the most accurate, that the most accurate tool is not necessarily the most time consuming, and that there is a high degree of variability between available tools. These findings are important as the conclusions of any metagenomics study are affected by errors in the predicted community composition and functional capacity. Data sets and results are freely available from http://www.ucbioinformatics.org/metabenchmark.html PMID:26778510

  4. Marine metagenomics: strategies for the discovery of novel enzymes with biotechnological applications from marine environments

    PubMed Central

    Kennedy, Jonathan; Marchesi, Julian R; Dobson, Alan DW

    2008-01-01

    Metagenomic based strategies have previously been successfully employed as powerful tools to isolate and identify enzymes with novel biocatalytic activities from the unculturable component of microbial communities from various terrestrial environmental niches. Both sequence based and function based screening approaches have been employed to identify genes encoding novel biocatalytic activities and metabolic pathways from metagenomic libraries. While much of the focus to date has centred on terrestrial based microbial ecosystems, it is clear that the marine environment has enormous microbial biodiversity that remains largely unstudied. Marine microbes are both extremely abundant and diverse; the environments they occupy likewise consist of very diverse niches. As culture-dependent methods have thus far resulted in the isolation of only a tiny percentage of the marine microbiota the application of metagenomic strategies holds great potential to study and exploit the enormous microbial biodiversity which is present within these marine environments. PMID:18717988

  5. Recovery of a Medieval Brucella melitensis Genome Using Shotgun Metagenomics

    PubMed Central

    Kay, Gemma L.; Sergeant, Martin J.; Giuffra, Valentina; Bandiera, Pasquale; Milanese, Marco; Bramanti, Barbara

    2014-01-01

    Shotgun metagenomics provides a powerful assumption-free approach to the recovery of pathogen genomes from contemporary and historical material. We sequenced the metagenome of a calcified nodule from the skeleton of a 14th-century middle-aged male excavated from the medieval Sardinian settlement of Geridu. We obtained 6.5-fold coverage of a Brucella melitensis genome. Sequence reads from this genome showed signatures typical of ancient or aged DNA. Despite the relatively low coverage, we were able to use information from single-nucleotide polymorphisms to place the medieval pathogen genome within a clade of B. melitensis strains that included the well-studied Ether strain and two other recent Italian isolates. We confirmed this placement using information from deletions and IS711 insertions. We conclude that metagenomics stands ready to document past and present infections, shedding light on the emergence, evolution, and spread of microbial pathogens. PMID:25028426

  6. Self-organizing approach for meta-genomes.

    PubMed

    Zhu, Jianfeng; Zheng, Wei-Mou

    2014-12-01

    We extend the self-organizing approach for annotation of a bacterial genome to analyze the raw sequencing data of the human gut metagenome without sequence assembling. The original approach divides the genomic sequence of a bacterium into non-overlapping segments of equal length and assigns to each segment one of seven 'phases', among which one is for the noncoding regions, three for the direct coding regions to indicate the three possible codon positions of the segment starting site, and three for the reverse coding regions. The noncoding phase and the six coding phases are described by two frequency tables of the 64 triplet types or 'codon usages'. A set of codon usages can be used to update the phase assignment and vice versa. An iteration after an initialization leads to a convergent phase assignment to give an annotation of the genome. In the extension of the approach to a metagenome, we consider a mixture model of a number of categories described by different codon usages. The Illumina Genome Analyzer sequencing data of the total DNA from faecal samples are then examined to understand the diversity of the human gut microbiome. Copyright © 2014 Elsevier Ltd. All rights reserved.
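
    The phase-assignment half of the iteration described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: scoring each phase by mean per-triplet log-likelihood, the function names, and the toy usage tables are all assumptions, and the complementary step that re-estimates the codon usages from the assigned phases is omitted.

```python
import math
from itertools import product

TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]  # the 64 triplet types
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def triplets(seq, offset):
    """Non-overlapping triplets of seq starting at offset 0, 1 or 2."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def mean_log_likelihood(seq, offset, usage):
    """Average log-probability of the triplets under one usage table."""
    trips = triplets(seq, offset)
    return sum(math.log(usage[t]) for t in trips) / len(trips)


def assign_phase(segment, coding_usage, noncoding_usage):
    """Pick one of seven phases for a segment (at least 5 nt long).

    0 = noncoding, 1-3 = direct coding frames, 4-6 = reverse coding frames.
    """
    scores = {0: mean_log_likelihood(segment, 0, noncoding_usage)}
    rc = reverse_complement(segment)
    for frame in range(3):
        scores[1 + frame] = mean_log_likelihood(segment, frame, coding_usage)
        scores[4 + frame] = mean_log_likelihood(rc, frame, coding_usage)
    return max(scores, key=scores.get)


# Hypothetical toy tables: coding usage strongly favours ATG, noncoding is uniform.
weights = {t: 1.0 for t in TRIPLETS}
weights["ATG"] = 10.0
total = sum(weights.values())
coding_usage = {t: w / total for t, w in weights.items()}
noncoding_usage = {t: 1.0 / 64 for t in TRIPLETS}

phase_direct = assign_phase("ATGATGATGATG", coding_usage, noncoding_usage)
phase_reverse = assign_phase("CATCATCATCAT", coding_usage, noncoding_usage)
phase_noncoding = assign_phase("ACGTACGTACGT", coding_usage, noncoding_usage)
```

    Averaging over triplets keeps the scores comparable when the three frames contain different numbers of complete triplets.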

  7. Phylogenetic screening of a bacterial, metagenomic library using homing endonuclease restriction and marker insertion

    PubMed Central

    Yung, Pui Yi; Burke, Catherine; Lewis, Matt; Egan, Suhelen; Kjelleberg, Staffan; Thomas, Torsten

    2009-01-01

    Metagenomics provides access to the uncultured majority of the microbial world. The approaches employed in this field have, however, had limited success in linking functional genes to the taxonomic or phylogenetic origin of the organism they belong to. Here we present an efficient strategy to recover environmental DNA fragments that contain phylogenetic marker genes from metagenomic libraries. Our method involves the cleavage of 23S ribosomal RNA (rRNA) genes within pooled library clones by the homing endonuclease I-CeuI followed by the insertion and selection of an antibiotic resistance cassette. This approach was applied to screen a library of 6500 fosmid clones derived from the microbial community associated with the sponge Cymbastela concentrica. Several fosmid clones were recovered after the screen and detailed phylogenetic and taxonomic assignment based on the rRNA gene showed that they belong to previously unknown organisms. In addition, compositional features of these fosmid clones were used to classify and taxonomically assign a dataset of environmental shotgun sequences. Our approach represents a valuable tool for the analysis of rapidly increasing environmental DNA sequencing information. PMID:19767618

  8. Comparative Modeling and Benchmarking Data Sets for Human Histone Deacetylases and Sirtuin Families

    PubMed Central

    Xia, Jie; Tilahun, Ermias Lemma; Kebede, Eyob Hailu; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Histone Deacetylases (HDACs) are an important class of drug targets for the treatment of cancers, neurodegenerative diseases and other types of diseases. Virtual screening (VS) has become a fairly effective approach for the discovery of novel and highly selective Histone Deacetylase Inhibitors (HDACIs). To facilitate the process, we constructed the Maximal Unbiased Benchmarking Data Sets for HDACs (MUBD-HDACs) using our recently published methods that were originally developed for building unbiased benchmarking sets for ligand-based virtual screening (LBVS). The MUBD-HDACs cover all 4 classes, including Class III (Sirtuins family), and 14 HDAC isoforms, and comprise 631 inhibitors and 24,609 unbiased decoys. Their ligand sets have been validated extensively as chemically diverse, while the decoy sets were shown to be property-matched with the ligands and maximally unbiased in terms of “artificial enrichment” and “analogue bias”. We also conducted comparative studies with the DUD-E and DEKOIS 2.0 sets against HDAC2 and HDAC8 targets, and demonstrated that the MUBD-HDACs are unique in that they can be applied without bias to both LBVS and structure-based virtual screening (SBVS) approaches. In addition, we defined a novel metric, NLBScore, to detect the “2D bias” and “LBVS favorable” effect within the benchmarking sets. In summary, the MUBD-HDACs are the only comprehensive and maximally unbiased benchmarking data sets for HDACs (including Sirtuins) available so far. The MUBD-HDACs are freely available at http://www.xswlab.org/. PMID:25633490

  9. Generalized approach for using unbiased symmetric metrics with negative values: normalized mean bias factor and normalized mean absolute error factor

    EPA Science Inventory

    Unbiased symmetric metrics provide a useful measure to quickly compare two datasets, with similar interpretations for both under and overestimations. Two examples include the normalized mean bias factor and normalized mean absolute error factor. However, the original formulations...

  10. Snapshot of the Eukaryotic Gene Expression in Muskoxen Rumen—A Metatranscriptomic Approach

    PubMed Central

    O'Toole, Nicholas; Barboza, Perry S.; Ungerfeld, Emilio; Leigh, Mary Beth; Selinger, L. Brent; Butler, Greg; Tsang, Adrian; McAllister, Tim A.; Forster, Robert J.

    2011-01-01

    Background: Herbivores rely on digestive tract lignocellulolytic microorganisms, including bacteria, fungi and protozoa, to derive energy and carbon from plant cell wall polysaccharides. Culture independent metagenomic studies have been used to reveal the genetic content of the bacterial species within gut microbiomes. However, the nature of the genes encoded by eukaryotic protozoa and fungi within these environments has not been explored using metagenomic or metatranscriptomic approaches. Methodology/Principal Findings: In this study, a metatranscriptomic approach was used to investigate the functional diversity of the eukaryotic microorganisms within the rumen of muskoxen (Ovibos moschatus), with a focus on plant cell wall degrading enzymes. Polyadenylated RNA (mRNA) was sequenced on the Illumina Genome Analyzer II system and 2.8 gigabases of sequences were obtained and 59129 contigs assembled. Plant cell wall degrading enzyme modules including glycoside hydrolases, carbohydrate esterases and polysaccharide lyases were identified from over 2500 contigs. These included a number of glycoside hydrolase family 6 (GH6), GH48 and swollenin modules, which have rarely been described in previous gut metagenomic studies. Conclusions/Significance: The muskoxen rumen metatranscriptome demonstrates a much higher percentage of cellulase enzyme discovery and an 8.7x higher rate of total carbohydrate active enzyme discovery per gigabase of sequence than previous rumen metagenomes. This study provides a snapshot of eukaryotic gene expression in the muskoxen rumen, and identifies a number of candidate genes coding for potentially valuable lignocellulolytic enzymes. PMID:21655220

  11. Constructing and Screening a Metagenomic Library of a Cold and Alkaline Extreme Environment.

    PubMed

    Glaring, Mikkel A; Vester, Jan K; Stougaard, Peter

    2017-01-01

    Natural cold or alkaline environments are common on Earth. A rare combination of these two extremes is found in the permanently cold (less than 6 °C) and alkaline (pH above 10) ikaite columns in the Ikka Fjord in Southern Greenland. Bioprospecting efforts have established the ikaite columns as a source of bacteria and enzymes adapted to these conditions. They have also highlighted the limitations of cultivation-based methods in this extreme environment and metagenomic approaches may provide access to novel extremophilic enzymes from the uncultured majority of bacteria. Here, we describe the construction and screening of a metagenomic library of the prokaryotic community inhabiting the ikaite columns.

  12. Phylogenetic convolutional neural networks in metagenomics.

    PubMed

    Fioravanti, Diego; Giarratano, Ylenia; Maggio, Valerio; Agostinelli, Claudio; Chierici, Marco; Jurman, Giuseppe; Furlanello, Cesare

    2018-03-08

    Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case of pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on the Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree being used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space. Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided in 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, e.g. the Multi-Layer Perceptron. Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer taking care of passing to the following convolutional layer not only the data but also the ranked list of neighbourhood of each sample, thus mimicking the case of image data, transparently to the user.

  13. A retrospective metagenomics approach to studying Blastocystis.

    PubMed

    Andersen, Lee O'Brien; Bonde, Ida; Nielsen, Henrik Bjørn; Stensvold, Christen Rune

    2015-07-01

    Blastocystis is a common single-celled intestinal parasitic genus, comprising several subtypes. Here, we screened data obtained by metagenomic analysis of faecal DNA for Blastocystis by searching for subtype-specific genes in coabundance gene groups, which are groups of genes that covary across a selection of 316 human faecal samples, hence representing genes originating from a single subtype. The 316 faecal samples were from 236 healthy individuals, 13 patients with Crohn's disease (CD) and 67 patients with ulcerative colitis (UC). The prevalence of Blastocystis was 20.3% in the healthy individuals and 14.9% in patients with UC. Meanwhile, Blastocystis was absent in patients with CD. Individuals with intestinal microbiota dominated by Bacteroides were much less prone to having Blastocystis-positive stool (Matthews correlation coefficient = -0.25, P < 0.0001) than individuals with Ruminococcus- and Prevotella-driven enterotypes. This is the first study to investigate the relationship between Blastocystis and communities of gut bacteria using a metagenomics approach. The study serves as an example of how it is possible to retrospectively investigate microbial eukaryotic communities in the gut using metagenomic datasets targeting the bacterial component of the intestinal microbiome and the interplay between these microbial communities. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
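
    For readers unfamiliar with the statistic quoted above, the Matthews correlation coefficient of a 2x2 contingency table can be computed as in the following sketch. The function name and the example counts are hypothetical and are not the study's data.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a 2x2 table.

    Returns 0.0 when any marginal total is zero (the coefficient is undefined).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, chosen only to exercise the formula.
mcc_example = matthews_corrcoef(tp=6, tn=3, fp=1, fn=2)
```

    The coefficient ranges from -1 to 1, so a negative value such as the one reported indicates an inverse association between the two binary variables.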

  14. Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition

    PubMed Central

    Saeed, Isaam; Tang, Sen-Lin; Halgamuge, Saman K.

    2012-01-01

    An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis. PMID:22180538
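
    Two of the composition features named above, GC content and tetranucleotide frequency, can be computed per sequence as in this minimal sketch. It does not reproduce the proposed method: the error-gradient transformation and the two clustering tiers are not shown, and the function names and toy contig are assumptions.

```python
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 possible 4-mers


def gc_content(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def tetranucleotide_frequencies(seq):
    """Relative frequency of every 4-mer, in TETRAMERS order.

    Windows containing characters other than A, C, G, T are skipped.
    """
    counts = dict.fromkeys(TETRAMERS, 0)
    windows = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            windows += 1
    return [counts[k] / windows if windows else 0.0 for k in TETRAMERS]


contig = "ACGTACGT"  # toy sequence
profile = tetranucleotide_frequencies(contig)
gc = gc_content(contig)
```

    Each contig then becomes a fixed-length numeric vector, which is what allows sequences to be clustered without reference genomes.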

  15. Integrated metagenomic analysis of the rumen microbiome of cattle reveals key biological mechanisms associated with methane traits.

    PubMed

    Wang, Haiying; Zheng, Huiru; Browne, Fiona; Roehe, Rainer; Dewhurst, Richard J; Engel, Felix; Hemmje, Matthias; Lu, Xiangwu; Walsh, Paul

    2017-07-15

    Methane is one of the major contributors to global warming. The rumen microbiota is directly involved in methane production in cattle. The link between variation in rumen microbial communities and host genetics has important applications and implications in bioscience. Having the potential to reveal the full extent of microbial gene diversity and complex microbial interactions, integrated metagenomics and network analysis holds great promise in this endeavour. This study investigates the rumen microbial community in cattle through the integration of metagenomic and network-based approaches. Based on the relative abundance of 1570 microbial genes identified in a metagenomics analysis, the co-abundance network was constructed and functional modules of microbial genes were identified. One of the main contributions is to develop a random matrix theory-based approach to automatically determine the correlation threshold used to construct the co-abundance network. The resulting network, consisting of 549 microbial genes and 3349 connections, exhibits a clear modular structure with certain trait-specific genes highly over-represented in modules. More specifically, all the 20 genes previously identified to be associated with methane emissions are found in a module (hypergeometric test, p < 10^-11). One third of genes are involved in methane metabolism pathways. The further examination of abundance profiles across 8 samples of genes highlights that the revealed pattern of metagenomics abundance has a strong association with methane emissions. Furthermore, the module is significantly enriched with microbial genes encoding enzymes that are directly involved in methanogenesis (hypergeometric test, p < 10^-9). Copyright © 2017 Elsevier Inc. All rights reserved.
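
    The hypergeometric enrichment test mentioned above asks how likely it is to find at least k trait-associated genes in a module by chance. A minimal sketch follows; the network size (549) and the number of methane-associated genes (20) come from the abstract, while the module size of 60 is a hypothetical value, because the abstract does not report it.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are taken without replacement
    from `population` items of which `successes` are of interest."""
    upper = min(successes, draws)
    numerator = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return numerator / comb(population, draws)


# 549 genes and 20 methane-associated genes are from the abstract;
# the module size of 60 is an assumption made for illustration only.
p_module = hypergeom_tail(population=549, successes=20, draws=60, observed=20)
```

    With exact integer binomial coefficients the tail probability stays accurate even when it is extremely small.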

  16. Sequence-based screening for self-sufficient P450 monooxygenase from a metagenome library.

    PubMed

    Kim, B S; Kim, S Y; Park, J; Park, W; Hwang, K Y; Yoon, Y J; Oh, W K; Kim, B Y; Ahn, J S

    2007-05-01

    Cytochrome P450 monooxygenases (CYPs) are useful catalysts for oxidation reactions. Self-sufficient CYPs harbour a reductive domain covalently connected to a P450 domain and are known for their robust catalytic activity with great potential as biocatalysts. In an effort to expand genetic sources of self-sufficient CYPs, we devised a sequence-based screening system to identify them in a soil metagenome. We constructed a soil metagenome library and performed sequence-based screening for self-sufficient CYP genes. A new CYP gene, syk181, was identified from the metagenome library. Phylogenetic analysis revealed that SYK181 formed a distinct phylogenic line with 46% amino-acid-sequence identity to CYP102A1 which has been extensively studied as a fatty acid hydroxylase. The heterologously expressed SYK181 showed significant hydroxylase activity towards naphthalene and phenanthrene as well as towards fatty acids. Sequence-based screening of metagenome libraries is expected to be a useful approach for searching self-sufficient CYP genes. The translated product of syk181 shows self-sufficient hydroxylase activity towards fatty acids and aromatic compounds. SYK181 is the first self-sufficient CYP obtained directly from a metagenome library. The genetic and biochemical information on SYK181 are expected to be helpful for engineering self-sufficient CYPs with broader catalytic activities towards various substrates, which would be useful for bioconversion of natural products and biodegradation of organic chemicals.

  17. Caught in the middle with multiple displacement amplification: the myth of pooling for avoiding multiple displacement amplification bias in a metagenome.

    PubMed

    Marine, Rachel; McCarren, Coleen; Vorrasane, Vansay; Nasko, Dan; Crowgey, Erin; Polson, Shawn W; Wommack, K Eric

    2014-01-30

    Shotgun metagenomics has become an important tool for investigating the ecology of microorganisms. Underlying these investigations is the assumption that metagenome sequence data accurately estimates the census of microbial populations. Multiple displacement amplification (MDA) of microbial community DNA is often used in cases where it is difficult to obtain enough DNA for sequencing; however, MDA can result in amplification biases that may impact subsequent estimates of population census from metagenome data. Some have posited that pooling replicate MDA reactions negates these biases and restores the accuracy of population analyses. This assumption has not been empirically tested. Using mock viral communities, we examined the influence of pooling on population-scale analyses. In pooled and single reaction MDA treatments, sequence coverage of viral populations was highly variable and coverage patterns across viral genomes were nearly identical, indicating that initial priming biases were reproducible and that pooling did not alleviate biases. In contrast, control unamplified sequence libraries showed relatively even coverage across phage genomes. MDA should be avoided for metagenomic investigations that require quantitative estimates of microbial taxa and gene functional groups. While MDA is an indispensable technique in applications such as single-cell genomics, amplification biases cannot be overcome by combining replicate MDA reactions. Alternative library preparation techniques should be utilized for quantitative microbial ecology studies utilizing metagenomic sequencing approaches.

  18. A function-based screen for seeking RubisCO active clones from metagenomes: novel enzymes influencing RubisCO activity.

    PubMed

    Böhnke, Stefanie; Perner, Mirjam

    2015-03-01

    Ribulose-1,5-bisphosphate carboxylase/oxygenase (RubisCO) is a key enzyme of the Calvin cycle, which is responsible for most of Earth's primary production. Although research on RubisCO genes and enzymes in plants, cyanobacteria and bacteria has been ongoing for years, still little is understood about its regulation and activation in bacteria. Even more so, hardly any information exists about the function of metagenomic RubisCOs and the role of the enzymes encoded on the flanking DNA owing to the lack of available function-based screens for seeking active RubisCOs from the environment. Here we present the first solely activity-based approach for identifying RubisCO active fosmid clones from a metagenomic library. We constructed a metagenomic library from hydrothermal vent fluids and screened 1056 fosmid clones. Twelve clones exhibited RubisCO activity and the metagenomic fragments resembled genes from Thiomicrospira crunogena. One of these clones was further analyzed. It contained a 35.2 kb metagenomic insert carrying the RubisCO gene cluster and flanking DNA regions. Knockouts of twelve genes and two intergenic regions on this metagenomic fragment demonstrated that the RubisCO activity was significantly impaired and was attributed to deletions in genes encoding putative transcriptional regulators and those believed to be vital for RubisCO activation. Our new technique revealed a novel link between a poorly characterized gene and RubisCO activity. This screen opens the door to directly investigating RubisCO genes and respective enzymes from environmental samples.

  19. Novel Lipolytic Enzymes Identified from Metagenomic Library of Deep-Sea Sediment

    PubMed Central

    Jeon, Jeong Ho; Kim, Jun Tae; Lee, Hyun Sook; Kim, Sang-Jin; Kang, Sung Gyun; Choi, Sang Ho; Lee, Jung-Hyun

    2011-01-01

    A metagenomic library was constructed from a deep-sea sediment sample and screened for lipolytic activity. Open-reading frames of six positive clones showed only 33–58% amino acid identities to the known proteins. One of them was assigned to a new group while others were grouped into Families I and V or EstD Family. By employing a combination of approaches such as removing the signal sequence, coexpression of chaperone genes, and low temperature induction, we obtained five soluble recombinant proteins in Escherichia coli. The purified enzymes had optimum temperatures of 30–35°C and were cold-active. Among them, one enzyme showed lipase activity by preferentially hydrolyzing p-nitrophenyl palmitate and p-nitrophenyl stearate and high salt resistance with up to 4 M NaCl. Our research demonstrates the feasibility of developing novel lipolytic enzymes from marine environments by combining a functional metagenomic approach with protein expression technology. PMID:21845199

  20. FY11 Report on Metagenome Analysis using Pathogen Marker Libraries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gardner, Shea N.; Allen, Jonathan E.; McLoughlin, Kevin S.

    2011-06-02

    A method, sequence library, and software suite were invented to rapidly assess whether any member of a pre-specified list of threat organisms or their near neighbors is present in a metagenome. The system was designed to handle mega- to giga-bases of FASTA-formatted raw sequence reads from short or long read next generation sequencing platforms. The approach is to pre-calculate a viral and a bacterial "Pathogen Marker Library" (PML) containing sub-sequences specific to pathogens or their near neighbors. A list of expected matches comparing every bacterial or viral genome against the PML sequences is also pre-calculated. To analyze a metagenome, reads are compared to the PML, and observed PML-metagenome matches are compared to the expected PML-genome matches, and the ratio of observed relative to expected matches is reported. In other words, a 3-way comparison among the PML, metagenome, and existing genome sequences is used to quickly assess which (if any) species included in the PML is likely to be present in the metagenome, based on available sequence data. Our tests showed that the species with the most PML matches correctly indicated the organism sequenced for empirical metagenomes consisting of a cultured, relatively pure isolate. These runs completed in 1 minute to 3 hours on 12 CPU (1 thread/CPU), depending on the metagenome and PML. Using more threads on the same number of CPU resulted in speed improvements roughly proportional to the number of threads. Simulations indicated that detection sensitivity depends on both sequencing coverage levels for a species and the size of the PML: species were correctly detected even at ~0.003x coverage by the large PMLs, and at ~0.03x coverage by the smaller PMLs. Matches to true positive species were 3-4 orders of magnitude higher than to false positives. Simulations with short reads (36 nt and ~260 nt) showed that species were usually detected for metagenome coverage above 0.005x and coverage in the PML above 0.05x, and detection probability appears to be a function of both coverages. Multiple species could be detected simultaneously in a simulated low-coverage, complex metagenome, and the largest PML gave no false negative species and no false positive genera. The presence of multiple species was predicted in a complex metagenome from a human gut microbiome with 1.9 GB of short reads (75 nt); the species predicted were reasonable gut flora and no biothreat agents were detected, showing the feasibility of PML analysis of empirical complex metagenomes.
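
    The observed-to-expected comparison described above can be sketched in a few lines. This is an illustration only, not the report's software: the function names, the species labels, and the match counts are hypothetical.

```python
def observed_to_expected(observed, expected):
    """Ratio of observed PML-metagenome matches to the pre-calculated
    expected PML-genome matches, per species with a non-zero expectation."""
    ratios = {}
    for species, expected_matches in expected.items():
        if expected_matches > 0:
            ratios[species] = observed.get(species, 0) / expected_matches
    return ratios


def rank_species(ratios):
    """Species sorted from the highest to the lowest ratio."""
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical match counts, for illustration only.
expected = {"species_A": 200, "species_B": 150, "species_C": 400}
observed = {"species_A": 180, "species_B": 3}

ratios = observed_to_expected(observed, expected)
ranked = rank_species(ratios)
```

    Normalizing by the expected count keeps species with many marker sequences from dominating the ranking simply because they have more sequences to match.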

  1. A functional metagenomic approach for expanding the synthetic biology toolbox for biomass conversion

    PubMed Central

    Sommer, Morten OA; Church, George M; Dantas, Gautam

    2010-01-01

    Sustainable biofuel alternatives to fossil fuel energy are hampered by recalcitrance and toxicity of biomass substrates to microbial biocatalysts. To address this issue, we present a culture-independent functional metagenomic platform for mining Nature's vast enzymatic reservoir and show its relevance to biomass conversion. We performed functional selections on 4.7 Gb of metagenomic fosmid libraries and show that genetic elements conferring tolerance toward seven important biomass inhibitors can be identified. We select two metagenomic fosmids that improve the growth of Escherichia coli by 5.7- and 6.9-fold in the presence of inhibitory concentrations of syringaldehyde and 2-furoic acid, respectively, and identify the individual genes responsible for these tolerance phenotypes. Finally, we combine the individual genes to create a three-gene construct that confers tolerance to mixtures of these important biomass inhibitors. This platform presents a route for expanding the repertoire of genetic elements available to synthetic biology and provides a starting point for efforts to engineer robust strains for biofuel generation. PMID:20393580

  2. SUPER-FOCUS: A tool for agile functional analysis of shotgun metagenomic data

    DOE PAGES

    Silva, Genivaldo Gueiros Z.; Green, Kevin T.; Dutilh, Bas E.; ...

    2015-10-09

    Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question of what they can do. Currently, available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. We tested SUPER-FOCUS with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools.

  3. Metagenomic approaches to identify and isolate bioactive natural products from microbiota of marine sponges.

    PubMed

    Gurgui, Cristian; Piel, Jörn

    2010-01-01

    Many marine sponges harbor massive consortia of symbiotic bacteria belonging to diverse phyla. Sponges are also an unusually rich source of biologically active natural products, and evidence is accumulating that these compounds might often be synthesized by the symbionts. Since the study of sponge-associated bacteria is generally hampered by very low cultivation rates, cultivation-independent, metagenomic methods have recently been applied to sponges. These methods allow for the isolation of biosynthetic gene clusters that can ultimately be exploited to develop sustainable natural product sources by heterologous expression. However, general challenges encountered in sponge metagenomic research are the poor quality of the isolated DNA with respect to size and yield, the difficulty to identify genes of interest among numerous homologs, insufficient clone numbers in metagenomic libraries, and time-consuming screening procedures to identify and isolate rare positive clones. Here, we give an overview of methods that address these problems and can be used to streamline isolation of biosynthetic and other genes of interest.

  4. Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, Richard Allen; Bottos, Eric M.; Roy Chowdhury, Taniya

    ABSTRACT Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “Candidatus Pseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundance Acidobacteria were highly transcriptionally active, whereas bins corresponding to high-relative-abundance Verrucomicrobia were not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities. IMPORTANCE Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their functional roles in ecosystem stability and responses to environmental perturbations. 
This knowledge gap is largely due to the difficulty in culturing the majority of soil microbes. Thus, use of culture-independent approaches, such as metagenomics, promises the direct assessment of the functional potential of soil microbiomes. Soil is, however, a challenge for metagenomic assembly due to its high microbial diversity and variable evenness, resulting in low coverage and uneven sampling of microbial genomes. Despite increasingly large soil metagenome data volumes (>200 Gbp), the majority of the data do not assemble. Here, we used the cutting-edge approach of synthetic long-read sequencing technology (Moleculo) to assemble soil metagenome sequence data into long contigs and used the assemblies for binning of genomes. Author Video: An author video summary of this article is available.

  5. Moleculo Long-Read Sequencing Facilitates Assembly and Genomic Binning from Complex Soil Metagenomes

    PubMed Central

    White, Richard Allen; Bottos, Eric M.; Roy Chowdhury, Taniya; Zucker, Jeremy D.; Brislawn, Colin J.; Nicora, Carrie D.; Fansler, Sarah J.; Glaesemann, Kurt R.; Glass, Kevin

    2016-01-01

    ABSTRACT Soil metagenomics has been touted as the “grand challenge” for metagenomics, as the high microbial diversity and spatial heterogeneity of soils make them unamenable to current assembly platforms. Here, we aimed to improve soil metagenomic sequence assembly by applying the Moleculo synthetic long-read sequencing technology. In total, we obtained 267 Gbp of raw sequence data from a native prairie soil; these data included 109.7 Gbp of short-read data (~100 bp) from the Joint Genome Institute (JGI), an additional 87.7 Gbp of rapid-mode read data (~250 bp), plus 69.6 Gbp (>1.5 kbp) from Moleculo sequencing. The Moleculo data alone yielded over 5,600 reads of >10 kbp in length, and over 95% of the unassembled reads mapped to contigs of >1.5 kbp. Hybrid assembly of all data resulted in more than 10,000 contigs over 10 kbp in length. We mapped three replicate metatranscriptomes derived from the same parent soil to the Moleculo subassembly and found that 95% of the predicted genes, based on their assignments to Enzyme Commission (EC) numbers, were expressed. The Moleculo subassembly also enabled binning of >100 microbial genome bins. We obtained via direct binning the first complete genome, that of “Candidatus Pseudomonas sp. strain JKJ-1” from a native soil metagenome. By mapping metatranscriptome sequence reads back to the bins, we found that several bins corresponding to low-relative-abundance Acidobacteria were highly transcriptionally active, whereas bins corresponding to high-relative-abundance Verrucomicrobia were not. These results demonstrate that Moleculo sequencing provides a significant advance for resolving complex soil microbial communities. IMPORTANCE Soil microorganisms carry out key processes for life on our planet, including cycling of carbon and other nutrients and supporting growth of plants. However, there is poor molecular-level understanding of their functional roles in ecosystem stability and responses to environmental perturbations. 
This knowledge gap is largely due to the difficulty in culturing the majority of soil microbes. Thus, use of culture-independent approaches, such as metagenomics, promises the direct assessment of the functional potential of soil microbiomes. Soil is, however, a challenge for metagenomic assembly due to its high microbial diversity and variable evenness, resulting in low coverage and uneven sampling of microbial genomes. Despite increasingly large soil metagenome data volumes (>200 Gbp), the majority of the data do not assemble. Here, we used the cutting-edge approach of synthetic long-read sequencing technology (Moleculo) to assemble soil metagenome sequence data into long contigs and used the assemblies for binning of genomes. Author Video: An author video summary of this article is available. PMID:27822530

  6. Shotgun metaproteomics of the human distal gut microbiota

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    VerBerkmoes, N.C.; Russell, A.L.; Shah, M.

    2008-10-15

    The human gut contains a dense, complex and diverse microbial community, comprising the gut microbiome. Metagenomics has recently revealed the composition of genes in the gut microbiome, but provides no direct information about which genes are expressed or functioning. Therefore, our goal was to develop a novel approach to directly identify microbial proteins in fecal samples to gain information about the genes expressed and about key microbial functions in the human gut. We used a non-targeted, shotgun mass spectrometry-based whole community proteomics, or metaproteomics, approach for the first deep proteome measurements of thousands of proteins in human fecal samples, thus demonstrating this approach on the most complex sample type to date. The resulting metaproteomes had a skewed distribution relative to the metagenome, with more proteins for translation, energy production and carbohydrate metabolism when compared to what was earlier predicted from metagenomics. Human proteins, including antimicrobial peptides, were also identified, providing a non-targeted glimpse of the host response to the microbiota. Several unknown proteins represented previously undescribed microbial pathways or host immune responses, revealing a novel complex interplay between the human host and its associated microbes.

  7. A Metagenomic Approach to Cyanobacterial Genomics

    PubMed Central

    Alvarenga, Danillo O.; Fiore, Marli F.; Varani, Alessandro M.

    2017-01-01

    Cyanobacteria, or oxyphotobacteria, are primary producers that establish ecological interactions with a wide variety of organisms. Although their associations with eukaryotes have received most attention, interactions with bacterial and archaeal symbionts have also been occurring for billions of years. Due to these associations, obtaining axenic cultures of cyanobacteria is usually difficult, and most isolation efforts result in unicyanobacterial cultures containing a number of associated microbes, hence composing a microbial consortium. With rising numbers of cyanobacterial blooms due to climate change, demand for genomic evaluations of these microorganisms is increasing. However, standard genomic techniques call for the sequencing of axenic cultures, an approach that not only adds months or even years for culture purification, but also appears to be impossible for some cyanobacteria, which is reflected in the relatively low number of publicly available genomic sequences of this phylum. Under the framework of metagenomics, on the other hand, cumbersome techniques for achieving axenic growth can be circumvented and individual genomes can be successfully obtained from microbial consortia. This review focuses on approaches for the genomic and metagenomic assessment of non-axenic cyanobacterial cultures that bypass requirements for axenity. These methods enable researchers to achieve faster and less costly genomic characterizations of cyanobacterial strains and raise additional information about their associated microorganisms. While non-axenic cultures may have been previously frowned upon in cyanobacteriology, latest advancements in metagenomics have provided new possibilities for in vitro studies of oxyphotobacteria, renewing the value of microbial consortia as a reliable and functional resource for the rapid assessment of bloom-forming cyanobacteria. PMID:28536564

  8. Analysis of Microbial Functions in the Rhizosphere Using a Metabolic-Network Based Framework for Metagenomics Interpretation

    PubMed Central

    Ofaim, Shany; Ofek-Lalzar, Maya; Sela, Noa; Jinag, Jiandong; Kashi, Yechezkel; Minz, Dror; Freilich, Shiri

    2017-01-01

    Advances in metagenomics enable high resolution description of complex bacterial communities in their natural environments. Consequently, conceptual approaches for community level functional analysis are in high need. Here, we introduce a framework for a metagenomics-based analysis of community functions. Environment-specific gene catalogs, derived from metagenomes, are processed into metabolic-network representation. By applying established ecological conventions, network-edges (metabolic functions) are assigned with taxonomic annotations according to the dominance level of specific groups. Once a function-taxonomy link is established, prediction of the impact of dominant taxa on the overall community performances is assessed by simulating removal or addition of edges (taxa associated functions). This approach is demonstrated on metagenomic data describing the microbial communities from the root environment of two crop plants – wheat and cucumber. Predictions for environment-dependent effects revealed differences between treatments (root vs. soil), corresponding to documented observations. Metabolism of specific plant exudates (e.g., organic acids, flavonoids) was linked with distinct taxonomic groups in simulated root, but not soil, environments. These dependencies point to the impact of these metabolite families as determinants of community structure. Simulations of the activity of pairwise combinations of taxonomic groups (order level) predicted the possible production of complementary metabolites. Complementation profiles allow formulating a possible metabolic role for observed co-occurrence patterns. For example, production of tryptophan-associated metabolites through complementary interactions is unique to the tryptophan-deficient cucumber root environment. Our approach enables formulation of testable predictions for species contribution to community activity and exploration of the functional outcome of structural shifts in complex bacterial communities. 
Understanding community-level metabolism is an essential step toward the manipulation and optimization of microbial function. Here, we introduce an analysis framework addressing three key challenges of such data: producing quantified links between taxonomy and function; contextualizing discrete functions into communal networks; and simulating environmental impact on community performances. New technologies will soon provide a high-coverage description of biotic and abiotic aspects of complex microbial communities such as those found in gut and soil. This framework was designed to allow the integration of high-throughput metabolomic and metagenomic data toward tackling the intricate associations between community structure, community function, and metabolic inputs. PMID:28878756
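
    The edge-removal simulation described in this abstract can be illustrated with a small sketch. It is not the authors' framework: the reaction list, the taxon labels (`Order_A`, `Order_B`), the seed metabolites, and both function names are hypothetical, and the pathway is deliberately simplified. Each reaction (a network edge) carries the set of taxa assigned to it; removing a taxon drops the edges that only it carries, and a simple expansion from the seed metabolites shows which products are lost.

```python
def reachable_metabolites(reactions, seeds):
    """Expand from seed metabolites; a reaction fires once all its substrates are available."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products, _taxa in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def remove_taxon(reactions, taxon):
    """Keep only reactions that at least one other taxon can still carry out."""
    return [r for r in reactions if set(r[2]) - {taxon}]


# Toy network (hypothetical): (substrates, products, taxa annotated on the edge)
reactions = [
    (("chorismate",), ("anthranilate",), {"Order_A"}),
    (("anthranilate",), ("tryptophan",), {"Order_B"}),
    (("glucose",), ("pyruvate",), {"Order_A", "Order_B"}),
]
seeds = {"chorismate", "glucose"}

full = reachable_metabolites(reactions, seeds)
without_b = reachable_metabolites(remove_taxon(reactions, "Order_B"), seeds)
print(sorted(full - without_b))  # metabolites lost when Order_B is removed
```

    In this toy case the shared glucose edge survives the removal because Order_A also carries it, which is the kind of redundancy such a simulation is meant to expose.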

  9. Not All Particles Are Equal: The Selective Enrichment of Particle-Associated Bacteria from the Mediterranean Sea.

    PubMed

    López-Pérez, Mario; Kimes, Nikole E; Haro-Moreno, Jose M; Rodriguez-Valera, Francisco

    2016-01-01

    We have used two metagenomic approaches, direct sequencing of natural samples and sequencing after enrichment, to characterize communities of prokaryotes associated with particles. In the first approach, different size filters (0.22 and 5 μm) were used to identify prokaryotic microbes of free-living and particle-attached bacterial communities in the Mediterranean water column. A subtractive metagenomic approach was used to characterize the dominant microbial groups in the large size fraction that were not present in the free-living one. They belonged mainly to Actinobacteria, Planctomycetes, Flavobacteria and Proteobacteria. In addition, marine microbial communities enriched by incubation with different kinds of particulate material have been studied by metagenomic assembly. Different particle kinds (diatomaceous earth, sand, chitin and cellulose) were colonized by very different communities of bacteria belonging to Roseobacter, Vibrio, Bacteriovorax, and Lacinutrix that were distant relatives of genomes already described from marine habitats. Besides, using assembly from deep metagenomic sequencing from the particle-specific enrichments we were able to determine a total of 20 groups of contigs (eight of them with >50% completeness) and reconstruct de novo five new genomes of novel species within marine clades (>79% completeness and <1.8% contamination). We also describe for the first time the genome of a marine Rhizobiales phage that seems to infect a broad range of Alphaproteobacteria and live in habitats as diverse as soil, marine sediment and water column. The metagenomic recruitment of the communities found by direct sequencing of the large size filter and by enrichment had nearly no overlap. These results indicate that these reconstructed genomes are part of the rare biosphere which exists at nominal levels under natural conditions.

  10. A metagenomic β-glucuronidase uncovers a core adaptive function of the human intestinal microbiome

    PubMed Central

    Gloux, Karine; Berteau, Olivier; El oumami, Hanane; Béguet, Fabienne; Leclerc, Marion; Doré, Joël

    2011-01-01

    In the human gastrointestinal tract, bacterial β-D-glucuronidases (BG; E.C. 3.2.1.31) are involved both in xenobiotic metabolism and in some of the beneficial effects of dietary compounds. Despite their biological significance, investigations are hampered by the fact that only a few BGs have so far been studied. A functional metagenomic approach was therefore performed on intestinal metagenomic libraries using chromogenic glucuronides as probes. Using this strategy, 19 positive metagenomic clones were identified but only one exhibited strong β-D-glucuronidase activity when subcloned into an expression vector. The cloned gene encoded a β-D-glucuronidase (called H11G11-BG) that had distant amino acid sequence homologies and an additional C terminus domain compared with known β-D-glucuronidases. Fifteen homologs were identified in public bacterial genome databases (38–57% identity with H11G11-BG) in the Firmicutes phylum. The genomes identified derived from strains from Ruminococcaceae, Lachnospiraceae, and Clostridiaceae. The genetic context diversity, with closely related symporters and gene duplication, argued for functional diversity and contribution to adaptive mechanisms. In contrast to the previously known β-D-glucuronidases, this previously undescribed type was present in the published microbiome of each healthy adult/child investigated (n = 11) and was specific to the human gut ecosystem. In conclusion, our functional metagenomic approach revealed a class of BGs that may be part of a functional core specifically evolved to adapt to the human gut environment with major health implications. We propose consensus motifs for this unique Firmicutes β-D-glucuronidase subfamily and for the glycosyl hydrolase family 2. PMID:20615998

  11. Impact of library preparation protocols and template quantity on the metagenomic reconstruction of a mock microbial community

    DOE PAGES

    Bowers, Robert M.; Clum, Alicia; Tice, Hope; ...

    2015-10-24

    Background: The rapid development of sequencing technologies has provided access to environments that were either once thought inhospitable to life altogether or that contain too few cells to be analyzed using genomics approaches. While 16S rRNA gene microbial community sequencing has revolutionized our understanding of community composition and diversity over time and space, it only provides a crude estimate of microbial functional and metabolic potential. Alternatively, shotgun metagenomics allows comprehensive sampling of all genetic material in an environment, without any underlying primer biases. Until recently, one of the major bottlenecks of shotgun metagenomics has been the requirement for large initial DNA template quantities during library preparation. Results: Here, we investigate the effects of varying template concentrations across three low biomass library preparation protocols on their ability to accurately reconstruct a mock microbial community of known composition. We analyze the effects of input DNA quantity and library preparation method on library insert size, GC content, community composition, assembly quality and metagenomic binning. We found that library preparation method and the amount of starting material had significant impacts on the mock community metagenomes. In particular, GC content shifted towards more GC rich sequences at the lower input quantities regardless of library prep method, the number of low quality reads that could not be mapped to the reference genomes increased with decreasing input quantities, and the different library preparation methods had an impact on overall metagenomic community composition. Conclusions: This benchmark study provides recommendations for library creation of representative and minimally biased metagenome shotgun sequencing, enabling insights into functional attributes of low biomass ecosystem microbial communities.

  12. Making a living while starving in the dark: metagenomic insights into the energy dynamics of a carbonate cave.

    PubMed

    Ortiz, Marianyoly; Legatzki, Antje; Neilson, Julia W; Fryslie, Brandon; Nelson, William M; Wing, Rod A; Soderlund, Carol A; Pryor, Barry M; Maier, Raina M

    2014-02-01

    Carbonate caves represent subterranean ecosystems that are largely devoid of phototrophic primary production. In semiarid and arid regions, allochthonous organic carbon inputs entering caves with vadose-zone drip water are minimal, creating highly oligotrophic conditions; however, past research indicates that carbonate speleothem surfaces in these caves support diverse, predominantly heterotrophic prokaryotic communities. The current study applied a metagenomic approach to elucidate the community structure and potential energy dynamics of microbial communities colonizing speleothem surfaces in Kartchner Caverns, a carbonate cave in semiarid, southeastern Arizona, USA. Manual inspection of a speleothem metagenome revealed a community genetically adapted to low-nutrient conditions with indications that a nitrogen-based primary production strategy is probable, including contributions from both Archaea and Bacteria. Genes for all six known CO2-fixation pathways were detected in the metagenome and RuBisCo genes representative of the Calvin-Benson-Bassham cycle were over-represented in Kartchner speleothem metagenomes relative to bulk soil, rhizosphere soil and deep-ocean communities. Intriguingly, quantitative PCR found Archaea to be significantly more abundant in the cave communities than in soils above the cave. MEtaGenome ANalyzer (MEGAN) analysis of speleothem metagenome sequence reads found Thaumarchaeota to be the third most abundant phylum in the community, and identified taxonomic associations to this phylum for indicator genes representative of multiple CO2-fixation pathways. The results revealed that this oligotrophic subterranean environment supports a unique chemoautotrophic microbial community with potentially novel nutrient cycling strategies. These strategies may provide key insights into other ecosystems dominated by oligotrophy, including aphotic subsurface soils or aquifers and photic systems such as arid deserts.

  13. Impact of library preparation protocols and template quantity on the metagenomic reconstruction of a mock microbial community

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bowers, Robert M.; Clum, Alicia; Tice, Hope

    Background: The rapid development of sequencing technologies has provided access to environments that were either once thought inhospitable to life altogether or that contain too few cells to be analyzed using genomics approaches. While 16S rRNA gene microbial community sequencing has revolutionized our understanding of community composition and diversity over time and space, it only provides a crude estimate of microbial functional and metabolic potential. Alternatively, shotgun metagenomics allows comprehensive sampling of all genetic material in an environment, without any underlying primer biases. Until recently, one of the major bottlenecks of shotgun metagenomics has been the requirement for large initial DNA template quantities during library preparation. Results: Here, we investigate the effects of varying template concentrations across three low biomass library preparation protocols on their ability to accurately reconstruct a mock microbial community of known composition. We analyze the effects of input DNA quantity and library preparation method on library insert size, GC content, community composition, assembly quality and metagenomic binning. We found that library preparation method and the amount of starting material had significant impacts on the mock community metagenomes. In particular, GC content shifted towards more GC rich sequences at the lower input quantities regardless of library prep method, the number of low quality reads that could not be mapped to the reference genomes increased with decreasing input quantities, and the different library preparation methods had an impact on overall metagenomic community composition. Conclusions: This benchmark study provides recommendations for library creation of representative and minimally biased metagenome shotgun sequencing, enabling insights into functional attributes of low biomass ecosystem microbial communities.

  14. A de Bruijn graph approach to the quantification of closely-related genomes in a microbial community.

    PubMed

    Wang, Mingjie; Ye, Yuzhen; Tang, Haixu

    2012-06-01

    The wide applications of next-generation sequencing (NGS) technologies in metagenomics have raised many computational challenges. One of the essential problems in metagenomics is to estimate the taxonomic composition of a microbial community, which can be approached by mapping shotgun reads acquired from the community to previously characterized microbial genomes followed by quantity profiling of these species based on the number of mapped reads. This procedure, however, is not as trivial as it appears at first glance. A shotgun metagenomic dataset often contains DNA sequences from many closely-related microbial species (e.g., within the same genus) or strains (e.g., within the same species), thus it is often difficult to determine which species/strain a specific read is sampled from when it can be mapped to a common region shared by multiple genomes at high similarity. Furthermore, high genomic variations are observed among individual genomes within the same species, which are difficult to differentiate from the inter-species variations during read mapping. To address these issues, a commonly used approach is to quantify taxonomic distribution only at the genus level, based on the reads mapped to all species belonging to the same genus; alternatively, reads are mapped to a set of representative genomes, each selected to represent a different genus. Here, we introduce a novel approach to the quantity estimation of closely-related species within the same genus by mapping the reads to their genomes represented by a de Bruijn graph, in which the common genomic regions among them are collapsed. 
Using simulated and real metagenomic datasets, we show the de Bruijn graph approach has several advantages over existing methods, including (1) it avoids redundant mapping of shotgun reads to multiple copies of the common regions in different genomes, and (2) it leads to more accurate quantification for the closely-related species (and even for strains within the same species).
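
    The collapsing of shared regions described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy genomes and reads, and the choice of k = 4 are all assumptions, and a real tool would also handle reverse complements, sequencing errors, and the actual abundance estimation. Each k-mer (an edge between its (k-1)-mer prefix and suffix) is stored once and labeled with the genomes that contain it, so a read falling in a shared region is counted once against the shared edge rather than once per genome.

```python
from collections import defaultdict


def build_collapsed_graph(genomes, k):
    """Map each k-mer (one de Bruijn edge) to the set of genomes containing it."""
    kmer_owners = defaultdict(set)
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer_owners[seq[i:i + k]].add(name)
    return kmer_owners


def count_read_support(reads, kmer_owners, k):
    """Count read k-mers per owner set; shared edges are tallied once, not per genome."""
    support = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            owners = kmer_owners.get(read[i:i + k])
            if owners:
                support[frozenset(owners)] += 1
    return support


# Toy data (hypothetical): two genomes sharing the prefix ACGTAC
genomes = {"A": "ACGTACGGA", "B": "ACGTACCTT"}
reads = ["ACGTAC", "TACGGA", "TACCTT"]

graph = build_collapsed_graph(genomes, k=4)
support = count_read_support(reads, graph, k=4)
for owners, n in sorted(support.items(), key=lambda item: sorted(item[0])):
    print(sorted(owners), n)
```

    With these toy sequences the two genomes contribute 12 k-mers in total but only 9 distinct edges, because the three k-mers of the shared prefix are stored once.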

  15. Human systems immunology: hypothesis-based modeling and unbiased data-driven approaches.

    PubMed

    Arazi, Arnon; Pendergraft, William F; Ribeiro, Ruy M; Perelson, Alan S; Hacohen, Nir

    2013-10-31

    Systems immunology is an emerging paradigm that aims at a more systematic and quantitative understanding of the immune system. Two major approaches have been utilized to date in this field: unbiased data-driven modeling to comprehensively identify molecular and cellular components of a system and their interactions; and hypothesis-based quantitative modeling to understand the operating principles of a system by extracting a minimal set of variables and rules underlying them. In this review, we describe applications of the two approaches to the study of viral infections and autoimmune diseases in humans, and discuss possible ways by which these two approaches can synergize when applied to human immunology.

  16. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle

    PubMed Central

    Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E

    2017-01-01

    The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large “omic” datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of completely sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operating characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. 
Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. PMID:29069412
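
    The idea of scoring a sample by the relative entropy of marker domains can be sketched as below. The abstract does not give the exact formula MEBS uses, so this sketch assumes a generic binary Kullback-Leibler divergence between a domain's presence frequency in the genomes of interest and in a background set. The function names, the frequencies, and the placeholder domain `PF_OTHER` are hypothetical; only PF04358 (DsrC) is named in the abstract.

```python
import math


def relative_entropy(p_present, q_present, eps=1e-9):
    """Binary KL divergence (bits) between presence frequencies p (target) and q (background)."""
    p = min(max(p_present, eps), 1 - eps)  # clamp to avoid log(0)
    q = min(max(q_present, eps), 1 - eps)
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))


def sample_score(domains_found, domain_entropy):
    """Sum the entropies of the marker domains detected in a sample."""
    return sum(h for d, h in domain_entropy.items() if d in domains_found)


# Illustrative frequencies only (not taken from the paper):
# (frequency among genomes of interest, frequency among background genomes)
frequencies = {
    "PF04358": (0.9, 0.1),   # strongly enriched -> informative
    "PF_OTHER": (0.5, 0.5),  # equally common everywhere -> uninformative
}
domain_entropy = {d: relative_entropy(p, q) for d, (p, q) in frequencies.items()}

print(domain_entropy)
print(sample_score({"PF04358"}, domain_entropy))
```

    A domain that is equally frequent in both sets contributes nothing to the score, while an enriched domain contributes in proportion to how unevenly it is distributed.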

  17. MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data.

    PubMed

    Gupta, Ankit; Kapil, Rohan; Dhakan, Darshan B; Sharma, Vineet K

    2014-01-01

    The identification of virulent proteins in any de-novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenome of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both the above tasks is the identification of virulent proteins since a significant proportion of genomic and metagenomic proteins are novel and as yet unannotated. The currently available tools which carry out the identification of virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed an MP3 standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 is developed using an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out fast, sensitive and accurate prediction of pathogenic proteins. It displayed Sensitivity, Specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on a blind dataset constructed using complete proteins. On the two metagenomic blind datasets (Blind A: 51-100 amino acids and Blind B: 30-50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100-150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php.
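
    The four reported metrics are related through the confusion matrix, which a short sketch can make concrete. The abstract does not state the class sizes, so the counts below are an assumption: a hypothetical balanced set of 100 pathogenic and 100 non-pathogenic proteins, chosen because it reproduces the figures reported for the complete-protein blind dataset. The function name is likewise illustrative.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical counts: 100 positives (92 found, 8 missed), 100 negatives (none misclassified)
sens, spec, acc, mcc = binary_metrics(tp=92, fn=8, tn=100, fp=0)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f} MCC={mcc:.2f}")
```

    Under this balanced assumption, accuracy is simply the mean of sensitivity and specificity. The Blind A and Blind B figures do not satisfy that relation, which suggests those datasets were not balanced.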

  18. MP3: A Software Tool for the Prediction of Pathogenic Proteins in Genomic and Metagenomic Data

    PubMed Central

    Gupta, Ankit; Kapil, Rohan; Dhakan, Darshan B.; Sharma, Vineet K.

    2014-01-01

    The identification of virulent proteins in any de-novo sequenced genome is useful in estimating its pathogenic ability and understanding the mechanism of pathogenesis. Similarly, the identification of such proteins could be valuable in comparing the metagenome of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the common challenge in both the above tasks is the identification of virulent proteins since a significant proportion of genomic and metagenomic proteins are novel and yet unannotated. The currently available tools which carry out the identification of virulent proteins provide limited accuracy and cannot be used on large datasets. Therefore, we have developed an MP3 standalone tool and web server for the prediction of pathogenic proteins in both genomic and metagenomic datasets. MP3 is developed using an integrated Support Vector Machine (SVM) and Hidden Markov Model (HMM) approach to carry out highly fast, sensitive and accurate prediction of pathogenic proteins. It displayed Sensitivity, Specificity, MCC and accuracy values of 92%, 100%, 0.92 and 96%, respectively, on blind dataset constructed using complete proteins. On the two metagenomic blind datasets (Blind A: 51–100 amino acids and Blind B: 30–50 amino acids), it displayed Sensitivity, Specificity, MCC and accuracy values of 82.39%, 97.86%, 0.80 and 89.32% for Blind A and 71.60%, 94.48%, 0.67 and 81.86% for Blind B, respectively. In addition, the performance of MP3 was validated on selected bacterial genomic and real metagenomic datasets. To our knowledge, MP3 is the only program that specializes in fast and accurate identification of partial pathogenic proteins predicted from short (100–150 bp) metagenomic reads and also performs exceptionally well on complete protein sequences. MP3 is publicly available at http://metagenomics.iiserb.ac.in/mp3/index.php. PMID:24736651

  19. MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: unraveling the sulfur cycle.

    PubMed

    De Anda, Valerie; Zapata-Peñasco, Icoquih; Poot-Hernandez, Augusto Cesar; Eguiarte, Luis E; Contreras-Moreira, Bruno; Souza, Valeria

    2017-11-01

    The increasing number of metagenomic and genomic sequences has dramatically improved our understanding of microbial diversity, yet our ability to infer metabolic capabilities in such datasets remains challenging. We describe the Multigenomic Entropy Based Score pipeline (MEBS), a software platform designed to evaluate, compare, and infer complex metabolic pathways in large "omic" datasets, including entire biogeochemical cycles. MEBS is open source and available through https://github.com/eead-csic-compbio/metagenome_Pfam_score. To demonstrate its use, we modeled the sulfur cycle by exhaustively curating the molecular and ecological elements involved (compounds, genes, metabolic pathways, and microbial taxa). This information was reduced to a collection of 112 characteristic Pfam protein domains and a list of complete-sequenced sulfur genomes. Using the mathematical framework of relative entropy (H΄), we quantitatively measured the enrichment of these domains among sulfur genomes. The entropy of each domain was used both to build up a final score that indicates whether a (meta)genomic sample contains the metabolic machinery of interest and to propose marker domains in metagenomic sequences such as DsrC (PF04358). MEBS was benchmarked with a dataset of 2107 non-redundant microbial genomes from RefSeq and 935 metagenomes from MG-RAST. Its performance, reproducibility, and robustness were evaluated using several approaches, including random sampling, linear regression models, receiver operator characteristic plots, and the area under the curve metric (AUC). Our results support the broad applicability of this algorithm to accurately classify (AUC = 0.985) hard-to-culture genomes (e.g., Candidatus Desulforudis audaxviator), previously characterized ones, and metagenomic environments such as hydrothermal vents, or deep-sea sediment. 
    Our benchmark indicates that an entropy-based score can capture the metabolic machinery of interest and can be used to efficiently classify large genomic and metagenomic datasets, including uncultivated/unexplored taxa. © The Author 2017. Published by Oxford University Press.
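
    The idea of scoring a domain by its relative entropy can be illustrated with a generic Kullback-Leibler divergence over presence/absence frequencies. This is a minimal sketch, not the MEBS implementation: the function name, the two-outcome formulation, and the example frequencies are all assumptions made here, and the published formula may differ in detail.

```python
import math

def relative_entropy(p, q):
    """KL divergence (in bits) between two presence/absence distributions.

    p: fraction of target (e.g. sulfur) genomes that carry the domain
    q: fraction of background genomes that carry the domain
    """
    if not 0.0 < q < 1.0:
        raise ValueError("background frequency q must lie strictly between 0 and 1")
    total = 0.0
    # Sum over the two outcomes: domain present, domain absent.
    for p_i, q_i in ((p, q), (1.0 - p, 1.0 - q)):
        if p_i > 0.0:                     # 0 * log(0) is taken as 0
            total += p_i * math.log2(p_i / q_i)
    return total

# Hypothetical frequencies: a domain found in 90% of target genomes
# but only 10% of the background is strongly enriched.
enriched = relative_entropy(0.9, 0.1)
uninformative = relative_entropy(0.5, 0.5)
print(enriched > uninformative)
```

    A domain that is equally common in both sets scores zero, while a domain concentrated in the target genomes scores high, which is the property a marker-domain ranking relies on.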

  20. Functional metagenomics to mine the human gut microbiome for dietary fiber catabolic enzymes.

    PubMed

    Tasse, Lena; Bercovici, Juliette; Pizzut-Serin, Sandra; Robe, Patrick; Tap, Julien; Klopp, Christophe; Cantarel, Brandi L; Coutinho, Pedro M; Henrissat, Bernard; Leclerc, Marion; Doré, Joël; Monsan, Pierre; Remaud-Simeon, Magali; Potocki-Veronese, Gabrielle

    2010-11-01

    The human gut microbiome is a complex ecosystem composed mainly of uncultured bacteria. It plays an essential role in the catabolism of dietary fibers, the part of plant material in our diet that is not metabolized in the upper digestive tract, because the human genome does not encode adequate carbohydrate active enzymes (CAZymes). We describe a multi-step functionally based approach to guide the in-depth pyrosequencing of specific regions of the human gut metagenome encoding the CAZymes involved in dietary fiber breakdown. High-throughput functional screens were first applied to a library covering 5.4 × 10^9 bp of metagenomic DNA, allowing the isolation of 310 clones showing beta-glucanase, hemicellulase, galactanase, amylase, or pectinase activities. Based on the results of refined secondary screens, sequencing efforts were reduced to 0.84 Mb of nonredundant metagenomic DNA, corresponding to 26 clones that were particularly efficient for the degradation of raw plant polysaccharides. Seventy-three CAZymes from 35 different families were discovered. This corresponds to a fivefold target-gene enrichment compared to random sequencing of the human gut metagenome. Thirty-three of these CAZy encoding genes are highly homologous to prevalent genes found in the gut microbiome of at least 20 individuals for whom metagenomic data are available. Moreover, 18 multigenic clusters encoding complementary enzyme activities for plant cell wall degradation were also identified. Gene taxonomic assignment is consistent with horizontal gene transfer events in dominant gut species and provides new insights into the human gut functional trophic chain.

  1. Direct Detection and Identification of Prosthetic Joint Infection Pathogens in Synovial Fluid by Metagenomic Shotgun Sequencing.

    PubMed

    Ivy, Morgan I; Thoendel, Matthew J; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Hanssen, Arlen D; Abdel, Matthew P; Chia, Nicholas; Yao, Janet Z; Tande, Aaron J; Mandrekar, Jayawant N; Patel, Robin

    2018-05-30

    Background: Metagenomic shotgun sequencing has the potential to transform how serious infections are diagnosed by offering universal, culture-free pathogen detection. This may be especially advantageous for microbial diagnosis of prosthetic joint infection (PJI) by synovial fluid analysis, since synovial fluid cultures are not universally positive, and synovial fluid is easily obtained pre-operatively. We applied a metagenomics-based approach to synovial fluid in an attempt to detect microorganisms in 168 failed total knee arthroplasties. Results: Genus- and species-level analysis of metagenomic sequencing yielded the known pathogen in 74 (90%) and 68 (83%) of the 82 culture-positive PJIs analyzed, respectively, with testing of two (2%) and three (4%) samples, respectively, yielding additional pathogens not detected by culture. For the 25 culture-negative PJIs tested, genus- and species-level analysis yielded 19 (76%) and 21 (84%) samples with insignificant findings, respectively, and 6 (24%) and 4 (16%) with potential pathogens detected, respectively. Genus- and species-level analysis of the 60 culture-negative aseptic failure cases yielded 53 (88.3%) and 56 (93.3%) cases with insignificant findings, and 7 (11.7%) and 4 (6.7%) with potential clinically-significant organisms detected, respectively. There was one case of aseptic failure with synovial fluid culture growth; metagenomic analysis showed insignificant findings, suggesting possible synovial fluid culture contamination. Conclusion: Metagenomic shotgun sequencing can detect pathogens involved in PJI when applied to synovial fluid and may be particularly useful for culture-negative cases. Copyright © 2018 American Society for Microbiology.
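
    The percentages in the Results can be recomputed from the counts and group sizes given in the abstract (82 culture-positive PJIs, 25 culture-negative PJIs, 60 culture-negative aseptic failures). The helper below is a small arithmetic check; `pct` is a name introduced here for illustration.

```python
def pct(count, total, digits=0):
    """Percentage of count in total, rounded to the given number of decimals."""
    return round(100 * count / total, digits)

# Culture-positive PJIs (n = 82): known pathogen found at genus / species level.
print(pct(74, 82), pct(68, 82))
# Culture-negative PJIs (n = 25): potential pathogens at genus / species level.
print(pct(6, 25), pct(4, 25))
# Culture-negative aseptic failures (n = 60): insignificant findings, one decimal.
print(pct(53, 60, 1), pct(56, 60, 1))
```

    The recomputed values agree with the rounded figures reported in the abstract.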

  2. Metagenomics: Probing pollutant fate in natural and engineered ecosystems.

    PubMed

    Bouhajja, Emna; Agathos, Spiros N; George, Isabelle F

    2016-12-01

    Polluted environments are a reservoir of microbial species able to degrade or to convert pollutants to harmless compounds. The proper management of microbial resources requires a comprehensive characterization of their genetic pool to assess the fate of contaminants and increase the efficiency of bioremediation processes. Metagenomics offers appropriate tools to describe microbial communities in their whole complexity without lab-based cultivation of individual strains. After a decade of use of metagenomics to study microbiomes, the scientific community has made significant progress in this field. In this review, we survey the main steps of metagenomics applied to environments contaminated with organic compounds or heavy metals. We emphasize technical solutions proposed to overcome encountered obstacles. We then compare two metagenomic approaches, i.e. library-based targeted metagenomics and direct sequencing of metagenomes. In the former, environmental DNA is cloned inside a host, and then clones of interest are selected based on (i) their expression of biodegradative functions or (ii) sequence homology with probes and primers designed from relevant, already known sequences. The highest score for the discovery of novel genes and degradation pathways has been achieved so far by functional screening of large clone libraries. On the other hand, direct sequencing of metagenomes without a cloning step has been more often applied to polluted environments for characterization of the taxonomic and functional composition of microbial communities and their dynamics. In this case, the analysis has focused on 16S rRNA genes and marker genes of biodegradation. Advances in next generation sequencing and in bioinformatic analysis of sequencing data have opened up new opportunities for assessing the potential of biodegradation by microbes, but annotation of collected genes is still hampered by a limited number of available reference sequences in databases. 
    Although metagenomics is still facing technical and computational challenges, our review of the recent literature highlights its value as an aid to efficiently monitor the clean-up of contaminated environments and develop successful strategies to mitigate the impact of pollutants on ecosystems. Copyright © 2016 Elsevier Inc. All rights reserved.

  3. Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach

    DOE PAGES

    Musumeci, Matias A.; Lozada, Mariana; Rial, Daniela V.; ...

    2017-04-09

    The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. As a result, this work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.

  4. Metagenomic Survey for Viruses in Western Arctic Caribou, Alaska, through Iterative Assembly of Taxonomic Units

    PubMed Central

    Schürch, Anita C.; Schipper, Debby; Bijl, Maarten A.; Dau, Jim; Beckmen, Kimberlee B.; Schapendonk, Claudia M. E.; Raj, V. Stalin; Osterhaus, Albert D. M. E.; Haagmans, Bart L.; Tryland, Morten; Smits, Saskia L.

    2014-01-01

    Pathogen surveillance in animals does not provide a sufficient level of vigilance because it is generally confined to surveillance of pathogens with known economic impact in domestic animals and practically nonexistent in wildlife species. As most (re-)emerging viral infections originate from animal sources, it is important to obtain insight into viral pathogens present in the wildlife reservoir from a public health perspective. When monitoring living, free-ranging wildlife for viruses, sample collection can be challenging and availability of nucleic acids isolated from samples is often limited. The development of viral metagenomics platforms allows a more comprehensive inventory of viruses present in wildlife. We report a metagenomic viral survey of the Western Arctic herd of barren ground caribou (Rangifer tarandus granti) in Alaska, USA. The presence of mammalian viruses in eye and nose swabs of 39 free-ranging caribou was investigated by random amplification combined with a metagenomic analysis approach that applied exhaustive iterative assembly of sequencing results to define taxonomic units of each metagenome. Through homology search methods we identified the presence of several mammalian viruses, including different papillomaviruses, a novel parvovirus, polyomavirus, and a virus that potentially represents a member of a novel genus in the family Coronaviridae. PMID:25140520

  5. Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Musumeci, Matias A.; Lozada, Mariana; Rial, Daniela V.

    The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. As a result, this work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.

  6. Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach.

    PubMed

    Musumeci, Matías A; Lozada, Mariana; Rial, Daniela V; Mac Cormack, Walter P; Jansson, Janet K; Sjöling, Sara; Carroll, JoLynn; Dionisi, Hebe M

    2017-04-09

    The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. This work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.

  7. Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach

    PubMed Central

    Musumeci, Matías A.; Lozada, Mariana; Rial, Daniela V.; Mac Cormack, Walter P.; Jansson, Janet K.; Sjöling, Sara; Carroll, JoLynn; Dionisi, Hebe M.

    2017-01-01

    The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer–Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. This work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments. PMID:28397770

  8. Resource recovery from wastewater: application of meta-omics to phosphorus and carbon management.

    PubMed

    Sales, Christopher M; Lee, Patrick K H

    2015-06-01

    A growing trend at wastewater treatment plants is the recovery of resources and energy from wastewater. Enhanced biological phosphorus removal and anaerobic digestion are two established biotechnology approaches for the recovery of phosphorus and carbon, respectively. Meta-omics approaches (meta-genomics, transcriptomics, proteomics, and metabolomics) are providing novel biological insights into these complex biological systems. In particular, genome-centric metagenomics analyses are revealing the function and physiology of individual community members. Querying transcripts, proteins and metabolites are emerging techniques that can inform the cellular responses under different conditions. Overall, meta-omics approaches are shedding light into complex microbial communities once regarded as 'blackboxes', but challenges remain to integrate information from meta-omics into engineering design and operation guidelines. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. The dependability of medical students' performance ratings as documented on in-training evaluations.

    PubMed

    van Barneveld, Christina

    2005-03-01

    To demonstrate an approach to obtain an unbiased estimate of the dependability of students' performance ratings during training, when the data-collection design includes nesting of student in rater, unbalanced nest sizes, and dependent observations. In 2003, two variance components analyses of in-training evaluation (ITE) report data were conducted using urGENOVA software. In the first analysis, the dependability for the nested and unbalanced data-collection design was calculated. In the second analysis, an approach using multiple generalizability studies was used to obtain an unbiased estimate of the student variance component, resulting in an unbiased estimate of dependability. Results suggested that there is bias in estimates of the dependability of students' performance on ITEs that are attributable to the data-collection design. When the bias was corrected, the results indicated that the dependability of ratings of student performance was almost zero. The combination of the multiple generalizability studies method and the use of specialized software provides an unbiased estimate of the dependability of ratings of student performance on ITE scores for data-collection designs that include nesting of student in rater, unbalanced nest sizes, and dependent observations.

  10. Metagenomic analysis of the airborne environment in urban spaces.

    PubMed

    Be, Nicholas A; Thissen, James B; Fofanov, Viacheslav Y; Allen, Jonathan E; Rojas, Mark; Golovko, George; Fofanov, Yuriy; Koshinsky, Heather; Jaing, Crystal J

    2015-02-01

    The organisms in aerosol microenvironments, especially densely populated urban areas, are relevant to maintenance of public health and detection of potential epidemic or biothreat agents. To examine aerosolized microorganisms in this environment, we performed sequencing on the material from an urban aerosol surveillance program. Whole metagenome sequencing was applied to DNA extracted from air filters obtained during periods from each of the four seasons. The composition of bacteria, plants, fungi, invertebrates, and viruses demonstrated distinct temporal shifts. Bacillus thuringiensis serovar kurstaki was detected in samples known to be exposed to aerosolized spores, illustrating the potential utility of this approach for identification of intentionally introduced microbial agents. Together, these data demonstrate the temporally dependent metagenomic complexity of urban aerosols and the potential of genomic analytical techniques for biosurveillance and monitoring of threats to public health.

  11. Protein Structure Determination using Metagenome sequence data

    PubMed Central

    Ovchinnikov, Sergey; Park, Hahnbeom; Varghese, Neha; Huang, Po-Ssu; Pavlopoulos, Georgios A.; Kim, David E.; Kamisetty, Hetunandan; Kyrpides, Nikos C.; Baker, David

    2017-01-01

    Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families, and that metagenome sequence data more than triples the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact based structure matching and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the PDB. This approach provides the representative models for large protein families originally envisioned as the goal of the protein structure initiative at a fraction of the cost. PMID:28104891

  12. Next Generation Sequence Assembly with AMOS

    PubMed Central

    Treangen, Todd J; Sommer, Dan D; Angly, Florent E; Koren, Sergey; Pop, Mihai

    2011-01-01

    A Modular Open-Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality. PMID:21400694

  13. Technical Report: Algorithm and Implementation for Quasispecies Abundance Inference with Confidence Intervals from Metagenomic Sequence Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McLoughlin, Kevin

    2016-01-11

    This report describes the design and implementation of an algorithm for estimating relative microbial abundances, together with confidence limits, using data from metagenomic DNA sequencing. For the background behind this project and a detailed discussion of our modeling approach for metagenomic data, we refer the reader to our earlier technical report, dated March 4, 2014. Briefly, we described a fully Bayesian generative model for paired-end sequence read data, incorporating the effects of the relative abundances, the distribution of sequence fragment lengths, fragment position bias, sequencing errors and variations between the sampled genomes and the nearest reference genomes. A distinctive feature of our modeling approach is the use of a Chinese restaurant process (CRP) to describe the selection of genomes to be sampled, and thus the relative abundances. The CRP component is desirable for fitting abundances to reads that may map ambiguously to multiple targets, because it naturally leads to sparse solutions that select the best representative from each set of nearly equivalent genomes.
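
    The sparsity that a Chinese restaurant process induces can be seen in a short simulation. This is a generic textbook CRP, not the report's algorithm: the function name, the concentration parameter `alpha`, and the reading of "customers" as sampled items and "tables" as selected genomes are assumptions made here for illustration.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Simulate a CRP and return the list of table sizes.

    Customer n (0-based) joins existing table k with probability
    size_k / (n + alpha) and opens a new table with probability
    alpha / (n + alpha).
    """
    rng = random.Random(seed)             # seeded for reproducibility
    table_sizes = []
    for n in range(n_customers):
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        for k, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:            # landed in table k's share
                table_sizes[k] += 1
                break
        else:                             # fell in the alpha share: new table
            table_sizes.append(1)
    return table_sizes

sizes = chinese_restaurant_process(n_customers=1000, alpha=1.0)
print(len(sizes), sorted(sizes, reverse=True)[:3])
```

    With a small `alpha`, most customers concentrate on a handful of tables, which mirrors the "rich get richer" behaviour that favours a few representative genomes over many near-equivalent ones.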

  14. Metagenomic chromosome conformation capture (meta3C) unveils the diversity of chromosome organization in microorganisms

    PubMed Central

    Marbouty, Martial; Cournac, Axel; Flot, Jean-François; Marie-Nelly, Hervé; Mozziconacci, Julien; Koszul, Romain

    2014-01-01

    Genomic analyses of microbial populations in their natural environment remain limited by the difficulty to assemble full genomes of individual species. Consequently, the chromosome organization of microorganisms has been investigated in a few model species, but the extent to which the features described can be generalized to other taxa remains unknown. Using controlled mixes of bacterial and yeast species, we developed meta3C, a metagenomic chromosome conformation capture approach that allows characterizing individual genomes and their average organization within a mix of organisms. Not only can meta3C be applied to species already sequenced, but a single meta3C library can be used for assembling, scaffolding and characterizing the tridimensional organization of unknown genomes. By applying meta3C to a semi-complex environmental sample, we confirmed its promising potential. Overall, this first meta3C study highlights the remarkable diversity of microorganisms chromosome organization, while providing an elegant and integrated approach to metagenomic analysis. DOI: http://dx.doi.org/10.7554/eLife.03318.001 PMID:25517076

  15. Deep-Sea Microbes: Linking Biogeochemical Rates to -Omics Approaches

    NASA Astrophysics Data System (ADS)

    Herndl, G. J.; Sintes, E.; Bayer, B.; Bergauer, K.; Amano, C.; Hansman, R.; Garcia, J.; Reinthaler, T.

    2016-02-01

    Over the past decade substantial progress has been made in determining deep ocean microbial activity and resolving some of the enigmas in understanding the deep ocean carbon flux. Also, metagenomics approaches have shed light onto the dark ocean's microbes but linking -omics approaches to biogeochemical rate measurements are generally rare in microbial oceanography and even more so for the deep ocean. In this presentation, we will show by combining metagenomics, -proteomics and biogeochemical rate measurements on the bulk and single-cell level that deep-sea microbes exhibit characteristics of generalists with a large genome repertoire, versatile in utilizing substrate as revealed by metaproteomics. This is in striking contrast with the apparently rather uniform dissolved organic matter pool in the deep ocean. Combining the different -omics approaches with metabolic rate measurements, we will highlight some major inconsistencies and enigmas in our understanding of the carbon cycling and microbial food web structure in the dark ocean.

  16. Prioritizing causal disease genes using unbiased genomic features.

    PubMed

    Deo, Rahul C; Musso, Gabriel; Tasan, Murat; Tang, Paul; Poon, Annie; Yuan, Christiana; Felix, Janine F; Vasan, Ramachandran S; Beroukhim, Rameen; De Marco, Teresa; Kwok, Pui-Yan; MacRae, Calum A; Roth, Frederick P

    2014-12-03

    Cardiovascular disease (CVD) is the leading cause of death in the developed world. Human genetic studies, including genome-wide sequencing and SNP-array approaches, promise to reveal disease genes and mechanisms representing new therapeutic targets. In practice, however, identification of the actual genes contributing to disease pathogenesis has lagged behind identification of associated loci, thus limiting the clinical benefits. To aid in localizing causal genes, we develop a machine learning approach, Objective Prioritization for Enhanced Novelty (OPEN), which quantitatively prioritizes gene-disease associations based on a diverse group of genomic features. This approach uses only unbiased predictive features and thus is not hampered by a preference towards previously well-characterized genes. We demonstrate success in identifying genetic determinants for CVD-related traits, including cholesterol levels, blood pressure, and conduction system and cardiomyopathy phenotypes. Using OPEN, we prioritize genes, including FLNC, for association with increased left ventricular diameter, which is a defining feature of a prevalent cardiovascular disorder, dilated cardiomyopathy or DCM. Using a zebrafish model, we experimentally validate FLNC and identify a novel FLNC splice-site mutation in a patient with severe DCM. Our approach stands to assist interpretation of large-scale genetic studies without compromising their fundamentally unbiased nature.

  17. Investigation of the microbial community in the Odisha hot spring cluster based on the cultivation independent approach.

    PubMed

    Singh, Archana; Subudhi, Enketeswara; Sahoo, Rajesh Kumar; Gaur, Mahendra

    2016-03-01

    Deulajhari hot spring is located in the Angul district of Odisha. This hot spring is notable because its hot spring cluster lies adjacent to a cold spring, which has attracted microbiologists seeking to understand how the physicochemical factors of these springs shape bacterial community structure. Next-generation sequencing technology helps us to depict the pioneering microflora of any ecological niche based on a metagenomic approach. Our study represents the first Illumina-based metagenomic study of Deulajhari hot springs DH1 and DH2 of the cluster, with temperatures of 65 °C and 55 °C respectively, a difference of 10 °C. A comprehensive study of the microbiota of these two hot springs was done by sequencing the V3-V4 region of the 16S rRNA gene from metagenomic DNA extracted from the two hot spring sediments. Sequencing of community DNA reported about 28 phyla in spring DH1, of which the majority were Chloroflexi (22.98%), Proteobacteria (15.51%), Acidobacteria (14.51%), Chlorobi (9.52%), Nitrospirae (8.54%), and Armatimonadetes (7.07%), under the existing physicochemical conditions: temperature 65 °C, pH 8.06, electrical conductivity 0.020 dS m⁻¹, and total organic carbon (TOC) 3.76%. About 40 phyla were detected in spring DH2 under the existing physicochemical conditions (temperature 55 °C, pH 8.10, electrical conductivity 0.019 dS m⁻¹, and TOC 0.58%), predominantly Chloroflexi (41.98%), Proteobacteria (10.74%), Nitrospirae (10.01%), Chlorobi (8.73%), Acidobacteria (6.73%) and Planctomycetes (3.73%). Approximately 68 classes, 107 orders, 171 genera and 184 species were reported in DH1, versus 102 classes, 180 orders, 375 genera and 411 species in DH2. The comparative metagenomic study of the Deulajhari hot springs DH1 and DH2 depicts the differential profiles of their microbiota. Metagenome sequences of these two hot springs are deposited in the SRA database and are available in NCBI with accession no. SRX1459734 for DH1 and SRX1459735 for DH2.

  18. Achieving high confidence protein annotations in a sea of unknowns

    NASA Astrophysics Data System (ADS)

    Timmins-Schiffman, E.; May, D. H.; Noble, W. S.; Nunn, B. L.; Mikan, M.; Harvey, H. R.

    2016-02-01

    Increased sensitivity of mass spectrometry (MS) technology allows deep and broad insight into community functional analyses. Metaproteomics holds the promise to reveal functional responses of natural microbial communities, whereas metagenomics alone can only hint at potential functions. The complex datasets resulting from ocean MS have the potential to inform diverse realms of the biological, chemical, and physical ocean sciences, yet the extent of bacterial functional diversity and redundancy has not been fully explored. To take advantage of these impressive datasets, we need a clear bioinformatics pipeline for metaproteomics peptide identification and annotation with a database that can provide confident identifications. Researchers must consider whether it is sufficient to leverage the vast quantities of available ocean sequence data or if they must invest in site-specific metagenomic sequencing. We have sequenced, to our knowledge, the first western arctic metagenomes from the Bering Strait and the Chukchi Sea. We have addressed the long standing question: Is a metagenome required to accurately complete metaproteomics and assess the biological distribution of metabolic functions controlling nutrient acquisition in the ocean? Two different protein databases were constructed from 1) a site-specific metagenome and 2) subarctic/arctic groups available in NCBI's non-redundant database. Multiple proteomic search strategies were employed, against each individual database and against both databases combined, to determine the algorithm and approach that yielded the balance of high sensitivity and confident identification. Results yielded over 8200 confidently identified proteins. Our comparison of these results allows us to quantify the utility of investing resources in a metagenome versus using the constantly expanding and immediately available public databases for metaproteomic studies.

  19. Biotechnological applications of functional metagenomics in the food and pharmaceutical industries

    PubMed Central

    Coughlan, Laura M.; Cotter, Paul D.; Hill, Colin; Alvarez-Ordóñez, Avelino

    2015-01-01

    Microorganisms are found throughout nature, thriving in a vast range of environmental conditions. The majority of them are unculturable or difficult to culture by traditional methods. Metagenomics enables the study of all microorganisms, regardless of whether they can be cultured or not, through the analysis of genomic data obtained directly from an environmental sample, providing knowledge of the species present, and allowing the extraction of information regarding the functionality of microbial communities in their natural habitat. Function-based screenings, following the cloning and expression of metagenomic DNA in a heterologous host, can be applied to the discovery of novel proteins of industrial interest encoded by the genes of previously inaccessible microorganisms. Functional metagenomics has considerable potential in the food and pharmaceutical industries, where it can, for instance, aid (i) the identification of enzymes with desirable technological properties, capable of catalyzing novel reactions or replacing existing chemically synthesized catalysts which may be difficult or expensive to produce, and able to work under a wide range of environmental conditions encountered in food and pharmaceutical processing cycles including extreme conditions of temperature, pH, osmolarity, etc; (ii) the discovery of novel bioactives including antimicrobials active against microorganisms of concern both in food and medical settings; (iii) the investigation of industrial and societal issues such as antibiotic resistance development. This review article summarizes the state-of-the-art functional metagenomic methods available and discusses the potential of functional metagenomic approaches to mine as yet unexplored environments to discover novel genes with biotechnological application in the food and pharmaceutical industries. PMID:26175729

  20. Biotechnological applications of functional metagenomics in the food and pharmaceutical industries.

    PubMed

    Coughlan, Laura M; Cotter, Paul D; Hill, Colin; Alvarez-Ordóñez, Avelino

    2015-01-01

    Microorganisms are found throughout nature, thriving in a vast range of environmental conditions. The majority of them are unculturable or difficult to culture by traditional methods. Metagenomics enables the study of all microorganisms, regardless of whether they can be cultured or not, through the analysis of genomic data obtained directly from an environmental sample, providing knowledge of the species present, and allowing the extraction of information regarding the functionality of microbial communities in their natural habitat. Function-based screenings, following the cloning and expression of metagenomic DNA in a heterologous host, can be applied to the discovery of novel proteins of industrial interest encoded by the genes of previously inaccessible microorganisms. Functional metagenomics has considerable potential in the food and pharmaceutical industries, where it can, for instance, aid (i) the identification of enzymes with desirable technological properties, capable of catalyzing novel reactions or replacing existing chemically synthesized catalysts which may be difficult or expensive to produce, and able to work under a wide range of environmental conditions encountered in food and pharmaceutical processing cycles including extreme conditions of temperature, pH, osmolarity, etc; (ii) the discovery of novel bioactives including antimicrobials active against microorganisms of concern both in food and medical settings; (iii) the investigation of industrial and societal issues such as antibiotic resistance development. This review article summarizes the state-of-the-art functional metagenomic methods available and discusses the potential of functional metagenomic approaches to mine as yet unexplored environments to discover novel genes with biotechnological application in the food and pharmaceutical industries.

  1. Metagenome, metatranscriptome, and metaproteome approaches unraveled compositions and functional relationships of microbial communities residing in biogas plants.

    PubMed

    Hassa, Julia; Maus, Irena; Off, Sandra; Pühler, Alfred; Scherer, Paul; Klocke, Michael; Schlüter, Andreas

    2018-06-01

    The production of biogas by anaerobic digestion (AD) of agricultural residues, organic wastes, animal excrements, municipal sludge, and energy crops has a firm place in sustainable energy production and bio-economy strategies. Focusing on the microbial community involved in biomass conversion offers the opportunity to control and engineer the biogas process with the objective to optimize its efficiency. Taxonomic profiling of biogas producing communities by means of high-throughput 16S rRNA gene amplicon sequencing provided high-resolution insights into bacterial and archaeal structures of AD assemblages and their linkages to fed substrates and process parameters. Commonly, the bacterial phyla Firmicutes and Bacteroidetes appeared to dominate biogas communities in varying abundances depending on the apparent process conditions. Regarding the community of methanogenic Archaea, their diversity was mainly affected by the nature and composition of the substrates, availability of nutrients and ammonium/ammonia contents, but not by the temperature. It also appeared that a high proportion of 16S rRNA sequences can only be classified on higher taxonomic ranks indicating that many community members and their participation in AD within functional networks are still unknown. Although cultivation-based approaches to isolate microorganisms from biogas fermentation samples yielded hundreds of novel species and strains, this approach intrinsically is limited to the cultivable fraction of the community. To obtain genome sequence information of non-cultivable biogas community members, metagenome sequencing including assembly and binning strategies was highly valuable. Corresponding research has led to the compilation of hundreds of metagenome-assembled genomes (MAGs) frequently representing novel taxa whose metabolism and lifestyle could be reconstructed based on nucleotide sequence information. 
In contrast to metagenome analyses revealing the genetic potential of microbial communities, metatranscriptome sequencing provided insights into the metabolically active community. Taking advantage of genome sequence information, transcriptional activities were evaluated considering the microorganism's genetic background. Metaproteome studies uncovered enzyme profiles expressed by biogas community members. Enzymes involved in cellulose and hemicellulose decomposition and utilization of other complex biopolymers were identified. Future studies on biogas functional microbial networks will increasingly involve integrated multi-omics analyses evaluating metagenome, transcriptome, proteome, and metabolome datasets.

  2. Est16, a New Esterase Isolated from a Metagenomic Library of a Microbial Consortium Specializing in Diesel Oil Degradation.

    PubMed

    Pereira, Mariana Rangel; Mercaldi, Gustavo Fernando; Maester, Thaís Carvalho; Balan, Andrea; Lemos, Eliana Gertrudes de Macedo

    2015-01-01

    Lipolytic enzymes have attracted attention from a global market because they show enormous biotechnological potential for applications such as detergent production, leather processing, cosmetics production, and use in perfumes and biodiesel. Due to the intense demand for biocatalysts, a metagenomic approach provides methods of identifying new enzymes. In this study, an esterase designated as Est16 was selected from 4224 clones of a fosmid metagenomic library, revealing an 87% amino acid identity with an esterase/lipase (accession number ADM63076.1) from an uncultured bacterium. Phylogenetic studies showed that the enzyme belongs to family V of bacterial lipolytic enzymes and has sequence and structural similarities with an aryl-esterase from Pseudomonas fluorescens and a patented Anti-Kazlauskas lipase (patent number US20050153404). The protein was expressed and purified as a highly soluble, thermally stable enzyme that showed a preference for basic pH. Est16 exhibited activity toward a wide range of substrates and the highest catalytic efficiency against p-nitrophenyl butyrate and p-nitrophenyl valerate. Est16 also showed tolerance to the presence of organic solvents, detergents and metals. Based on molecular modeling, we showed that the large alpha-beta domain is conserved in the patented enzymes but not the substrate pocket. Here, it was demonstrated that a metagenomic approach is suitable for discovering the lipolytic enzyme diversity and that Est16 has the biotechnological potential for use in industrial processes.

  3. MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach

    PubMed Central

    Brown, Bonnie L.; Watson, Mick; Minot, Samuel S.; Rivera, Maria C.; Franklin, Rima B.

    2017-01-01

    Background: Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10⁴ bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10³ 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads.
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. PMID:28327976

  4. MinION™ nanopore sequencing of environmental metagenomes: a synthetic approach.

    PubMed

    Brown, Bonnie L; Watson, Mick; Minot, Samuel S; Rivera, Maria C; Franklin, Rima B

    2017-03-01

    Environmental metagenomic analysis is typically accomplished by assigning taxonomy and/or function from whole genome sequencing or 16S amplicon sequences. Both of these approaches are limited, however, by read length, among other technical and biological factors. A nanopore-based sequencing platform, MinION™, produces reads that are ≥1 × 10⁴ bp in length, potentially providing for more precise assignment, thereby alleviating some of the limitations inherent in determining metagenome composition from short reads. We tested the ability of sequence data produced by MinION (R7.3 flow cells) to correctly assign taxonomy in single bacterial species runs and in three types of low-complexity synthetic communities: a mixture of DNA using equal mass from four species, a community with one relatively rare (1%) and three abundant (33% each) components, and a mixture of genomic DNA from 20 bacterial strains of staggered representation. Taxonomic composition of the low-complexity communities was assessed by analyzing the MinION sequence data with three different bioinformatic approaches: Kraken, MG-RAST, and One Codex. Results: Long read sequences generated from libraries prepared from single strains using the version 5 kit and chemistry, run on the original MinION device, yielded as few as 224 to as many as 3497 bidirectional high-quality (2D) reads with an average overall study length of 6000 bp. For the single-strain analyses, assignment of reads to the correct genus by different methods ranged from 53.1% to 99.5%, assignment to the correct species ranged from 23.9% to 99.5%, and the majority of misassigned reads were to closely related organisms. A synthetic metagenome sequenced with the same setup yielded 714 high quality 2D reads of approximately 5500 bp that were up to 98% correctly assigned to the species level.
Synthetic metagenome MinION libraries generated using version 6 kit and chemistry yielded from 899 to 3497 2D reads with lengths averaging 5700 bp with up to 98% assignment accuracy at the species level. The observed community proportions for “equal” and “rare” synthetic libraries were close to the known proportions, deviating from 0.1% to 10% across all tests. For a 20-species mock community with staggered contributions, a sequencing run detected all but 3 species (each included at <0.05% of DNA in the total mixture), 91% of reads were assigned to the correct species, 93% of reads were assigned to the correct genus, and >99% of reads were assigned to the correct family. Conclusions: At the current level of output and sequence quality (just under 4 × 10³ 2D reads for a synthetic metagenome), MinION sequencing followed by Kraken or One Codex analysis has the potential to provide rapid and accurate metagenomic analysis where the consortium is comprised of a limited number of taxa. Important considerations noted in this study included: high sensitivity of the MinION platform to the quality of input DNA, high variability of sequencing results across libraries and flow cells, and relatively small numbers of 2D reads per analysis. Together, these limited detection of very rare components of the microbial consortia, and would likely limit the utility of MinION for the sequencing of high-complexity metagenomic communities where thousands of taxa are expected. Furthermore, the limitations of the currently available data analysis tools suggest there is considerable room for improvement in the analytical approaches for the characterization of microbial communities using long reads.
Nevertheless, the fact that the accurate taxonomic assignment of high-quality reads generated by MinION is approaching 99.5% and, in most cases, the inferred community structure mirrors the known proportions of a synthetic mixture warrants further exploration of practical application to environmental metagenomics as the platform continues to develop and improve. With further improvement in sequence throughput and error rate reduction, this platform shows great promise for precise real-time analysis of the composition and structure of more complex microbial communities. © The Author 2017. Published by Oxford University Press.
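
    As a hedged illustration of the comparison between observed and known community proportions described in the two MinION records above, the deviation for each taxon can be expressed in percentage points. This is only a sketch: the function name, taxon labels and read counts are hypothetical and are not taken from the study; only the "rare" design (one component at 1%, three at 33% each) follows the abstract.

```python
def proportion_deviations(read_counts, expected):
    """Return |observed - expected| per taxon, in percentage points.

    read_counts: reads assigned to each taxon (hypothetical values here).
    expected:    known proportion of each taxon in the synthetic mixture (0..1).
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads were assigned to any taxon")
    deviations = {}
    for taxon, expected_fraction in expected.items():
        # Taxa with no assigned reads count as an observed proportion of 0.
        observed_fraction = read_counts.get(taxon, 0) / total
        deviations[taxon] = abs(observed_fraction - expected_fraction) * 100
    return deviations


# "Rare" design from the abstract: one taxon at 1%, three at 33% each.
expected = {"sp1": 0.01, "sp2": 0.33, "sp3": 0.33, "sp4": 0.33}
# Hypothetical read assignments, for illustration only.
read_counts = {"sp1": 20, "sp2": 300, "sp3": 350, "sp4": 330}

for taxon, deviation in proportion_deviations(read_counts, expected).items():
    print(f"{taxon}: {deviation:.1f} percentage points")
```

    Reporting the deviation per taxon, rather than a single summary value, keeps a poorly recovered rare component visible next to well recovered abundant ones.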

  5. BinSanity: unsupervised clustering of environmental microbial assemblies using coverage and affinity propagation

    PubMed Central

    Heidelberg, John F.; Tully, Benjamin J.

    2017-01-01

    Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, with metagenomic studies there is the computational hurdle of ‘binning’ contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We are introducing a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP), to cluster assemblies using coverage with compositional based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition and coverage based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has a higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high completion and low redundancy bins corresponding with the published metagenome-assembled genomes. PMID:28289564
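
    The two compositional features named in the BinSanity abstract, tetranucleotide frequency and percent GC content, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BinSanity's implementation: the function names are hypothetical, reverse-complement tetramers are not merged, and the affinity propagation clustering step itself is not shown.

```python
from collections import Counter
from itertools import product
from typing import Dict


def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def tetranucleotide_frequencies(seq: str) -> Dict[str, float]:
    """Return relative frequencies of all 256 tetramers in a contig.

    Tetramers are counted over a sliding window of step 1; windows that
    contain an ambiguous base (for example N) are ignored.
    """
    seq = seq.upper()
    tetramers = ["".join(p) for p in product("ACGT", repeat=4)]
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in tetramers)
    return {t: (counts[t] / total if total else 0.0) for t in tetramers}


# Hypothetical contig, for illustration only.
contig = "ATGCGCGTTAACGGCCATATNNGCGC"
features = tetranucleotide_frequencies(contig)
print(f"GC fraction: {gc_content(contig):.3f}")
print(f"distinct tetramers observed: {sum(1 for v in features.values() if v > 0)}")
```

    Each contig then becomes a fixed-length vector (256 frequencies plus GC content) that a clustering method can compare, independently of the coverage signal.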

  6. Mining for Nonribosomal Peptide Synthetase and Polyketide Synthase Genes Revealed a High Level of Diversity in the Sphagnum Bog Metagenome

    PubMed Central

    Müller, Christina A.; Oberauner-Wappis, Lisa; Peyman, Armin; Amos, Gregory C. A.; Wellington, Elizabeth M. H.

    2015-01-01

    Sphagnum bog ecosystems are among the oldest vegetation forms harboring a specific microbial community and are known to produce an exceptionally wide variety of bioactive substances. Although the Sphagnum metagenome shows a rich secondary metabolism, the genes have not yet been explored. To analyze nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), the diversity of NRPS and PKS genes in Sphagnum-associated metagenomes was investigated by in silico data mining and sequence-based screening (PCR amplification of 9,500 fosmid clones). The in silico Illumina-based metagenomic approach resulted in the identification of 279 NRPSs and 346 PKSs, as well as 40 PKS-NRPS hybrid gene sequences. The occurrence of NRPS sequences was strongly dominated by the members of the Proteobacteria phylum, especially by species of the Burkholderia genus, while PKS sequences were mainly affiliated with Actinobacteria. Thirteen novel NRPS-related sequences were identified by PCR amplification screening, displaying amino acid identities of 48% to 91% to annotated sequences of members of the phyla Proteobacteria, Actinobacteria, and Cyanobacteria. Some of the identified metagenomic clones showed the closest similarity to peptide synthases from Burkholderia or Lysobacter, which are emerging bacterial sources of as-yet-undescribed bioactive metabolites. This report highlights the role of the extreme natural ecosystems as a promising source for detection of secondary compounds and enzymes, serving as a source for biotechnological applications. PMID:26002894

  7. Metagenomic and metabolic profiling of nonlithifying and lithifying stromatolitic mats of Highborne Cay, The Bahamas.

    PubMed

    Khodadad, Christina L M; Foster, Jamie S

    2012-01-01

    Stromatolites are laminated carbonate build-ups formed by the metabolic activity of microbial mats and represent one of the oldest known ecosystems on Earth. In this study, we examined a living stromatolite located within the Exuma Sound, The Bahamas and profiled the metagenome and metabolic potential underlying these complex microbial communities. The metagenomes of the two dominant stromatolitic mat types, a nonlithifying (Type 1) and lithifying (Type 3) microbial mat, were partially sequenced and compared. This deep-sequencing approach was complemented by profiling the substrate utilization patterns of the mats using metabolic microarrays. Taxonomic assessment of the protein-encoding genes confirmed previous SSU rRNA analyses that bacteria dominate the metagenome of both mat types. Eukaryotes comprised less than 13% of the metagenomes and were rich in sequences associated with nematodes and heterotrophic protists. Comparative genomic analyses of the functional genes revealed extensive similarities in most of the subsystems between the nonlithifying and lithifying mat types. The one exception was an increase in the relative abundance of certain genes associated with carbohydrate metabolism in the lithifying Type 3 mats. Specifically, genes associated with the degradation of carbohydrates commonly found in exopolymeric substances, such as hexoses, deoxy- and acidic sugars were found. The genetic differences in carbohydrate metabolisms between the two mat types were confirmed using metabolic microarrays. Lithifying mats had a significant increase in diversity and utilization of carbon, nitrogen, phosphorus and sulfur substrates. The two stromatolitic mat types retained similar microbial communities, functional diversity and many genetic components within their metagenomes. However, there were major differences detected in the activity and genetic pathways of organic carbon utilization. 
These differences provide a strong link between the metagenome and the physiology of the mats, as well as new insights into the biological processes associated with carbonate precipitation in modern marine stromatolites.

  8. Metabolic Reconstruction for Metagenomic Data and Its Application to the Human Microbiome

    PubMed Central

    Abubucker, Sahar; Segata, Nicola; Goll, Johannes; Schubert, Alyxandria M.; Izard, Jacques; Cantarel, Brandi L.; Rodriguez-Mueller, Beltran; Zucker, Jeremy; Thiagarajan, Mathangi; Henrissat, Bernard; White, Owen; Kelley, Scott T.; Methé, Barbara; Schloss, Patrick D.; Gevers, Dirk; Mitreva, Makedonka; Huttenhower, Curtis

    2012-01-01

    Microbial communities carry out the majority of the biochemical activity on the planet, and they play integral roles in processes including metabolism and immune homeostasis in the human microbiome. Shotgun sequencing of such communities' metagenomes provides information complementary to organismal abundances from taxonomic markers, but the resulting data typically comprise short reads from hundreds of different organisms and are at best challenging to assemble comparably to single-organism genomes. Here, we describe an alternative approach to infer the functional and metabolic potential of a microbial community metagenome. We determined the gene families and pathways present or absent within a community, as well as their relative abundances, directly from short sequence reads. We validated this methodology using a collection of synthetic metagenomes, recovering the presence and abundance both of large pathways and of small functional modules with high accuracy. We subsequently applied this method, HUMAnN, to the microbial communities of 649 metagenomes drawn from seven primary body sites on 102 individuals as part of the Human Microbiome Project (HMP). This provided a means to compare functional diversity and organismal ecology in the human microbiome, and we determined a core of 24 ubiquitously present modules. Core pathways were often implemented by different enzyme families within different body sites, and 168 functional modules and 196 metabolic pathways varied in metagenomic abundance specifically to one or more niches within the microbiome. These included glycosaminoglycan degradation in the gut, as well as phosphate and amino acid transport linked to host phenotype (vaginal pH) in the posterior fornix. An implementation of our methodology is available at http://huttenhower.sph.harvard.edu/humann. 
This provides a means to accurately and efficiently characterize microbial metabolic pathways and functional modules directly from high-throughput sequencing reads, enabling the determination of community roles in the HMP cohort and in future metagenomic studies. PMID:22719234
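
    The idea of determining gene family abundances and pathway presence directly from short reads, as described in the abstract above, can be illustrated with a simplified sketch. It is not HUMAnN's algorithm: the function names, the length normalization, the presence threshold and all input values are assumptions made for this example.

```python
def relative_abundances(read_hits, family_lengths):
    """Length-normalize read hits per gene family and scale them to sum to 1.

    read_hits:      number of reads matching each gene family.
    family_lengths: representative sequence length (bp) of each family.
    """
    normalized = {
        family: count / family_lengths[family]
        for family, count in read_hits.items()
    }
    total = sum(normalized.values())
    if total == 0:
        return {family: 0.0 for family in normalized}
    return {family: value / total for family, value in normalized.items()}


def pathway_coverage(pathway_families, abundances, min_abundance=0.0):
    """Return the fraction of a pathway's gene families detected above a threshold."""
    if not pathway_families:
        return 0.0
    present = sum(
        1 for family in pathway_families
        if abundances.get(family, 0.0) > min_abundance
    )
    return present / len(pathway_families)


# Hypothetical inputs, for illustration only.
read_hits = {"famA": 100, "famB": 50, "famC": 0}
family_lengths = {"famA": 1000, "famB": 500, "famC": 750}
abundances = relative_abundances(read_hits, family_lengths)
print(abundances)
print(pathway_coverage(["famA", "famB", "famC"], abundances))
```

    Normalizing by length before scaling keeps long gene families from appearing more abundant simply because they attract more reads.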

  9. Unbiased RNA Shotgun Metagenomics in Social and Solitary Wild Bees Detects Associations with Eukaryote Parasites and New Viruses

    PubMed Central

    Schoonvaere, Karel; De Smet, Lina; Smagghe, Guy; Vierstraete, Andy; Braeckman, Bart P.; de Graaf, Dirk C.

    2016-01-01

    The diversity of eukaryote organisms and viruses associated with wild bees remains poorly characterized in contrast to the well-documented pathosphere of the western honey bee, Apis mellifera. Using a deliberate RNA shotgun metagenomic sequencing strategy in combination with a dedicated bioinformatics workflow, we identified the (micro-)organisms and viruses associated with two bumble bee hosts, Bombus terrestris and Bombus pascuorum, and two solitary bee hosts, Osmia cornuta and Andrena vaga. Ion Torrent semiconductor sequencing generated approximately 3.8 million high quality reads. The most significant eukaryote associations were two protozoan, Apicystis bombi and Crithidia bombi, and one nematode parasite Sphaerularia bombi in bumble bees. The trypanosome protozoan C. bombi was also found in the solitary bee O. cornuta. Next to the identification of three honey bee viruses Black queen cell virus, Sacbrood virus and Varroa destructor virus-1 and four plant viruses, we describe two novel RNA viruses Scaldis River bee virus (SRBV) and Ganda bee virus (GABV) based on their partial genomic sequences. The novel viruses belong to the class of negative-sense RNA viruses, SRBV is related to the order Mononegavirales whereas GABV is related to the family Bunyaviridae. The potential biological role of both viruses in bees is discussed in the context of recent advances in the field of arthropod viruses. Further, fragmentary sequence evidence for other undescribed viruses is presented, among which a nudivirus in O. cornuta and an unclassified virus related to Chronic bee paralysis virus in B. terrestris. Our findings extend the current knowledge of wild bee parasites in general and add to the growing evidence of unexplored arthropod viruses in valuable insects. PMID:28006002

  10. Unbiased RNA Shotgun Metagenomics in Social and Solitary Wild Bees Detects Associations with Eukaryote Parasites and New Viruses.

    PubMed

    Schoonvaere, Karel; De Smet, Lina; Smagghe, Guy; Vierstraete, Andy; Braeckman, Bart P; de Graaf, Dirk C

    2016-01-01

    The diversity of eukaryote organisms and viruses associated with wild bees remains poorly characterized in contrast to the well-documented pathosphere of the western honey bee, Apis mellifera. Using a deliberate RNA shotgun metagenomic sequencing strategy in combination with a dedicated bioinformatics workflow, we identified the (micro-)organisms and viruses associated with two bumble bee hosts, Bombus terrestris and Bombus pascuorum, and two solitary bee hosts, Osmia cornuta and Andrena vaga. Ion Torrent semiconductor sequencing generated approximately 3.8 million high quality reads. The most significant eukaryote associations were two protozoans, Apicystis bombi and Crithidia bombi, and one nematode parasite, Sphaerularia bombi, in bumble bees. The trypanosome protozoan C. bombi was also found in the solitary bee O. cornuta. Next to the identification of three honey bee viruses (Black queen cell virus, Sacbrood virus and Varroa destructor virus-1) and four plant viruses, we describe two novel RNA viruses, Scaldis River bee virus (SRBV) and Ganda bee virus (GABV), based on their partial genomic sequences. The novel viruses belong to the class of negative-sense RNA viruses: SRBV is related to the order Mononegavirales, whereas GABV is related to the family Bunyaviridae. The potential biological role of both viruses in bees is discussed in the context of recent advances in the field of arthropod viruses. Further, fragmentary sequence evidence for other undescribed viruses is presented, among which are a nudivirus in O. cornuta and an unclassified virus related to Chronic bee paralysis virus in B. terrestris. Our findings extend the current knowledge of wild bee parasites in general and add to the growing evidence of unexplored arthropod viruses in valuable insects.

  11. Next generation sequence assembly with AMOS.

    PubMed

    Treangen, Todd J; Sommer, Dan D; Angly, Florent E; Koren, Sergey; Pop, Mihai

    2011-03-01

    A Modular Open-Source Assembler (AMOS) was designed to offer a modular approach to genome assembly. AMOS includes a wide range of tools for assembly, including the lightweight de novo assemblers Minimus and Minimo, and Bambus 2, a robust scaffolder able to handle metagenomic and polymorphic data. This protocol describes how to configure and use AMOS for the assembly of Next Generation sequence data. Additionally, we provide three tutorial examples that include bacterial, viral, and metagenomic datasets with specific tips for improving assembly quality. © 2011 by John Wiley & Sons, Inc.

  12. Comparative analysis of metagenomes of Italian top soil improvers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gigliucci, Federica

    Biosolids originating from Municipal Waste Water Treatment Plants are proposed as top soil improvers (TSI) for their beneficial input of organic carbon on agricultural lands. Their use to amend soil is controversial, as it may lead to the presence of emerging hazards of anthropogenic or animal origin in the environment devoted to food production. In this study, we used shotgun metagenomic sequencing as a tool to characterize the hazards related to the TSIs. The samples showed the presence of many virulence genes associated with different diarrheagenic E. coli pathotypes as well as of different antimicrobial resistance-associated genes. Genes conferring resistance to Fluoroquinolones were the most relevant class of antimicrobial resistance genes observed in all the samples tested. To a lesser extent, traits associated with the resistance to Methicillin in Staphylococci and genes conferring resistance to Streptothricin, Fosfomycin and Vancomycin were also identified. The most represented metal resistance genes were cobalt-zinc-cadmium related, accounting for 15–50% of the sequence reads in the different metagenomes out of the total number of those mapping on the class of resistance to compounds determinants. Moreover, the taxonomic analysis performed by comparing compost-based samples and biosolids derived from municipal sewage-sludge treatments divided the samples into separate populations, based on the microbiota composition. The results confirm that metagenomics is efficient at detecting genomic traits associated with pathogens and antimicrobial resistance in complex matrices and that this approach can be efficiently used for the traceability of TSI samples using the microorganisms’ profiles as indicators of their origin. - Highlights: • Sludge- and green-based biosolids analysed by metagenomics. • Biosolids may introduce microbial hazards in the food chain. • Metagenomics enables tracking biosolids’ sources.

  13. Investigating the Connection between hgcA and Mercury Methylation Rates in the Environment

    NASA Astrophysics Data System (ADS)

    King, A. J.; Christensen, G. A.; Wymore, A. M.; Podar, M.; Hurt, R. A., Jr.; Brown, S. D.; Palumbo, A. V.; Bender, K. S.; Fields, M. W.; Gilmour, C. C.; Santillan, E. F. U.; Brandt, C. C.; Elias, D. A.

    2015-12-01

    Methylmercury (MeHg) is a common contaminant in many natural environments and is known to be a neurotoxin that impacts human health through bioaccumulation in food webs. The anaerobic conversion of mercury (Hg) to MeHg by microorganisms requires the presence of both HgcA and HgcB. In an effort to link hgcAB abundance and diversity with MeHg generation rates, we performed metagenomic and 16S rRNA sequencing as well as quantitative polymerase chain reaction (qPCR) of hgcA on samples from eight mercury-contaminated sites ranging from tidal marshes to Arctic permafrost. Custom algorithms were developed to filter hgcA sequences from the metagenomes, and to then select for those lineages that also contained hgcB. In the metagenomes, the Deltaproteobacteria dominated the pool of hgcAB from all eight sites; however, Firmicutes and methanogenic Archaea were each 50% less abundant. In parallel to the metagenomics studies, clone libraries of hgcAB were constructed for each site. This more cost-effective approach allowed us to verify the identity of the hgcAB+ organism, and yielded similar results to the metagenomes. Additionally, to determine the accuracy of our new degenerate qPCR primer sets (three sets specific to the three major clades of mercury methylators) in the environment, qPCR hgcA abundance values were compared to those derived from the metagenomes. Finally, we present evidence that hgcA abundance can correlate with MeHg concentrations but that the relationship is influenced by local environmental conditions. Our work demonstrates the relative efficacy of genetic methods for assessing the presence of mercury-methylators in eight different environments contaminated with mercury as well as the strength of association between abundance of hgcA and the rate of mercury methylation.

  14. Reconsidering Cluster Bias in Multilevel Data: A Monte Carlo Comparison of Free and Constrained Baseline Approaches.

    PubMed

    Guenole, Nigel

    2018-01-01

    The test for item level cluster bias examines the improvement in model fit that results from freeing an item's between level residual variance from a baseline model with equal within and between level factor loadings and between level residual variances fixed at zero. A potential problem is that this approach may include a misspecified unrestricted model if any non-invariance is present, but the log-likelihood difference test requires that the unrestricted model is correctly specified. A free baseline approach where the unrestricted model includes only the restrictions needed for model identification should lead to better decision accuracy, but no studies have examined this yet. We ran a Monte Carlo study to investigate this issue. When the referent item is unbiased, compared to the free baseline approach, the constrained baseline approach led to similar true positive (power) rates but much higher false positive (Type I error) rates. The free baseline approach should be preferred when the referent indicator is unbiased. When the referent assumption is violated, the false positive rate was unacceptably high for both free and constrained baseline approaches, and the true positive rate was poor regardless of whether the free or constrained baseline approach was used. Neither the free nor the constrained baseline approach can be recommended when the referent indicator is biased. We recommend paying close attention to ensuring the referent indicator is unbiased in tests of cluster bias. All Mplus input and output files, R, and short Python scripts used to execute this simulation study are uploaded to an open access repository.
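
    As a rough illustration of the log-likelihood difference test described in this abstract, the sketch below compares a constrained baseline with a model that frees a single parameter. The function name and the example log-likelihood values are hypothetical. The sketch assumes plain maximum-likelihood log-likelihoods and a one-degree-of-freedom chi-square reference; because a residual variance fixed at zero lies on the boundary of the parameter space, that reference may be conservative, so the p-value should be read as approximate. It is not the Mplus procedure used in the study.

```python
from math import erfc, sqrt


def lr_test_1df(loglik_constrained, loglik_free):
    """Likelihood-ratio (log-likelihood difference) test for one freed parameter.

    Returns the test statistic and an approximate p-value from a chi-square
    distribution with 1 degree of freedom.
    """
    # The statistic is twice the gain in log-likelihood from freeing the parameter.
    stat = 2.0 * (loglik_free - loglik_constrained)
    # Numerical noise can make the difference slightly negative; clamp at zero.
    stat = max(stat, 0.0)
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods, chosen only for illustration.
    stat, p = lr_test_1df(loglik_constrained=-1001.9207294, loglik_free=-1000.0)
    print(f"LR statistic = {stat:.3f}, approximate p = {p:.3f}")
```

    A small p-value would flag the item as showing cluster bias under these assumptions; the abstract's point is that this decision is only trustworthy when the less restricted model is itself correctly specified.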

  15. Reconsidering Cluster Bias in Multilevel Data: A Monte Carlo Comparison of Free and Constrained Baseline Approaches

    PubMed Central

    Guenole, Nigel

    2018-01-01

    The test for item level cluster bias examines the improvement in model fit that results from freeing an item's between level residual variance from a baseline model with equal within and between level factor loadings and between level residual variances fixed at zero. A potential problem is that this approach may include a misspecified unrestricted model if any non-invariance is present, but the log-likelihood difference test requires that the unrestricted model is correctly specified. A free baseline approach where the unrestricted model includes only the restrictions needed for model identification should lead to better decision accuracy, but no studies have examined this yet. We ran a Monte Carlo study to investigate this issue. When the referent item is unbiased, compared to the free baseline approach, the constrained baseline approach led to similar true positive (power) rates but much higher false positive (Type I error) rates. The free baseline approach should be preferred when the referent indicator is unbiased. When the referent assumption is violated, the false positive rate was unacceptably high for both free and constrained baseline approaches, and the true positive rate was poor regardless of whether the free or constrained baseline approach was used. Neither the free nor the constrained baseline approach can be recommended when the referent indicator is biased. We recommend paying close attention to ensuring the referent indicator is unbiased in tests of cluster bias. All Mplus input and output files, R, and short Python scripts used to execute this simulation study are uploaded to an open access repository. PMID:29551985

  16. Bioprospecting of functional cellulases from metagenome for second generation biofuel production: a review.

    PubMed

    Tiwari, Rameshwar; Nain, Lata; Labrou, Nikolaos E; Shukla, Pratyoosh

    2018-03-01

    Second generation biofuel production has emerged as a sustainable and alternative energy option. The ultimate aim is the development of an industrially feasible and economic conversion process of lignocellulosic biomass into biofuel molecules. Since cellulose is the most abundant biopolymer and represents the photosynthetically fixed form of carbon, the efficient hydrolysis of cellulose is the most important step towards the development of a sustainable biofuel production process. The enzymatic hydrolysis of cellulose by suites of hydrolytic enzymes underlines the importance of the cellulase enzyme system in the whole hydrolysis process. However, the selection of suitable cellulolytic enzymes with enhanced activities remains a challenge for the biorefinery industry to obtain efficient enzymatic hydrolysis of biomass. The present review focuses on deciphering the novel and effective cellulases from different environmental niches by unculturable metagenomic approaches. Furthermore, a comprehensive functional aspect of cellulases is also presented and evaluated by assessing the structural and catalytic properties as well as sequence identities and expression patterns. This review summarizes the recent developments in metagenomics-based approaches for identifying and exploring novel cellulases, which open new avenues for their successful application in biorefineries.

  17. Metagenomic analysis reveals significant changes of microbial compositions and protective functions during drinking water treatment.

    PubMed

    Chao, Yuanqing; Ma, Liping; Yang, Ying; Ju, Feng; Zhang, Xu-Xiang; Wu, Wei-Min; Zhang, Tong

    2013-12-19

    The metagenomic approach was applied to characterize variations of microbial structure and functions in raw (RW) and treated water (TW) in a drinking water treatment plant (DWTP) in the Pearl River Delta, China. Microbial structure was significantly influenced by the treatment processes, shifting from Gammaproteobacteria and Betaproteobacteria in RW to Alphaproteobacteria in TW. Further functional analysis indicated the basic metabolic functions of microorganisms in TW did not vary considerably. However, protective functions, i.e. glutathione synthesis genes in 'oxidative stress' and 'detoxification' subsystems, significantly increased, revealing the surviving bacteria may have higher chlorine resistance. Similar results were also found in the glutathione metabolism pathway, which identified the major reaction for glutathione synthesis and supported that more genes for glutathione metabolism existed in TW. This metagenomic study largely enhanced our knowledge about the influences of treatment processes, especially chlorination, on bacterial community structure and protective functions (e.g. glutathione metabolism) in ecosystems of DWTPs.

  18. Metagenomic analysis of microbial communities yields insight into impacts of nanoparticle design

    NASA Astrophysics Data System (ADS)

    Metch, Jacob W.; Burrows, Nathan D.; Murphy, Catherine J.; Pruden, Amy; Vikesland, Peter J.

    2018-01-01

    Next-generation DNA sequencing and metagenomic analysis provide powerful tools for the environmentally friendly design of nanoparticles. Herein we demonstrate this approach using a model community of environmental microbes (that is, wastewater-activated sludge) dosed with gold nanoparticles of varying surface coatings and morphologies. Metagenomic analysis was highly sensitive in detecting the microbial community response to gold nanospheres and nanorods with either cetyltrimethylammonium bromide or polyacrylic acid surface coatings. We observed that the gold-nanoparticle morphology imposes a stronger force in shaping the microbial community structure than does the surface coating. Trends were consistent in terms of the compositions of both taxonomic and functional genes, which include antibiotic resistance genes, metal resistance genes and gene-transfer elements associated with cell stress that are relevant to public health. Given that nanoparticle morphology remained constant, the potential influence of gold dissolution was minimal. Surface coating governed the nanoparticle partitioning between the bioparticulate and aqueous phases.

  19. Laboratory procedures to generate viral metagenomes.

    PubMed

    Thurber, Rebecca V; Haynes, Matthew; Breitbart, Mya; Wegley, Linda; Rohwer, Forest

    2009-01-01

    This collection of laboratory protocols describes the steps to collect viruses from various samples with the specific aim of generating viral metagenome sequence libraries (viromes). Viral metagenomics, the study of uncultured viral nucleic acid sequences from different biomes, relies on several concentration, purification, extraction, sequencing and heuristic bioinformatic methods. No single technique can provide an all-inclusive approach, and therefore the protocols presented here will be discussed in terms of hypothetical projects. However, care must be taken to individualize each step depending on the source and type of viral-particles. This protocol is a description of the processes we have successfully used to: (i) concentrate viral particles from various types of samples, (ii) eliminate contaminating cells and free nucleic acids and (iii) extract, amplify and purify viral nucleic acids. Overall, a sample can be processed to isolate viral nucleic acids suitable for high-throughput sequencing in approximately 1 week.

  20. Characterization of the SOS meta-regulon in the human gut microbiome.

    PubMed

    Cornish, Joseph P; Sanchez-Alberola, Neus; O'Neill, Patrick K; O'Keefe, Ronald; Gheba, Jameel; Erill, Ivan

    2014-05-01

    Data from metagenomics projects remain largely untapped for the analysis of transcriptional regulatory networks. Here, we provide proof-of-concept that metagenomic data can be effectively leveraged to analyze regulatory networks by characterizing the SOS meta-regulon in the human gut microbiome. We combine well-established in silico and in vitro techniques to mine the human gut microbiome data and determine the relative composition of the SOS network in a natural setting. Our analysis highlights the importance of translesion synthesis as a primary function of the SOS response. We predict the association of this network with three novel protein clusters involved in cell wall biogenesis, chromosome partitioning and restriction modification, and we confirm binding of the SOS response transcriptional repressor to sites in the promoter of a cell wall biogenesis enzyme, a phage integrase and a death-on-curing protein. We discuss the implications of these findings and the potential for this approach for metagenome analysis.

  1. Integrated Metagenomics/Metaproteomics Reveals Human Host-Microbiota Signatures of Crohn's Disease

    PubMed Central

    Darzi, Youssef; Mongodin, Emmanuel F.; Pan, Chongle; Shah, Manesh; Halfvarson, Jonas; Tysk, Curt; Henrissat, Bernard; Raes, Jeroen; Verberkmoes, Nathan C.; Jansson, Janet K.

    2012-01-01

    Crohn's disease (CD) is an inflammatory bowel disease of complex etiology, although dysbiosis of the gut microbiota has been implicated in chronic immune-mediated inflammation associated with CD. Here we combined shotgun metagenomic and metaproteomic approaches to identify potential functional signatures of CD in stool samples from six twin pairs that were either healthy, or that had CD in the ileum (ICD) or colon (CCD). Integration of these omics approaches revealed several genes, proteins, and pathways that primarily differentiated ICD from healthy subjects, including depletion of many proteins in ICD. In addition, the ICD phenotype was associated with alterations in bacterial carbohydrate metabolism, bacterial-host interactions, as well as human host-secreted enzymes. This eco-systems biology approach underscores the link between the gut microbiota and functional alterations in the pathophysiology of Crohn's disease and aids in identification of novel diagnostic targets and disease specific biomarkers. PMID:23209564

  2. SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data

    PubMed Central

    Green, Kevin T.; Dutilh, Bas E.; Edwards, Robert A.

    2016-01-01

    Summary: Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question what they can do. Currently, available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools. Availability and implementation: SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at https://edwards.sdsu.edu/SUPERFOCUS. Contact: redwards@mail.sdsu.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26454280

  3. SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data.

    PubMed

    Silva, Genivaldo Gueiros Z; Green, Kevin T; Dutilh, Bas E; Edwards, Robert A

    2016-02-01

    Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question what they can do. Currently, available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools. SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at https://edwards.sdsu.edu/SUPERFOCUS. Contact: redwards@mail.sdsu.edu. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
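
    To make the idea of "profiling subsystem abundances" concrete, here is a minimal sketch of the final tallying step only. It assumes that each read has already been assigned to a subsystem (or to none) by some homology search; the function name, the input format and the example labels are hypothetical, and this is not SUPER-FOCUS's own code or interface.

```python
from collections import Counter


def subsystem_profile(read_hits):
    """Turn per-read subsystem assignments into relative abundances.

    read_hits: iterable of (read_id, subsystem) pairs, where subsystem is
    None for reads without a database hit. Unassigned reads are ignored.
    """
    counts = Counter(subsystem for _, subsystem in read_hits if subsystem is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Relative abundance = reads assigned to a subsystem / all assigned reads.
    return {subsystem: n / total for subsystem, n in counts.items()}


if __name__ == "__main__":
    # Toy assignments, for illustration only.
    hits = [
        ("read1", "Carbohydrates"),
        ("read2", "Carbohydrates"),
        ("read3", "Stress Response"),
        ("read4", None),
    ]
    for name, fraction in sorted(subsystem_profile(hits).items()):
        print(f"{name}\t{fraction:.3f}")
```

    Normalizing by the number of assigned reads, rather than all reads, is a design choice of this sketch; dividing by the total read count instead would report the unassigned fraction implicitly.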

  4. Normalization of environmental metagenomic DNA enhances the discovery of under-represented microbial community members.

    PubMed

    Ramond, J-B; Makhalanyane, T P; Tuffin, M I; Cowan, D A

    2015-04-01

    Normalization is a procedure classically employed to detect rare sequences in cellular expression profiles (i.e. cDNA libraries). Here, we present a normalization protocol involving the direct treatment of extracted environmental metagenomic DNA with S1 nuclease, referred to as normalization of metagenomic DNA: NmDNA. We demonstrate that NmDNA, prior to post hoc PCR-based experiments (16S rRNA gene T-RFLP fingerprinting and clone library), increased the diversity of sequences retrieved from environmental microbial communities by detection of rarer sequences. This approach could be used to enhance the resolution of detection of ecologically relevant rare members in environmental microbial assemblages and therefore is promising in enabling a better understanding of ecosystem functioning. This study is the first testing 'normalization' on environmental metagenomic DNA (mDNA). The aim of this procedure was to improve the identification of rare phylotypes in environmental communities. Using hypoliths as model systems, we present evidence that this post-mDNA extraction molecular procedure substantially enhances the detection of less common phylotypes and could even lead to the discovery of novel microbial genotypes within a given environment. © 2014 The Society for Applied Microbiology.

  5. Isolation and characterization of two serine proteases from metagenomic libraries of the Gobi and Death Valley deserts.

    PubMed

    Neveu, Julie; Regeard, Christophe; DuBow, Michael S

    2011-08-01

    The screening of environmental DNA metagenome libraries for functional activities can provide an important source of new molecules and enzymes. In this study, we identified 17 potential protease-producing clones from two metagenomic libraries derived from samples of surface sand from the Gobi and Death Valley deserts. Two of the proteases, DV1 and M30, were purified and biochemically examined. These two proteases displayed a molecular mass of 41.5 kDa and 45.7 kDa, respectively, on SDS polyacrylamide gels. Alignments with known protease sequences showed less than 55% amino acid sequence identity. These two serine proteases appear to belong to the subtilisin (S8A) family and displayed several unique biochemical properties. Protease DV1 had an optimum pH of 8 and an optimal activity at 55°C, while protease M30 had an optimum pH >11 and optimal activity at 40°C. The properties of these enzymes make them potentially useful for biotechnological applications and again demonstrate that metagenomic approaches can be useful, especially when coupled with the study of novel environments such as deserts.

  6. Species richness in soil bacterial communities: a proposed approach to overcome sample size bias.

    PubMed

    Youssef, Noha H; Elshahed, Mostafa S

    2008-09-01

    Estimates of species richness based on 16S rRNA gene clone libraries are increasingly utilized to gauge the level of bacterial diversity within various ecosystems. However, previous studies have indicated that regardless of the utilized approach, species richness estimates obtained are dependent on the size of the analyzed clone libraries. We here propose an approach to overcome sample size bias in species richness estimates in complex microbial communities. Parametric (Maximum likelihood-based and rarefaction curve-based) and non-parametric approaches were used to estimate species richness in a library of 13,001 near full-length 16S rRNA clones derived from soil, as well as in multiple subsets of the original library. Species richness estimates obtained increased with the increase in library size. To obtain a sample size-unbiased estimate of species richness, we calculated the theoretical clone library sizes required to encounter the estimated species richness at various clone library sizes, used curve fitting to determine the theoretical clone library size required to encounter the "true" species richness, and subsequently determined the corresponding sample size-unbiased species richness value. Using this approach, sample size-unbiased estimates of 17,230, 15,571, and 33,912 were obtained for the ML-based, rarefaction curve-based, and ACE-1 estimators, respectively, compared to bias-uncorrected values of 15,009, 11,913, and 20,909.
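
    The dependence of observed richness on library size can be illustrated with the standard rarefaction formula, which gives the expected number of distinct OTUs in a random subsample of n clones drawn without replacement. The sketch below is a generic illustration with made-up counts and a hypothetical function name; it does not reproduce the authors' curve-fitting procedure or their ML and ACE-1 estimates.

```python
from math import comb


def rarefied_richness(otu_counts, n):
    """Expected number of OTUs observed in a random subsample of n clones.

    otu_counts: number of clones per OTU in the full library.
    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), with N the library size.
    """
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the library size")
    denominator = comb(total, n)
    # comb(total - c, n) is 0 when fewer than n clones lie outside the OTU,
    # in which case that OTU is certain to appear in the subsample.
    return sum(1 - comb(total - c, n) / denominator for c in otu_counts)


if __name__ == "__main__":
    # Toy clone library: four OTUs with 5, 3, 1 and 1 clones.
    library = [5, 3, 1, 1]
    for size in (1, 5, 10):
        print(size, round(rarefied_richness(library, size), 3))
```

    Because the expected richness keeps rising with n as long as rare OTUs remain unsampled, estimates read off a small library understate the community's richness, which is the bias the abstract sets out to correct.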

  7. Integrating Metagenomics and NanoSIMS to Investigate the Evolution and Ecophysiology of Magnetotactic Bacteria

    NASA Astrophysics Data System (ADS)

    Lin, W.; Zhang, W.; He, M.; Pan, Y.

    2017-12-01

    Magnetotactic bacteria (MTB) synthesize intracellular nano-sized magnetite (Fe3O4) and/or greigite (Fe3S4) crystals, called magnetosomes, which impart a permanent magnetic dipole moment to the cell causing it to align along the geomagnetic field lines as it swims. MTB play essential roles in global cycling of Fe, S, N and C, and represent an excellent model system not just for the investigation of the mechanisms of microbial engines that drive Earth's biogeochemical cycles but also for magnetotaxis and microbial biomineralization. Most of the previous studies on MTB were based on 16S rRNA gene-targeting analyses, which are powerful approaches to characterize the diversity, ecology and biogeography of MTB in nature. However, these approaches are somewhat limited in the physiological detail they can provide. In the present study, we have combined the genome-resolved metagenomics and nanoscale secondary ion mass spectrometry (NanoSIMS) analyses to study the genomic information, biomineralization mechanism and metabolic potential of environmental MTB. Two nearly complete genomes from uncultivated MTB belonging to the Nitrospirae phylum were reconstructed and their proposed metabolisms were further investigated and confirmed through NanoSIMS analyses. These results improve our understanding about the ecophysiology and evolution of MTB and their environmental function. The development of metagenomics-NanoSIMS integrated approach will provide a powerful tool for the research of geomicrobiology and environmental microbiology.

  8. Urinary microbiome of kidney transplant patients reveals dysbiosis with potential for antibiotic resistance

    PubMed Central

    Rani, Asha; Ranjan, Ravi; McGee, Halvor S.; Andropolis, Kalista E.; Panchal, Dipti V.; Hajjiri, Zahraa; Brennan, Daniel C.; Finn, Patricia W.; Perkins, David L.

    2016-01-01

    Recent studies have established that a complex community of microbes colonizes the human urinary tract; however, their role in kidney transplant patients treated with prophylactic antibiotics remains poorly investigated. Our aim was to investigate the urinary microbiome of kidney transplant recipients. Urine samples from 21 patients following kidney transplantation and 8 healthy controls were collected. All patients received prophylactic treatment with the antibiotic trimethoprim/sulfamethoxazole. Metagenomic DNA was isolated from urine samples, sequenced using a shotgun metagenomic sequencing approach on the Illumina HiSeq2000 platform, and analyzed for microbial taxonomic and functional annotations. Our results demonstrate that the urine microbiome of kidney transplant recipients was markedly different at all taxonomic levels from phyla to species, had decreased microbial diversity and increased abundance of potentially pathogenic species compared to healthy controls. Specifically, at the phylum level we detected a significant decrease in Actinobacteria and increase in Firmicutes due to increases in Enterococcus faecalis. In addition, there was an increase in the Proteobacteria due to increases in E. coli. Analysis of predicted functions of the urinary metagenome revealed increased abundance of enzymes in the folate pathway including dihydrofolate synthase that are not inhibited by trimethoprim/sulfamethoxazole, but can augment folate metabolism. This report characterizes the urinary microbiome of kidney transplant recipients using a shotgun metagenomics approach. Our results indicate that the urinary microbiota may be modified in the context of prophylactic antibiotics, indicating that a therapeutic intervention may shift the urinary microbiota to select bacterial species with increased resistance to antibiotics. The evaluation and development of optimal prophylactic regimens that do not promote antibiotic resistance is an important future goal. PMID:27669488

  9. Integrating metagenomic and amplicon databases to resolve the phylogenetic and ecological diversity of the Chlamydiae

    PubMed Central

    Lagkouvardos, Ilias; Weinmaier, Thomas; Lauro, Federico M; Cavicchioli, Ricardo; Rattei, Thomas; Horn, Matthias

    2014-01-01

    In the era of metagenomics and amplicon sequencing, comprehensive analyses of available sequence data remain a challenge. Here we describe an approach exploiting metagenomic and amplicon data sets from public databases to elucidate phylogenetic diversity of defined microbial taxa. We investigated the phylum Chlamydiae whose known members are obligate intracellular bacteria that represent important pathogens of humans and animals, as well as symbionts of protists. Despite their medical relevance, our knowledge about chlamydial diversity is still scarce. Most of the nine known families are represented by only a few isolates, while previous clone library-based surveys suggested the existence of yet uncharacterized members of this phylum. Here we identified more than 22 000 high quality, non-redundant chlamydial 16S rRNA gene sequences in diverse databases, as well as 1900 putative chlamydial protein-encoding genes. Even when applying the most conservative approach, clustering of chlamydial 16S rRNA gene sequences into operational taxonomic units revealed an unexpectedly high species, genus and family-level diversity within the Chlamydiae, including 181 putative families. These in silico findings were verified experimentally in one Antarctic sample, which contained a high diversity of novel Chlamydiae. In our analysis, the Rhabdochlamydiaceae, whose known members infect arthropods, represents the most diverse and species-rich chlamydial family, followed by the protist-associated Parachlamydiaceae, and a putative new family (PCF8) with unknown host specificity. Available information on the origin of metagenomic samples indicated that marine environments contain the majority of the newly discovered chlamydial lineages, highlighting this environment as an important chlamydial reservoir. PMID:23949660

  10. Inhibition of the growth of Bacillus subtilis DSM10 by a newly discovered antibacterial protein from the soil metagenome.

    PubMed

    O'Mahony, Mark M; Henneberger, Ruth; Selvin, Joseph; Kennedy, Jonathan; Doohan, Fiona; Marchesi, Julian R; Dobson, Alan D W

    2015-01-01

    A functional metagenomics-based approach exploiting the microbiota of suppressive soils from an organic field site has succeeded in the identification of a clone with the ability to inhibit the growth of Bacillus subtilis DSM10. Sequencing of the fosmid identified a putative β-lactamase-like gene abgT. Transposon mutagenesis of the abgT gene resulted in a loss in ability to inhibit the growth of B. subtilis DSM10. Further analysis of the deduced amino acid sequence of AbgT revealed moderate homology to esterases, suggesting that the protein may possess hydrolytic activity. Weak lipolytic activity was detected; however, the clone did not appear to produce any β-lactamase activity. Phylogenetic analysis revealed the protein is a member of the family VIII group of lipase/esterases and clusters with a number of proteins of metagenomic origin. The abgT gene was sub-cloned into a protein expression vector and, when introduced into the abgT transposon mutant clones, restored the ability of the clones to inhibit the growth of B. subtilis DSM10, clearly indicating that the abgT gene is involved in the antibacterial activity. While the precise role of this protein has yet to be fully elucidated, it may be involved in the generation of free fatty acid with antibacterial properties. Thus functional metagenomic approaches continue to provide a significant resource for the discovery of novel functional proteins and it is clear that hydrolytic enzymes, such as AbgT, may be a potential source for the development of future antimicrobial therapies.

  11. Genometa--a fast and accurate classifier for short metagenomic shotgun reads.

    PubMed

    Davenport, Colin F; Neugebauer, Jens; Beckmann, Nils; Friedrich, Benedikt; Kameri, Burim; Kokott, Svea; Paetow, Malte; Siekmann, Björn; Wieding-Drewes, Matthias; Wienhöfer, Markus; Wolf, Stefan; Tümmler, Burkhard; Ahlers, Volker; Sprengel, Frauke

    2012-01-01

    Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.

  12. Modulations of the Chicken Cecal Microbiome and Metagenome in Response to Anticoccidial and Growth Promoter Treatment

    PubMed Central

    Danzeisen, Jessica L.; Kim, Hyeun Bum; Isaacson, Richard E.; Tu, Zheng Jin; Johnson, Timothy J.

    2011-01-01

    With increasing pressures to reduce or eliminate the use of antimicrobials for growth promotion purposes in production animals, there is a growing need to better understand the effects elicited by these agents in order to identify alternative approaches that might be used to maintain animal health. Antibiotic usage at subtherapeutic levels is postulated to confer a number of modulations in the microbes within the gut that ultimately result in growth promotion and reduced occurrence of disease. This study examined the effects of the coccidiostat monensin and the growth promoters virginiamycin and tylosin on the broiler chicken cecal microbiome and metagenome. Using a longitudinal design, cecal contents of commercial chickens were extracted and examined using 16S rRNA and total DNA shotgun metagenomic pyrosequencing. A number of genus-level enrichments and depletions were observed in response to monensin alone, or monensin in combination with virginiamycin or tylosin. Of note, monensin effects included depletions of Roseburia, Lactobacillus and Enterococcus, and enrichments in Coprococcus and Anaerofilum. The most notable effect observed in the monensin/virginiamycin and monensin/tylosin treatments, but not in the monensin-alone treatments, was enrichments in Escherichia coli. Analysis of the metagenomic dataset identified enrichments in transport system genes, type I fimbrial genes, and type IV conjugative secretion system genes. No significant differences were observed with regard to antimicrobial resistance gene counts. Overall, this study provides a more comprehensive glimpse of the chicken cecum microbial community, the modulations of this community in response to growth promoters, and targets for future efforts to mimic these effects using alternative approaches. PMID:22114729

  13. Metagenomic Insights into the Evolution, Function, and Complexity of the Planktonic Microbial Community of Lake Lanier, a Temperate Freshwater Ecosystem

    PubMed Central

    Oh, Seungdae; Caro-Quintero, Alejandro; Tsementzi, Despina; DeLeon-Rodriguez, Natasha; Luo, Chengwei; Poretsky, Rachel; Konstantinidis, Konstantinos T.

    2011-01-01

    Lake Lanier is an important freshwater lake for the southeast United States, as it represents the main source of drinking water for the Atlanta metropolitan area and is popular for recreational activities. Temperate freshwater lakes such as Lake Lanier are underrepresented among the growing number of environmental metagenomic data sets, and little is known about how functional gene content in freshwater communities relates to that of other ecosystems. To better characterize the gene content and variability of this freshwater planktonic microbial community, we sequenced several samples obtained around a strong summer storm event and during the fall water mixing using a random whole-genome shotgun (WGS) approach. Comparative metagenomics revealed that the gene content was relatively stable over time and more related to that of another freshwater lake and the surface ocean than to soil. However, the phylogenetic diversity of Lake Lanier communities was distinct from that of soil and marine communities. We identified several important genomic adaptations that account for these findings, such as the use of potassium (as opposed to sodium) osmoregulators by freshwater organisms and differences in the community average genome size. We show that the lake community is predominantly composed of sequence-discrete populations and describe a simple method to assess community complexity based on population richness and evenness and to determine the sequencing effort required to cover diversity in a sample. This study provides the first comprehensive analysis of the genetic diversity and metabolic potential of a temperate planktonic freshwater community and advances approaches for comparative metagenomics. PMID:21764968
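    The abstract above mentions assessing community complexity from population richness and evenness but does not give the formula. As a rough illustration only, the sketch below uses two standard ecological measures, the Shannon index and Pielou's evenness; these are assumptions swapped in for illustration, not necessarily the authors' method, and the read counts are hypothetical.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over populations with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou evenness J = H' / ln(S), where S is the number of observed populations."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        # Evenness is undefined for a single population; 0.0 is a convention chosen here.
        return 0.0
    return shannon_index(counts) / math.log(richness)

# Hypothetical read counts per population (illustrative only, not data from the study)
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]

print(pielou_evenness(even_community), pielou_evenness(skewed_community))
```

    Both communities have the same richness (four populations), yet the skewed one scores much lower on evenness, which is why the two quantities are usually reported together.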

  14. Mining for Nonribosomal Peptide Synthetase and Polyketide Synthase Genes Revealed a High Level of Diversity in the Sphagnum Bog Metagenome.

    PubMed

    Müller, Christina A; Oberauner-Wappis, Lisa; Peyman, Armin; Amos, Gregory C A; Wellington, Elizabeth M H; Berg, Gabriele

    2015-08-01

    Sphagnum bog ecosystems are among the oldest vegetation forms harboring a specific microbial community and are known to produce an exceptionally wide variety of bioactive substances. Although the Sphagnum metagenome shows a rich secondary metabolism, the genes have not yet been explored. To analyze nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs), the diversity of NRPS and PKS genes in Sphagnum-associated metagenomes was investigated by in silico data mining and sequence-based screening (PCR amplification of 9,500 fosmid clones). The in silico Illumina-based metagenomic approach resulted in the identification of 279 NRPSs and 346 PKSs, as well as 40 PKS-NRPS hybrid gene sequences. The occurrence of NRPS sequences was strongly dominated by the members of the Proteobacteria phylum, especially by species of the Burkholderia genus, while PKS sequences were mainly affiliated with Actinobacteria. Thirteen novel NRPS-related sequences were identified by PCR amplification screening, displaying amino acid identities of 48% to 91% to annotated sequences of members of the phyla Proteobacteria, Actinobacteria, and Cyanobacteria. Some of the identified metagenomic clones showed the closest similarity to peptide synthases from Burkholderia or Lysobacter, which are emerging bacterial sources of as-yet-undescribed bioactive metabolites. This report highlights the role of the extreme natural ecosystems as a promising source for detection of secondary compounds and enzymes, serving as a source for biotechnological applications.

  15. An Unbiased Method To Build Benchmarking Sets for Ligand-Based Virtual Screening and its Application To GPCRs

    PubMed Central

    2015-01-01

    Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could bring the biases to the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCRs targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the “artificial enrichment” and “analogue bias” of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD. PMID:24749745

  16. An unbiased method to build benchmarking sets for ligand-based virtual screening and its application to GPCRs.

    PubMed

    Xia, Jie; Jin, Hongwei; Liu, Zhenming; Zhang, Liangren; Wang, Xiang Simon

    2014-05-27

    Benchmarking data sets have become common in recent years for the purpose of virtual screening, though the main focus had been placed on the structure-based virtual screening (SBVS) approaches. Due to the lack of crystal structures, there is great need for unbiased benchmarking sets to evaluate various ligand-based virtual screening (LBVS) methods for important drug targets such as G protein-coupled receptors (GPCRs). To date these ready-to-apply data sets for LBVS are fairly limited, and the direct usage of benchmarking sets designed for SBVS could bring the biases to the evaluation of LBVS. Herein, we propose an unbiased method to build benchmarking sets for LBVS and validate it on a multitude of GPCRs targets. To be more specific, our methods can (1) ensure chemical diversity of ligands, (2) maintain the physicochemical similarity between ligands and decoys, (3) make the decoys dissimilar in chemical topology to all ligands to avoid false negatives, and (4) maximize spatial random distribution of ligands and decoys. We evaluated the quality of our Unbiased Ligand Set (ULS) and Unbiased Decoy Set (UDS) using three common LBVS approaches, with Leave-One-Out (LOO) Cross-Validation (CV) and a metric of average AUC of the ROC curves. Our method has greatly reduced the "artificial enrichment" and "analogue bias" of a published GPCRs benchmarking set, i.e., GPCR Ligand Library (GLL)/GPCR Decoy Database (GDD). In addition, we addressed an important issue about the ratio of decoys per ligand and found that for a range of 30 to 100 it does not affect the quality of the benchmarking set, so we kept the original ratio of 39 from the GLL/GDD.
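    The evaluation metric named above, the average AUC of the ROC curves, can be illustrated with a short sketch. It computes AUC as the probability that a randomly chosen ligand scores higher than a randomly chosen decoy, which is equivalent to the area under the ROC curve. The function names and the scores are hypothetical, and the cross-validation loop of the original study is not reproduced here.

```python
def roc_auc(ligand_scores, decoy_scores):
    """AUC as the fraction of (ligand, decoy) pairs ranked correctly; ties count 0.5."""
    if not ligand_scores or not decoy_scores:
        raise ValueError("need at least one ligand score and one decoy score")
    wins = 0.0
    for lig in ligand_scores:
        for dec in decoy_scores:
            if lig > dec:
                wins += 1.0
            elif lig == dec:
                wins += 0.5
    return wins / (len(ligand_scores) * len(decoy_scores))

def mean_auc(per_target_scores):
    """Average AUC over targets; each entry is a (ligand_scores, decoy_scores) pair."""
    aucs = [roc_auc(lig, dec) for lig, dec in per_target_scores]
    return sum(aucs) / len(aucs)

# Hypothetical similarity scores for two targets (illustrative only)
targets = [
    ([0.9, 0.8], [0.1, 0.2]),  # ligands always outrank decoys
    ([0.9, 0.3], [0.5, 0.1]),  # one ligand is outranked by one decoy
]
print(mean_auc(targets))
```

    An AUC near 0.5 means a method cannot tell ligands from decoys. A benchmarking set with artificial enrichment tends to push AUC upward for reasons unrelated to the method's real discriminating power, which is the bias the abstract describes reducing.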

  17. MeCorS: Metagenome-enabled error correction of single cell sequencing reads

    DOE PAGES

    Bremges, Andreas; Singer, Esther; Woyke, Tanja; ...

    2016-03-15

    Here we present a new tool, MeCorS, to correct chimeric reads and sequencing errors in Illumina data generated from single amplified genomes (SAGs). It uses sequence information derived from accompanying metagenome sequencing to accurately correct errors in SAG reads, even from ultra-low coverage regions. In evaluations on real data, we show that MeCorS outperforms BayesHammer, the most widely used state-of-the-art approach. MeCorS performs particularly well in correcting chimeric reads, which greatly improves both accuracy and contiguity of de novo SAG assemblies.

  18. Novel picornavirus associated with avian keratin disorder in Alaskan birds

    USGS Publications Warehouse

    Zylberberg, Maxine; Van Hemert, Caroline R.; Dumbacher, John P.; Handel, Colleen M.; Tihan, Tarik; DeRisi, Joseph L.

    2016-01-01

    Avian keratin disorder (AKD), characterized by debilitating overgrowth of the avian beak, was first documented in black-capped chickadees (Poecile atricapillus) in Alaska. Subsequently, similar deformities have appeared in numerous species across continents. Despite the widespread distribution of this emerging pathology, the cause of AKD remains elusive. As a result, it is unknown whether suspected cases of AKD in the afflicted species are causally linked, and the impacts of this pathology at the population and community levels are difficult to evaluate. We applied unbiased, metagenomic next-generation sequencing to search for candidate pathogens in birds affected with AKD. We identified and sequenced the complete coding region of a novel picornavirus, which we are calling poecivirus. Subsequent screening of 19 AKD-affected black-capped chickadees and 9 control individuals for the presence of poecivirus revealed that 19/19 (100%) AKD-affected individuals were positive, while only 2/9 (22%) control individuals were infected with poecivirus. Two northwestern crows (Corvus caurinus) and two red-breasted nuthatches (Sitta canadensis) with AKD-consistent pathology also tested positive for poecivirus. We suggest that poecivirus is a candidate etiological agent of AKD.
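    The screening counts reported above (19/19 affected birds positive versus 2/9 controls) form a 2x2 table. The sketch below shows how such a table could be tested with a two-sided Fisher exact test. This is an illustrative calculation under that assumption; the abstract does not say which test, if any, the authors applied.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 positives fall in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Counts from the abstract: rows are AKD-affected and control birds,
# columns are poecivirus-positive and poecivirus-negative.
p_value = fisher_exact_two_sided(19, 0, 2, 7)
print(p_value)
```

    For these counts the sketch gives a p-value on the order of 10^-5. That is consistent with the authors' cautious wording, since a strong association supports poecivirus as a candidate agent but does not by itself establish causation.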

  19. Metagenome Assembly at the DOE JGI (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Chain, Patrick

    2018-01-25

    Patrick Chain of DOE JGI at LANL, Co-Chair of the Metagenome-specific Assembly session, on Metagenome Assembly at the DOE JGI at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  20. Rapid and efficient method to extract metagenomic DNA from estuarine sediments.

    PubMed

    Shamim, Kashif; Sharma, Jaya; Dubey, Santosh Kumar

    2017-07-01

    Metagenomic DNA from sediments of selected estuaries of Goa, India was extracted using a simple, fast, efficient and environmentally friendly method. The recovery of pure metagenomic DNA with our method was significantly higher than with other well-known methods, with the concentration of recovered metagenomic DNA ranging from 1185.1 to 4579.7 µg/g of sediment. The purity of the metagenomic DNA was also considerably high, as the ratio of absorbance at 260 and 280 nm ranged from 1.88 to 1.94. Therefore, the recovered metagenomic DNA was directly used to perform various molecular biology experiments, viz. restriction digestion, PCR amplification, cloning and metagenomic library construction. This clearly proved that our protocol for metagenomic DNA extraction using silica gel efficiently removed the contaminants and prevented shearing of the metagenomic DNA. Thus, this modified method can be used to recover pure metagenomic DNA from various estuarine sediments in a rapid, efficient and eco-friendly manner.

  1. Developing a Bacteroides System for Function-Based Screening of DNA from the Human Gut Microbiome.

    PubMed

    Lam, Kathy N; Martens, Eric C; Charles, Trevor C

    2018-01-01

    Functional metagenomics is a powerful method that allows the isolation of genes whose role may not have been predicted from DNA sequence. In this approach, first, environmental DNA is cloned to generate metagenomic libraries that are maintained in Escherichia coli, and second, the cloned DNA is screened for activities of interest. Typically, functional screens are carried out using E. coli as a surrogate host, although there likely exist barriers to gene expression, such as lack of recognition of native promoters. Here, we describe efforts to develop Bacteroides thetaiotaomicron as a surrogate host for screening metagenomic DNA from the human gut. We construct a B. thetaiotaomicron-compatible fosmid cloning vector, generate a fosmid clone library using DNA from the human gut, and show successful functional complementation of a B. thetaiotaomicron glycan utilization mutant. Though we were unable to retrieve the physical fosmid after complementation, we used genome sequencing to identify the complementing genes derived from the human gut microbiome. Our results demonstrate that the use of B. thetaiotaomicron to express metagenomic DNA is promising, but they also exemplify the challenges that can be encountered in the development of new surrogate hosts for functional screening. IMPORTANCE Human gut microbiome research has been supported by advances in DNA sequencing that make it possible to obtain gigabases of sequence data from metagenomes but is limited by a lack of knowledge of gene function that leads to incomplete annotation of these data sets. There is a need for the development of methods that can provide experimental data regarding microbial gene function. Functional metagenomics is one such method, but functional screens are often carried out using hosts that may not be able to express the bulk of the environmental DNA being screened. We expand the range of current screening hosts and demonstrate that human gut-derived metagenomic libraries can be introduced into the gut microbe Bacteroides thetaiotaomicron to identify genes based on activity screening. Our results support the continuing development of genetically tractable systems to obtain information about gene function.

  2. Meta-Storms: efficient search for similar microbial communities based on a novel indexing scheme and similarity score for metagenomic data.

    PubMed

    Su, Xiaoquan; Xu, Jian; Ning, Kang

    2012-10-01

    Scientists have long sought to compare different microbial communities (also referred to as 'metagenomic samples' here) effectively on a large scale: given a set of unknown samples, find similar metagenomic samples from a large repository and examine how similar these samples are. With the metagenomic samples accumulated so far, it is possible to build a database of metagenomic samples of interest. Any metagenomic sample could then be searched against this database to find the most similar metagenomic sample(s). However, on the one hand, current databases with a large number of metagenomic samples mostly serve as data repositories that offer few functionalities for analysis; on the other hand, methods to measure the similarity of metagenomic data work well only for small sets of samples by pairwise comparison. It is not yet clear how to efficiently search for metagenomic samples against a large metagenomic database. In this study, we have proposed a novel method, Meta-Storms, that could systematically and efficiently organize and search metagenomic data. It includes the following components: (i) creating a database of metagenomic samples based on their taxonomical annotations, (ii) efficient indexing of samples in the database based on a hierarchical taxonomy indexing strategy, (iii) searching for a metagenomic sample against the database by a fast scoring function based on quantitative phylogeny and (iv) managing the database by index export, index import, data insertion, data deletion and database merging. We have collected more than 1300 metagenomic datasets from the public domain and in-house facilities, and tested the Meta-Storms method on these datasets. Our experimental results show that Meta-Storms is capable of database creation and effective searching for a large number of metagenomic samples, and it could achieve similar accuracies compared with the current popular significance testing-based methods. The Meta-Storms method would serve as a suitable database management and search system to quickly identify similar metagenomic samples from a large pool of samples.

  3. Minimum mean squared error (MSE) adjustment and the optimal Tykhonov-Phillips regularization parameter via reproducing best invariant quadratic uniformly unbiased estimates (repro-BIQUUE)

    NASA Astrophysics Data System (ADS)

    Schaffrin, Burkhard

    2008-02-01

    In a linear Gauss-Markov model, the parameter estimates from BLUUE (Best Linear Uniformly Unbiased Estimate) are not robust against possible outliers in the observations. Moreover, by giving up the unbiasedness constraint, the mean squared error (MSE) risk may be further reduced, in particular when the problem is ill-posed. In this paper, the α-weighted S-homBLE (Best homogeneously Linear Estimate) is derived via formulas originally used for variance component estimation on the basis of the repro-BIQUUE (reproducing Best Invariant Quadratic Uniformly Unbiased Estimate) principle in a model with stochastic prior information. In the present model, however, such prior information is not included, which allows the comparison of the stochastic approach (α-weighted S-homBLE) with the well-established algebraic approach of Tykhonov-Phillips regularization, also known as R-HAPS (Hybrid APproximation Solution), whenever the inverse of the “substitute matrix” S exists and is chosen as the R matrix that defines the relative impact of the regularizing term on the final result.
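    For orientation, Tykhonov-Phillips regularization in a Gauss-Markov model is commonly written as below. The notation is an assumption made here and may differ from the paper's: y is the observation vector, A the design matrix, P the weight matrix, ξ the parameter vector, λ the regularization parameter, and R the matrix that sets the relative impact of the regularizing term.

```latex
\hat{\xi}_{\lambda}
  = \arg\min_{\xi}\,\bigl\{ (y - A\xi)^{\top} P\,(y - A\xi) + \lambda\,\xi^{\top} R\,\xi \bigr\}
  = \bigl(A^{\top} P A + \lambda R\bigr)^{-1} A^{\top} P\, y ,
\\[4pt]
\beta = \operatorname{E}(\hat{\xi}_{\lambda}) - \xi
  = -\lambda\,\bigl(A^{\top} P A + \lambda R\bigr)^{-1} R\,\xi ,
\qquad
\operatorname{MSE}(\hat{\xi}_{\lambda})
  = \operatorname{D}(\hat{\xi}_{\lambda}) + \beta\,\beta^{\top} .
```

    With λ = 0 and A of full column rank, the estimate reduces to the weighted least-squares (unbiased) solution. A positive λ introduces the bias β but can lower the total MSE when the problem is ill-posed, which is the trade-off the abstract refers to.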

  4. Phylogenetically Novel LuxI/LuxR-Type Quorum Sensing Systems Isolated Using a Metagenomic Approach

    PubMed Central

    Nasuno, Eri; Fujita, Masaki J.; Nakatsu, Cindy H.; Kamagata, Yoichi; Hanada, Satoshi

    2012-01-01

    A great deal of research has been done to understand bacterial cell-to-cell signaling systems, but there is still a large gap in our current knowledge because the majority of microorganisms in natural environments do not have cultivated representatives. Metagenomics is one approach to identify novel quorum sensing (QS) systems from uncultured bacteria in environmental samples. In this study, fosmid metagenomic libraries were constructed from a forest soil and an activated sludge from a coke plant, and the target genes were detected using a green fluorescent protein (GFP)-based Escherichia coli biosensor strain whose fluorescence was screened by spectrophotometry. DNA sequence analysis revealed two pairs of new LuxI family N-acyl-L-homoserine lactone (AHL) synthases and LuxR family transcriptional regulators (clones N16 and N52, designated AubI/AubR and AusI/AusR, respectively). AubI and AusI each produced an identical AHL, N-dodecanoyl-L-homoserine lactone (C12-HSL), as determined by nuclear magnetic resonance (NMR) and mass spectrometry. Phylogenetic analysis based on amino acid sequences suggested that AusI/AusR was from an uncultured member of the Betaproteobacteria and AubI/AubR was very deeply branched from previously described LuxI/LuxR homologues in isolates of the Proteobacteria. The phylogenetic position of AubI/AubR indicates that they represent a QS system not acquired recently from the Proteobacteria by horizontal gene transfer but share a more ancient ancestry. We demonstrated that metagenomic screening is useful to provide further insight into the phylogenetic diversity of bacterial QS systems by describing two new LuxI/LuxR-type QS systems from uncultured bacteria. PMID:22983963

  5. Phylogenetic analysis of a spontaneous cocoa bean fermentation metagenome reveals new insights into its bacterial and fungal community diversity.

    PubMed

    Illeghems, Koen; De Vuyst, Luc; Papalexandratou, Zoi; Weckx, Stefan

    2012-01-01

    This is the first report on the phylogenetic analysis of the community diversity of a single spontaneous cocoa bean box fermentation sample through a metagenomic approach involving 454 pyrosequencing. Several sequence-based and composition-based taxonomic profiling tools were used and evaluated to avoid software-dependent results and their outcome was validated by comparison with previously obtained culture-dependent and culture-independent data. Overall, this approach revealed a wider bacterial (mainly γ-Proteobacteria) and fungal diversity than previously found. Further, the use of a combination of different classification methods, in a software-independent way, helped to understand the actual composition of the microbial ecosystem under study. In addition, bacteriophage-related sequences were found. The bacterial diversity depended partially on the methods used, as composition-based methods predicted a wider diversity than sequence-based methods, and as classification methods based solely on phylogenetic marker genes predicted a more restricted diversity compared with methods that took all reads into account. The metagenomic sequencing analysis identified Hanseniaspora uvarum, Hanseniaspora opuntiae, Saccharomyces cerevisiae, Lactobacillus fermentum, and Acetobacter pasteurianus as the prevailing species. Also, the presence of occasional members of the cocoa bean fermentation process was revealed (such as Erwinia tasmaniensis, Lactobacillus brevis, Lactobacillus casei, Lactobacillus rhamnosus, Lactococcus lactis, Leuconostoc mesenteroides, and Oenococcus oeni). Furthermore, the sequence reads associated with viral communities were of a restricted diversity, dominated by Myoviridae and Siphoviridae, and reflecting Lactobacillus as the dominant host. To conclude, an accurate overview of all members of a cocoa bean fermentation process sample was revealed, indicating the superiority of metagenomic sequencing over previously used techniques.

  6. Application of Metagenomic Sequencing to Food Safety: Detection of Shiga Toxin-Producing Escherichia coli on Fresh Bagged Spinach

    PubMed Central

    Leonard, Susan R.; Mammel, Mark K.; Lacher, David W.

    2015-01-01

    Culture-independent diagnostics reduce the reliance on traditional (and slower) culture-based methodologies. Here we capitalize on advances in next-generation sequencing (NGS) to apply this approach to food pathogen detection utilizing NGS as an analytical tool. In this study, spiking spinach with Shiga toxin-producing Escherichia coli (STEC) following an established FDA culture-based protocol was used in conjunction with shotgun metagenomic sequencing to determine the limits of detection, sensitivity, and specificity levels and to obtain information on the microbiology of the protocol. We show that an expected level of contamination (∼10 CFU/100 g) could be adequately detected (including key virulence determinants and strain-level specificity) within 8 h of enrichment at a sequencing depth of 10,000,000 reads. We also rationalize the relative benefit of static versus shaking culture conditions and the addition of selected antimicrobial agents, thereby validating the long-standing culture-based parameters behind such protocols. Moreover, the shotgun metagenomic approach was informative regarding the dynamics of microbial communities during the enrichment process, including initial surveys of the microbial loads associated with bagged spinach; the microbes found included key genera such as Pseudomonas, Pantoea, and Exiguobacterium. Collectively, our metagenomic study highlights and considers various parameters required for transitioning to such sequencing-based diagnostics for food safety and the potential to develop better enrichment processes in a high-throughput manner not previously possible. Future studies will investigate new species-specific DNA signature target regimens, rational design of medium components in concert with judicious use of additives, such as antibiotics, and alterations in the sample processing protocol to enhance detection. PMID:26386062

  7. Metagenomics analysis of microbial communities associated with a traditional rice wine starter culture (Xaj-pitha) of Assam, India.

    PubMed

    Bora, Sudipta Sankar; Keot, Jyotshna; Das, Saurav; Sarma, Kishore; Barooah, Madhumita

    2016-12-01

    This is the first report on the microbial diversity of xaj-pitha, a rice wine fermentation starter culture, through a metagenomics approach involving the Illumina-based whole genome shotgun (WGS) sequencing method. Metagenomic DNA was extracted from a rice wine starter culture concocted by the Ahom community of Assam and analyzed using a MiSeq® System. A total of 278,231 contigs, with an average read length of 640.13 bp, were obtained. Data obtained from the use of several taxonomic profiling tools were compared with previously reported microbial diversity studies through culture-dependent and culture-independent methods. The microbial community revealed the existence of amylase producers, such as Rhizopus delemar, Mucor circinelloides, and Aspergillus sp. Ethanol producers viz., Meyerozyma guilliermondii, Wickerhamomyces ciferrii, Saccharomyces cerevisiae, Candida glabrata, Debaryomyces hansenii, Ogataea parapolymorpha, and Dekkera bruxellensis, were found associated with the starter culture along with a diverse range of opportunistic contaminants. The bacterial microflora was dominated by lactic acid bacteria (LAB). The most frequently occurring LAB were Lactobacillus plantarum, Lactobacillus brevis, Leuconostoc lactis, Weissella cibaria, Lactococcus lactis, Weissella paramesenteroides, Leuconostoc pseudomesenteroides, etc. Our study provided a comprehensive picture of microbial diversity associated with rice wine fermentation starter and indicated the superiority of metagenomic sequencing over previously used techniques.

  8. Comparison of microbial DNA enrichment tools for metagenomic whole genome sequencing.

    PubMed

    Thoendel, Matthew; Jeraldo, Patricio R; Greenwood-Quaintance, Kerryl E; Yao, Janet Z; Chia, Nicholas; Hanssen, Arlen D; Abdel, Matthew P; Patel, Robin

    2016-08-01

    Metagenomic whole genome sequencing for detection of pathogens in clinical samples is an exciting new area for discovery and clinical testing. A major barrier to this approach is the overwhelming ratio of human to pathogen DNA in samples with low pathogen abundance, which is typical of most clinical specimens. Microbial DNA enrichment methods offer the potential to relieve this limitation by improving this ratio. Two commercially available enrichment kits, the NEBNext Microbiome DNA Enrichment Kit and the Molzym MolYsis Basic kit, were tested for their ability to enrich for microbial DNA from resected arthroplasty component sonicate fluids from prosthetic joint infections or uninfected sonicate fluids spiked with Staphylococcus aureus. Using spiked uninfected sonicate fluid there was a 6-fold enrichment of bacterial DNA with the NEBNext kit and 76-fold enrichment with the MolYsis kit. Metagenomic whole genome sequencing of sonicate fluid revealed 13- to 85-fold enrichment of bacterial DNA using the NEBNext enrichment kit. The MolYsis approach achieved 481- to 9580-fold enrichment, resulting in 7 to 59% of sequencing reads being from the pathogens known to be present in the samples. These results demonstrate the usefulness of these tools when testing clinical samples with low microbial burden using next generation sequencing. Copyright © 2016 Elsevier B.V. All rights reserved.

  9. Constructing statistically unbiased cortical surface templates using feature-space covariance

    NASA Astrophysics Data System (ADS)

    Parvathaneni, Prasanna; Lyu, Ilwoo; Huo, Yuankai; Blaber, Justin; Hainline, Allison E.; Kang, Hakmook; Woodward, Neil D.; Landman, Bennett A.

    2018-03-01

    The choice of surface template plays an important role in cross-sectional subject analyses involving cortical brain surfaces because there is a tendency toward registration bias given variations in inter-individual and inter-group sulcal and gyral patterns. In order to account for the bias and spatial smoothing, we propose a feature-based unbiased average template surface. In contrast to prior approaches, we factor in the sample population covariance and assign weights based on feature information to minimize the influence of covariance in the sampled population. The mean surface is computed by applying the weights obtained from an inverse covariance matrix, which guarantees that multiple representations from similar groups (e.g., involving imaging, demographic, diagnosis information) are down-weighted to yield an unbiased mean in feature space. Results are validated by applying this approach in two different applications. For evaluation, the proposed unbiased weighted surface mean is compared with un-weighted means both qualitatively and quantitatively (mean squared error and absolute relative distance of both the means with baseline). In the first application, we validated the stability of the proposed optimal mean on a scan-rescan reproducibility dataset by incrementally adding duplicate subjects. In the second application, we used clinical research data to evaluate the difference between the weighted and unweighted mean when different numbers of subjects were included in control versus schizophrenia groups. In both cases, the proposed method achieved greater stability that indicated reduced impacts of sampling bias. The weighted mean is built based on covariance information in feature space as opposed to spatial location, thus making this a generic approach to be applicable to any feature of interest.

  10. Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses.

    PubMed

    Hurwitz, Bonnie L; Westveld, Anton H; Brum, Jennifer R; Sullivan, Matthew B

    2014-07-22

    Long-standing questions in marine viral ecology are centered on understanding how viral assemblages change along gradients in space and time. However, investigating these fundamental ecological questions has been challenging due to incomplete representation of naturally occurring viral diversity in single gene- or morphology-based studies and an inability to identify up to 90% of reads in viral metagenomes (viromes). Although protein clustering techniques provide a significant advance by helping organize this unknown metagenomic sequence space, they typically use only ∼75% of the data and rely on assembly methods not yet tuned for naturally occurring sequence variation. Here, we introduce an annotation- and assembly-free strategy for comparative metagenomics that combines shared k-mer and social network analyses (regression modeling). This robust statistical framework enables visualization of complex sample networks and determination of ecological factors driving community structure. Application to 32 viromes from the Pacific Ocean Virome dataset identified clusters of samples broadly delineated by photic zone and revealed that geographic region, depth, and proximity to shore were significant predictors of community structure. Within subsets of this dataset, depth, season, and oxygen concentration were significant drivers of viral community structure at a single open ocean station, whereas variability along onshore-offshore transects was driven by oxygen concentration in an area with an oxygen minimum zone and not depth or proximity to shore, as might be expected. Together these results demonstrate that this highly scalable approach using complete metagenomic network-based comparisons can both test and generate hypotheses for ecological investigation of viral and microbial communities in nature.
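
    The shared k-mer comparison described in this abstract can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the choice of k, and the use of Jaccard similarity as the pairwise score are assumptions made only for illustration. The resulting sample-by-sample similarity values are the kind of input a network or regression analysis could then work on.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmers(reads, k):
    """Pool the k-mers of every read in a sample into one set."""
    pooled = set()
    for read in reads:
        pooled |= kmers(read, k)
    return pooled


def jaccard(a, b):
    """Jaccard similarity of two k-mer sets (assumed score, not from the paper)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical toy "samples"; real viromes contain millions of reads.
sample_a = sample_kmers(["ACGTAC", "GTACGT"], k=4)
sample_b = sample_kmers(["ACGTTT"], k=4)
similarity = jaccard(sample_a, sample_b)
```

    Because no annotation or assembly step is involved, reads that match nothing in a reference database still contribute to the comparison.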

  11. Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses

    PubMed Central

    Hurwitz, Bonnie L.; Westveld, Anton H.; Brum, Jennifer R.; Sullivan, Matthew B.

    2014-01-01

    Long-standing questions in marine viral ecology are centered on understanding how viral assemblages change along gradients in space and time. However, investigating these fundamental ecological questions has been challenging due to incomplete representation of naturally occurring viral diversity in single gene- or morphology-based studies and an inability to identify up to 90% of reads in viral metagenomes (viromes). Although protein clustering techniques provide a significant advance by helping organize this unknown metagenomic sequence space, they typically use only ∼75% of the data and rely on assembly methods not yet tuned for naturally occurring sequence variation. Here, we introduce an annotation- and assembly-free strategy for comparative metagenomics that combines shared k-mer and social network analyses (regression modeling). This robust statistical framework enables visualization of complex sample networks and determination of ecological factors driving community structure. Application to 32 viromes from the Pacific Ocean Virome dataset identified clusters of samples broadly delineated by photic zone and revealed that geographic region, depth, and proximity to shore were significant predictors of community structure. Within subsets of this dataset, depth, season, and oxygen concentration were significant drivers of viral community structure at a single open ocean station, whereas variability along onshore–offshore transects was driven by oxygen concentration in an area with an oxygen minimum zone and not depth or proximity to shore, as might be expected. Together these results demonstrate that this highly scalable approach using complete metagenomic network-based comparisons can both test and generate hypotheses for ecological investigation of viral and microbial communities in nature. PMID:25002514

  12. Discovery of novel enzymes with industrial potential from a cold and alkaline environment by a combination of functional metagenomics and culturing

    PubMed Central

    2014-01-01

    Background The use of cold-active enzymes has many advantages, including reduced energy consumption and easy inactivation. The ikaite columns of SW Greenland are permanently cold (4-6°C) and alkaline (above pH 10), and the microorganisms living there and their enzymes are adapted to these conditions. Since only a small fraction of the total microbial diversity can be cultured in the laboratory, a combined approach involving functional screening of a strain collection and a metagenomic library was undertaken for discovery of novel enzymes from the ikaite columns. Results A strain collection with 322 cultured isolates was screened for enzymatic activities identifying a large number of enzyme producers, with a high re-discovery rate to previously characterized strains. A functional expression library established in Escherichia coli identified a number of novel cold-active enzymes. Both α-amylases and β-galactosidases were characterized in more detail with respect to temperature and pH profiles and one of the β-galactosidases, BGalI17E2, was able to hydrolyze lactose at 5°C. A metagenome sequence of the expression library indicated that the majority of enzymatic activities were not detected by functional expression. Phylogenetic analysis showed that different bacterial communities were targeted with the culture dependent and independent approaches and revealed the bias of multiple displacement amplification (MDA) of DNA isolated from complex microbial communities. Conclusions Many cold- and/or alkaline-active enzymes of industrial relevance were identified in the culture based approach and the majority of the enzyme-producing isolates were closely related to previously characterized strains. The function-based metagenomic approach, on the other hand, identified several enzymes (β-galactosidases, α-amylases and a phosphatase) with low homology to known sequences that were easily expressed in the production host E. coli. 
The β-galactosidase BGalI17E2 was able to hydrolyze lactose at low temperature, suggesting a possible use in the dairy industry for this enzyme. The two different approaches complemented each other by targeting different microbial communities, highlighting the usefulness of combining methods for bioprospecting. Finally, we document here that ikaite columns constitute an important source of cold- and/or alkaline-active enzymes with industrial application potential. PMID:24886068

  13. Discovery of novel enzymes with industrial potential from a cold and alkaline environment by a combination of functional metagenomics and culturing.

    PubMed

    Vester, Jan Kjølhede; Glaring, Mikkel Andreas; Stougaard, Peter

    2014-05-20

    The use of cold-active enzymes has many advantages, including reduced energy consumption and easy inactivation. The ikaite columns of SW Greenland are permanently cold (4-6°C) and alkaline (above pH 10), and the microorganisms living there and their enzymes are adapted to these conditions. Since only a small fraction of the total microbial diversity can be cultured in the laboratory, a combined approach involving functional screening of a strain collection and a metagenomic library was undertaken for discovery of novel enzymes from the ikaite columns. A strain collection with 322 cultured isolates was screened for enzymatic activities identifying a large number of enzyme producers, with a high re-discovery rate to previously characterized strains. A functional expression library established in Escherichia coli identified a number of novel cold-active enzymes. Both α-amylases and β-galactosidases were characterized in more detail with respect to temperature and pH profiles and one of the β-galactosidases, BGalI17E2, was able to hydrolyze lactose at 5°C. A metagenome sequence of the expression library indicated that the majority of enzymatic activities were not detected by functional expression. Phylogenetic analysis showed that different bacterial communities were targeted with the culture dependent and independent approaches and revealed the bias of multiple displacement amplification (MDA) of DNA isolated from complex microbial communities. Many cold- and/or alkaline-active enzymes of industrial relevance were identified in the culture based approach and the majority of the enzyme-producing isolates were closely related to previously characterized strains. The function-based metagenomic approach, on the other hand, identified several enzymes (β-galactosidases, α-amylases and a phosphatase) with low homology to known sequences that were easily expressed in the production host E. coli. 
The β-galactosidase BGalI17E2 was able to hydrolyze lactose at low temperature, suggesting a possible use in the dairy industry for this enzyme. The two different approaches complemented each other by targeting different microbial communities, highlighting the usefulness of combining methods for bioprospecting. Finally, we document here that ikaite columns constitute an important source of cold- and/or alkaline-active enzymes with industrial application potential.

  14. International Standards for Genomes, Transcriptomes, and Metagenomes

    PubMed Central

    Mason, Christopher E.; Afshinnekoo, Ebrahim; Tighe, Scott; Wu, Shixiu; Levy, Shawn

    2017-01-01

    Challenges and biases in preparing, characterizing, and sequencing DNA and RNA can have significant impacts on research in genomics across all kingdoms of life, including experiments in single-cells, RNA profiling, and metagenomics (across multiple genomes). Technical artifacts and contamination can arise at each point of sample manipulation, extraction, sequencing, and analysis. Thus, the measurement and benchmarking of these potential sources of error are of paramount importance as next-generation sequencing (NGS) projects become more global and ubiquitous. Fortunately, a variety of methods, standards, and technologies have recently emerged that improve measurements in genomics and sequencing, from the initial input material to the computational pipelines that process and annotate the data. Here we review current standards and their applications in genomics, including whole genomes, transcriptomes, mixed genomic samples (metagenomes), and the modified bases within each (epigenomes and epitranscriptomes). These standards, tools, and metrics are critical for quantifying the accuracy of NGS methods, which will be essential for robust approaches in clinical genomics and precision medicine. PMID:28337071

  15. Enhanced arbovirus surveillance with deep sequencing: Identification of novel rhabdoviruses and bunyaviruses in Australian mosquitoes.

    PubMed

    Coffey, Lark L; Page, Brady L; Greninger, Alexander L; Herring, Belinda L; Russell, Richard C; Doggett, Stephen L; Haniotis, John; Wang, Chunlin; Deng, Xutao; Delwart, Eric L

    2014-01-05

    Viral metagenomics characterizes known and identifies unknown viruses based on sequence similarities to any previously sequenced viral genomes. A metagenomics approach was used to identify virus sequences in Australian mosquitoes causing cytopathic effects in inoculated mammalian cell cultures. Sequence comparisons revealed strains of Liao Ning virus (Reovirus, Seadornavirus), previously detected only in China, livestock-infecting Stretch Lagoon virus (Reovirus, Orbivirus), two novel dimarhabdoviruses, named Beaumont and North Creek viruses, and two novel orthobunyaviruses, named Murrumbidgee and Salt Ash viruses. The novel virus proteomes diverged by ≥ 50% relative to their closest previously genetically characterized viral relatives. Deep sequencing also generated genomes of Warrego and Wallal viruses, orbiviruses linked to kangaroo blindness, whose genomes had not been fully characterized. This study highlights viral metagenomics in concert with traditional arbovirus surveillance to characterize known and new arboviruses in field-collected mosquitoes. Follow-up epidemiological studies are required to determine whether the novel viruses infect humans. © 2013 Elsevier Inc. All rights reserved.

  16. Dip in the gene pool: metagenomic survey of natural coccolithovirus communities.

    PubMed

    Pagarete, António; Kusonmano, Kanthida; Petersen, Kjell; Kimmance, Susan A; Martínez Martínez, Joaquín; Wilson, William H; Hehemann, Jan-Hendrik; Allen, Michael J; Sandaa, Ruth-Anne

    2014-10-01

    Despite the global oceanic distribution and recognised biogeochemical impact of coccolithoviruses (EhV), their diversity remains poorly understood. Here we employed a metagenomic approach to study the occurrence and progression of natural EhV community genomic variability. Analysis of EhV metagenomes from the early and late stages of an induced bloom led to three main discoveries. First, we observed resilient and specific genomic signatures in the EhV community associated with the Norwegian coast, which reinforce the existence of limitations to the capacity of dispersal and genomic exchange among EhV populations. Second, we identified a hyper-variable region (approximately 21kbp long) in the coccolithovirus genome. Third, we observed a clear trend for EhV relative amino-acid diversity to reduce from early to late stages of the bloom. This study validated two new methodological combinations, and proved very useful in the discovery of new genomic features associated with coccolithovirus natural communities. Copyright © 2014 Elsevier Inc. All rights reserved.

  17. Quasi-metagenomics and realtime sequencing aided detection and subtyping of Salmonella enterica from food samples.

    PubMed

    Hyeon, Ji-Yeon; Li, Shaoting; Mann, David A; Zhang, Shaokang; Li, Zhen; Chen, Yi; Deng, Xiangyu

    2017-12-01

    Metagenomics analysis of food samples promises isolation-independent detection and subtyping of foodborne bacterial pathogens in a single workflow. Selective concentration of Salmonella genomic DNA through immunomagnetic separation (IMS) and multiple displacement amplification (MDA) were shown to shorten culture enrichment of Salmonella-spiked raw chicken breast samples by over 12 hours while permitting serotyping and high-fidelity single nucleotide polymorphisms (SNP) typing of the pathogen using short shotgun sequencing reads. The herein termed quasi-metagenomics approach was evaluated on Salmonella-spiked lettuce and black peppercorn samples as well as retail chicken parts naturally contaminated with different serotypes of Salmonella. Between 8 and 24 h culture enrichment was required for detecting and subtyping naturally occurring Salmonella from unspiked chicken parts compared with 4 to 12 h culture enrichment when Salmonella-spiked food samples were analyzed, indicating the likely need for longer culture enrichment to revive low levels of stressed or injured Salmonella cells in food. Further acceleration of the workflow was achieved by real-time nanopore sequencing. After 1.5 hours of analysis on a portable sequencer, sufficient data were generated from sequencing IMS-MDA product of a culture-enriched lettuce sample to allow serotyping and robust phylogenetic placement of the inoculated isolate. Importance: Both culture enrichment and next-generation sequencing remain to be time-consuming processes for food testing where rapid methods for pathogen detection are widely available. Our study demonstrated substantial acceleration of the respective process through IMS-MDA and real-time nanopore sequencing. In one example, the combined use of the two methods delivered a less than 24 h turnaround time from a Salmonella-contaminated lettuce sample to phylogenetic identification of the pathogen. 
Improved efficiency like this is important for further expanding the use of whole genome and metagenomics sequencing in microbial analysis of food. Our results suggest the potential of the quasi-metagenomics approach in areas where rapid detection and subtyping of foodborne pathogens is important, such as foodborne outbreak response and precision tracking and monitoring of foodborne pathogens in production environments and supply chains. Copyright © 2017 American Society for Microbiology.

  18. The metagenomic data life-cycle: standards and best practices

    PubMed Central

    ten Hoopen, Petra; Finn, Robert D.; Bongo, Lars Ailo; Corre, Erwan; Meyer, Folker; Mitchell, Alex; Pelletier, Eric; Pesole, Graziano; Santamaria, Monica; Willassen, Nils Peder

    2017-01-01

    Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonized way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (i) material sampling, (ii) material sequencing, (iii) data analysis, and (iv) data archiving and publishing. Taking examples from marine research, we summarize essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community, but greater awareness and adoption is still needed. We emphasize the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing. PMID:28637310

  19. The metagenomic data life-cycle: standards and best practices

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    ten Hoopen, Petra; Finn, Robert D.; Bongo, Lars Ailo

    Metagenomics data analyses from independent studies can only be compared if the analysis workflows are described in a harmonised way. In this overview, we have mapped the landscape of data standards available for the description of essential steps in metagenomics: (1) material sampling, (2) material sequencing, (3) data analysis and (4) data archiving & publishing. Taking examples from marine research, we summarise essential variables used to describe material sampling processes and sequencing procedures in a metagenomics experiment. These aspects of metagenomics dataset generation have been to some extent addressed by the scientific community but greater awareness and adoption is still needed. We emphasise the lack of standards relating to reporting how metagenomics datasets are analysed and how the metagenomics data analysis outputs should be archived and published. We propose best practice as a foundation for a community standard to enable reproducibility and better sharing of metagenomics datasets, leading ultimately to greater metagenomics data reuse and repurposing.

  20. Metagenomic insights into the ecology and physiology of microbes in bioelectrochemical systems.

    PubMed

    Kouzuma, Atsushi; Ishii, Shun'ichi; Watanabe, Kazuya

    2018-05-01

    In bioelectrochemical systems (BESs), electrons are transferred between electrochemically active microbes (EAMs) and conductive materials, such as electrodes, via extracellular electron transfer (EET) pathways, and electrons thus transferred stimulate intracellular catabolic reactions. Catabolic and EET pathways have extensively been studied for several model EAMs, such as Shewanella oneidensis MR-1 and Geobacter sulfurreducens PCA, whereas it is also important to understand the ecophysiology of EAMs in naturally occurring microbiomes, such as those in anode biofilms in microbial fuel cells treating wastewater. Recent studies have exploited metagenomics and metatranscriptomics (meta-omics) approaches to characterize EAMs in BES-associated microbiomes. Here we review recent BES studies that used meta-omics approaches and show that these studies have discovered unexpected features of EAMs and deepened our understanding of functions and behaviors of microbes in BESs. More studies employing meta-omics approaches are needed to advance our knowledge of microbes in BESs. Copyright © 2018 Elsevier Ltd. All rights reserved.

  1. Culture-Independent Identification of Manganese-Oxidizing Genes from Deep-Sea Hydrothermal Vent Chemoautotrophic Ferromanganese Microbial Communities Using a Metagenomic Approach

    NASA Astrophysics Data System (ADS)

    Davis, R.; Tebo, B. M.

    2013-12-01

    Microbial activity has long been recognized as being important to the fate of manganese (Mn) in hydrothermal systems, yet we know very little about the organisms that catalyze Mn oxidation, the mechanisms by which Mn is oxidized or the physiological function that Mn oxidation serves in these hydrothermal systems. Hydrothermal vents with thick ferromanganese microbial mats and Mn oxide-coated rocks observed throughout the Pacific Ring of Fire are ideal models to study the mechanisms of microbial Mn oxidation, as well as primary productivity in these metal-cycling ecosystems. We sampled ferromanganese microbial mats from Vai Lili Vent Field (Tmax=43°C) located on the Eastern Lau Spreading Center and Mn oxide-encrusted rhyolitic pumice (4°C) from Niua South Seamount on the Tonga Volcanic Arc. Metagenomic libraries were constructed and assembled from these samples and key genes known to be involved in Mn oxidation and carbon fixation pathways were identified in the reconstructed genomes. The Vai Lili metagenome assembled to form 121,157 contiguous sequences (contigs) greater than 1000bp in length, with an N50 of 8,261bp and a total metagenome size of 593 Mbp. Contigs were binned using an emergent self-organizing map of tetranucleotide frequencies. Putative homologs of the multicopper Mn-oxidase MnxG were found in the metagenome that were related to both the Pseudomonas-like and Bacillus-like forms of the enzyme. The bins containing the Pseudomonas-like mnxG genes are most closely related to uncultured Deltaproteobacteria and Chloroflexi. The Deltaproteobacteria bin appears to be an obligate anaerobe with possible chemoautotrophic metabolisms, while the Chloroflexi appears to be a heterotrophic organism. The metagenome from the Mn-stained pumice was assembled into 122,092 contigs greater than 1000bp in length with an N50 of 7,635bp and a metagenome size of 385 Mbp. 
Both forms of mnxG genes are present in this metagenome as well as the genes encoding the putative Mn oxidases McoA and MopA. The greater diversity of Mn oxidase pathways in this metagenome suggests a more diverse Mn oxidizing microbial community in the cold pumice sample. Key enzymes for four of the six known carbon fixation pathways (the Calvin Cycle, the reductive TCA cycle, the Wood-Ljungdahl pathway, and the 3-hydroxypropionate/4-hydroxybutyrate Cycle) were also identified in both samples indicating primary production occurs via a diverse community of carbon fixing organisms. Together, these samples contain active, diverse populations of Mn oxidizing bacteria living in association with microbial communities supported by chemoautotrophic carbon fixation.
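
    Two quantities quoted in this abstract, the N50 of an assembly and the tetranucleotide frequencies used for binning, are simple to compute. The following is a minimal sketch under stated assumptions: the function names are hypothetical, N50 is taken in its usual sense (the contig length at which the length-sorted contigs first cover half of the assembly), and the self-organizing map step itself is not shown.

```python
from collections import Counter


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 needs at least one contig length")


def tetranucleotide_freqs(seq):
    """Relative frequency of each 4-mer in a contig, skipping windows with non-ACGT bases."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + 4]
        for i in range(len(seq) - 3)
        if set(seq[i:i + 4]) <= set("ACGT")
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


# Hypothetical contig lengths in bp, for illustration only.
example_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

    Each contig's frequency vector can then be handed to a clustering method; production binning tools typically also merge reverse-complement 4-mers, which this sketch omits.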

  2. Metagenomic sequence of saline desert microbiota from wild ass sanctuary, Little Rann of Kutch, Gujarat, India.

    PubMed

    Patel, Rajesh; Mevada, Vishal; Prajapati, Dhaval; Dudhagara, Pravin; Koringa, Prakash; Joshi, C G

    2015-03-01

    We report the metagenome of a saline desert soil sample from the Little Rann of Kutch, Gujarat State, India. The metagenome consisted of 633,760 sequences with a total size of 141,307,202 bp and 56% G + C content. The metagenome sequence data are available in the EBI Metagenomics database under accession no. ERP005612. Community metagenomics revealed a total of 1802 species belonging to 43 different phyla, with Marinobacter (48.7%) and Halobacterium (4.6%) the dominant genera in the bacterial and archaeal domains, respectively. Remarkably, functional analysis of the metagenome showed 18.2% of sequences in a poorly characterized group and 4% of genes associated with various stress responses, along with a wide range of commercially relevant enzymes.

  3. Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Stepanauskas, Ramunas

    2018-02-06

    DOE JGI's Tanja Woyke, chair of the Single Cells and Metagenomes session, delivers an introduction, followed by Bigelow Laboratory's Ramunas Stepanauskas on "Single Cell and Metagenomic Assemblies: Biology Drives Technical Choices and Goals" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  4. Block-circulant matrices with circulant blocks, Weil sums, and mutually unbiased bases. II. The prime power case

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Combescure, Monique

    2009-03-15

    In our previous paper [Combescure, M., 'Circulant matrices, Gauss sums and the mutually unbiased bases. I. The prime number case', Cubo A Mathematical Journal (unpublished)] we have shown that the theory of circulant matrices allows one to recover the result that there exist $p+1$ mutually unbiased bases in dimension $p$, $p$ being an arbitrary prime number. Two orthonormal bases $B$, $B'$ of $\mathbb{C}^d$ are said to be mutually unbiased if for all $b \in B$ and all $b' \in B'$ one has $|b \cdot b'| = 1/\sqrt{d}$ ($b \cdot b'$ being the Hermitian scalar product in $\mathbb{C}^d$). In this paper we show that the theory of block-circulant matrices with circulant blocks allows one to show very simply the known result that if $d = p^n$ ($p$ a prime number and $n$ any integer) there exist $d+1$ mutually unbiased bases in $\mathbb{C}^d$. Our result relies heavily on an idea of Klimov et al. ["Geometrical approach to the discrete Wigner function," J. Phys. A 39, 14471 (2006)]. As a subproduct we recover properties of quadratic Weil sums for $p \geq 3$, which generalizes the fact that in the prime case the quadratic Gauss sum properties follow from our results.
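
    The defining condition for mutually unbiased bases can be checked numerically in small dimensions. The sketch below does not reproduce the paper's circulant-matrix argument. It assumes the standard quadratic-phase construction for an odd prime p, in which basis k has vectors with components ω^(k·x² + j·x)/√p for ω = exp(2πi/p), and adds the standard basis to reach p+1 bases. All function names are hypothetical.

```python
import cmath
import math


def mub_bases(p):
    """Return p+1 bases of C^p for an odd prime p (primality is not checked here)."""
    if p < 3 or p % 2 == 0:
        raise ValueError("this sketch assumes an odd prime p")
    omega = cmath.exp(2j * cmath.pi / p)
    # Basis 0: the standard (computational) basis.
    bases = [[[1.0 if x == j else 0.0 for x in range(p)] for j in range(p)]]
    # Bases 1..p: quadratic-phase vectors indexed by k (basis) and j (vector).
    for k in range(p):
        bases.append([
            [omega ** ((k * x * x + j * x) % p) / math.sqrt(p) for x in range(p)]
            for j in range(p)
        ])
    return bases


def inner(u, v):
    """Hermitian scalar product, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


def is_mutually_unbiased(b1, b2, tol=1e-9):
    """True if every cross overlap has modulus 1/sqrt(d)."""
    d = len(b1)
    return all(
        abs(abs(inner(u, v)) - 1 / math.sqrt(d)) < tol
        for u in b1 for v in b2
    )


bases = mub_bases(5)
all_unbiased = all(
    is_mutually_unbiased(bases[a], bases[b])
    for a in range(len(bases)) for b in range(a + 1, len(bases))
)
```

    The overlaps between two different quadratic-phase bases are quadratic Gauss sums, whose modulus √p is what gives the 1/√p overlap after normalization.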

  5. Epidemiologic Evaluation of Measurement Data in the Presence of Detection Limits

    PubMed Central

    Lubin, Jay H.; Colt, Joanne S.; Camann, David; Davis, Scott; Cerhan, James R.; Severson, Richard K.; Bernstein, Leslie; Hartge, Patricia

    2004-01-01

    Quantitative measurements of environmental factors greatly improve the quality of epidemiologic studies but can pose challenges because of the presence of upper or lower detection limits or interfering compounds, which do not allow for precise measured values. We consider the regression of an environmental measurement (dependent variable) on several covariates (independent variables). Various strategies are commonly employed to impute values for interval-measured data, including assignment of one-half the detection limit to nondetected values or of “fill-in” values randomly selected from an appropriate distribution. On the basis of a limited simulation study, we found that the former approach can be biased unless the percentage of measurements below detection limits is small (5–10%). The fill-in approach generally produces unbiased parameter estimates but may produce biased variance estimates and thereby distort inference when 30% or more of the data are below detection limits. Truncated data methods (e.g., Tobit regression) and multiple imputation offer two unbiased approaches for analyzing measurement data with detection limits. If interest resides solely on regression parameters, then Tobit regression can be used. If individualized values for measurements below detection limits are needed for additional analysis, such as relative risk regression or graphical display, then multiple imputation produces unbiased estimates and nominal confidence intervals unless the proportion of missing data is extreme. We illustrate various approaches using measurements of pesticide residues in carpet dust in control subjects from a case–control study of non-Hodgkin lymphoma. PMID:15579415
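
    The contrast between substituting one-half the detection limit and treating nondetects as censored observations can be sketched as follows. This is an illustration and not the authors' analysis: it assumes an intercept-only normal model (in practice applied to log-transformed measurements), marks nondetects as None, and uses hypothetical function names. A Tobit regression would replace the constant mean with a linear function of the covariates.

```python
import math


def substitute_half_lod(values, lod):
    """Simple imputation: replace each nondetect (None) with lod / 2."""
    return [lod / 2 if v is None else v for v in values]


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution at x."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


def censored_loglik(values, lod, mu, sigma):
    """Log-likelihood with left-censoring at lod.

    Detected values contribute their density; each nondetect contributes
    the probability of falling below the detection limit.
    """
    total = 0.0
    for v in values:
        if v is None:
            total += math.log(normal_cdf(lod, mu, sigma))
        else:
            total += normal_logpdf(v, mu, sigma)
    return total


# Hypothetical measurements; None marks a value below the detection limit.
data = [None, 1.4, 2.1, None, 3.0]
filled = substitute_half_lod(data, lod=1.0)
loglik = censored_loglik(data, lod=1.0, mu=1.5, sigma=1.0)
```

    Maximizing censored_loglik over mu and sigma uses the information that a nondetect lies somewhere below the limit instead of pretending it equals a single fixed value.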

  6. A multi-substrate approach for functional metagenomics-based screening for (hemi)cellulases in two wheat straw-degrading microbial consortia unveils novel thermoalkaliphilic enzymes.

    PubMed

    Maruthamuthu, Mukil; Jiménez, Diego Javier; Stevens, Patricia; van Elsas, Jan Dirk

    2016-01-28

    Functional metagenomics is a promising strategy for the exploration of the biocatalytic potential of microbiomes in order to uncover novel enzymes for industrial processes (e.g. biorefining or bleaching pulp). Most current methodologies used to screen for enzymes involved in plant biomass degradation are based on the use of single substrates. Moreover, highly diverse environments are used as metagenomic sources. However, such methods suffer from low hit rates of positive clones and hence the discovery of novel enzymatic activities from metagenomes has been hampered. Here, we constructed fosmid libraries from two wheat straw-degrading microbial consortia, denoted RWS (bred on untreated wheat straw) and TWS (bred on heat-treated wheat straw). Approximately 22,000 clones from each library were screened for (hemi)cellulose-degrading enzymes using a multi-chromogenic substrate approach. The screens yielded 71 positive clones for both libraries, giving hit rates of 1:440 and 1:1,047 for RWS and TWS, respectively. Seven clones (NT2-2, T5-5, NT18-17, T4-1, 10BT, NT18-21 and T17-2) were selected for sequence analyses. Their inserts revealed the presence of 18 genes encoding enzymes belonging to twelve different glycosyl hydrolase families (GH2, GH3, GH13, GH17, GH20, GH27, GH32, GH39, GH53, GH58, GH65 and GH109). These encompassed several carbohydrate-active gene clusters traceable mainly to Klebsiella related species. Detailed functional analyses showed that clone NT2-2 (containing a beta-galactosidase of ~116 kDa) had highest enzymatic activity at 55 °C and pH 9.0. Additionally, clone T5-5 (containing a beta-xylosidase of ~86 kDa) showed > 90% of enzymatic activity at 55 °C and pH 10.0. This study employed a high-throughput method for rapid screening of fosmid metagenomic libraries for (hemi)cellulose-degrading enzymes. The approach, consisting of screens on multi-substrates coupled to further analyses, revealed high hit rates, as compared with recent other studies. 
    Two clones, 10BT and T4-1, required the presence of multiple substrates for detectable activity, indicating a new avenue in library activity screening. Finally, clones NT2-2, T5-5 and NT18-17 were found to encode putative novel thermo-alkaline enzymes, which could represent a starting point for further biotechnological applications.
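
    The hit rates quoted in this abstract follow from simple arithmetic. The sketch below assumes a split of 50 positive clones for RWS and 21 for TWS; that split is inferred here from the reported totals (71 positives, about 22,000 clones per library) and is not stated in the abstract. The function name is illustrative.

```python
def hit_rate_denominator(clones_screened, positive_clones):
    """Return N such that the hit rate is roughly 1:N (integer division)."""
    return clones_screened // positive_clones

screened = 22_000                      # approximate clones screened per library
positives = {"RWS": 50, "TWS": 21}     # assumed split of the 71 positives

for library, hits in positives.items():
    print(f"{library}: 1:{hit_rate_denominator(screened, hits)}")
```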

  7. BioFuelDB: a database and prediction server of enzymes involved in biofuels production.

    PubMed

    Chaudhary, Nikhil; Gupta, Ankit; Gupta, Sudheer; Sharma, Vineet K

    2017-01-01

    In light of the rapid decrease in fossil fuel reserves and an increasing demand for energy, novel methods are required to explore alternative biofuel production processes to alleviate these pressures. A wide variety of molecules which can either be used as biofuels or as biofuel precursors are produced using microbial enzymes. However, the common challenges in the industrial implementation of enzyme catalysis for biofuel production are the unavailability of a comprehensive biofuel enzyme resource, low efficiency of known enzymes, and limited availability of enzymes which can function under extreme conditions in the industrial processes. We have developed a comprehensive database of known enzymes with proven or potential applications in biofuel production through text mining of PubMed abstracts and other publicly available information. A total of 131 enzymes with a role in biofuel production were identified and classified into six enzyme classes and four broad application categories, namely 'Alcohol production', 'Biodiesel production', 'Fuel Cell' and 'Alternate biofuels'. A prediction tool 'Benz' was developed to identify and classify novel homologues of the known biofuel enzyme sequences from sequenced genomes and metagenomes. 'Benz' employs a hybrid approach incorporating HMMER 3.0 and RAPSearch2 programs to provide high accuracy and high speed for prediction. Using the Benz tool, 153,754 novel homologues of biofuel enzymes were identified from 23 diverse metagenomic sources. The comprehensive data of curated biofuel enzymes, their novel homologs identified from diverse metagenomes, and the hybrid prediction tool Benz are presented as a web server which can be used for the prediction of biofuel enzymes from genomic and metagenomic datasets. The database and the Benz tool are publicly available at http://metabiosys.iiserb.ac.in/biofueldb and http://metagenomics.iiserb.ac.in/biofueldb.

  8. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing.

    PubMed

    Kröber, Magdalena; Bekel, Thomas; Diaz, Naryttza N; Goesmann, Alexander; Jaenicke, Sebastian; Krause, Lutz; Miller, Dimitri; Runte, Kai J; Viehöver, Prisca; Pühler, Alfred; Schlüter, Andreas

    2009-06-01

    The phylogenetic structure of the microbial community residing in a fermentation sample from a production-scale biogas plant fed with maize silage, green rye and liquid manure was analysed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454-pyrosequencing. Sequencing of 109 clones from a bacterial and an archaeal 16S-rDNA amplicon library revealed that the obtained nucleotide sequences are similar but not identical to 16S-rDNA database sequences derived from different anaerobic environments including digestors and bioreactors. Most of the bacterial 16S-rDNA sequences could be assigned to the phylum Firmicutes with the most abundant class Clostridia and to the class Bacteroidetes, whereas most archaeal 16S-rDNA sequences cluster close to the methanogen Methanoculleus bourgensis. Further sequences of the archaeal library most probably represent so far non-characterised species within the genus Methanoculleus. A similar result derived from phylogenetic analysis of mcrA clone sequences. The mcrA gene product encodes the alpha-subunit of methyl-coenzyme-M reductase involved in the final step of methanogenesis. BLASTn analysis applying stringent settings resulted in assignment of 16S-rDNA metagenome sequence reads to 62 16S-rDNA amplicon sequences thus enabling frequency of abundance estimations for 16S-rDNA clone library sequences. Ribosomal Database Project (RDP) Classifier processing of metagenome 16S-rDNA reads revealed abundance of the phyla Firmicutes, Bacteroidetes and Euryarchaeota and the orders Clostridiales, Bacteroidales and Methanomicrobiales. Moreover, a large fraction of 16S-rDNA metagenome reads could not be assigned to lower taxonomic ranks, demonstrating that numerous microorganisms in the analysed fermentation sample of the biogas plant are still unclassified or unknown.

  9. A better sequence-read simulator program for metagenomics.

    PubMed

    Johnson, Stephen; Trost, Brett; Long, Jeffrey R; Pittet, Vanessa; Kusalik, Anthony

    2014-01-01

    There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.
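
    The contrast between parametric and non-parametric read-length models can be sketched in a few lines. The code below is not BEAR's implementation: it only illustrates drawing read lengths from the empirical distribution of observed lengths rather than from a uniform or normal model, and then cutting reads of those lengths from a toy genome. Quality values, error models and abundances are left out, and all names are hypothetical.

```python
import random
from collections import Counter

def empirical_length_sampler(observed_lengths, seed=None):
    """Return a function that samples read lengths from the observed distribution."""
    counts = Counter(observed_lengths)
    lengths = list(counts)
    weights = [counts[n] for n in lengths]
    rng = random.Random(seed)

    def sample(n_reads):
        # Each length is drawn with probability proportional to its observed count.
        return rng.choices(lengths, weights=weights, k=n_reads)

    return sample

def simulate_reads(genome, read_lengths, seed=None):
    """Cut one error-free read per requested length from a random genome position."""
    rng = random.Random(seed)
    reads = []
    for n in read_lengths:
        n = min(n, len(genome))
        start = rng.randrange(len(genome) - n + 1)
        reads.append(genome[start:start + n])
    return reads

observed = [98, 100, 100, 101, 150, 150, 151]   # toy lengths from a real run
sample = empirical_length_sampler(observed, seed=1)
lengths = sample(5)
genome = "ACGT" * 100
print([len(r) for r in simulate_reads(genome, lengths, seed=1)])
```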

  10. Microbiota composition, gene pool and its expression in Gir cattle (Bos indicus) rumen under different forage diets using metagenomic and metatranscriptomic approaches.

    PubMed

    Pandit, Ramesh J; Hinsu, Ankit T; Patel, Shriram H; Jakhesara, Subhash J; Koringa, Prakash G; Bruno, Fosso; Psifidi, Androniki; Shah, S V; Joshi, Chaitanya G

    2018-03-09

    Zebu (Bos indicus) is a domestic cattle species originating from the Indian subcontinent and now widely domesticated on several continents. In this study, we were particularly interested in understanding the functionally active rumen microbiota of an important Zebu breed, the Gir, under different dietary regimes. Metagenomic and metatranscriptomic data were compared at various taxonomic levels to elucidate the differential microbial population and its functional dynamics in Gir cattle rumen under different roughage dietary regimes. Different proportions of roughage rather than the type of roughage (dry or green) modulated microbiome composition and the expression of its gene pool. Fibre degrading bacteria (i.e. Clostridium, Ruminococcus, Eubacterium, Butyrivibrio, Bacillus and Roseburia) were higher in the solid fraction of rumen (P<0.01) compared to the liquid fraction, whereas bacteria considered to be utilizers of the degraded product (i.e. Prevotella, Bacteroides, Parabacteroides, Paludibacter and Victivallis) were dominant in the liquid fraction (P<0.05). Likewise, expression of fibre degrading enzymes and related carbohydrate binding modules (CBMs) occurred in the solid fraction. When metagenomic and metatranscriptomic data were compared, it was found that some genera and species were transcriptionally more active, although they were in low abundance, making an important contribution to fibre degradation and its further metabolism in the rumen. This study also identified some of the transcriptionally active genera, such as Caldicellulosiruptor and Paludibacter, whose potential has been less-explored in rumen. Overall, the comparison of metagenomic shotgun and metatranscriptomic sequencing appeared to be a much richer source of information compared to conventional metagenomic analysis. Copyright © 2018 Elsevier GmbH. All rights reserved.

  11. Exploring neighborhoods in the metagenome universe.

    PubMed

    Aßhauer, Kathrin P; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-07-14

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well suited for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis.
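
    A profile-based k-nearest-neighbor search of the kind described can be sketched as follows. This is an illustration only: the four-feature count profiles and sample names are invented, and cosine distance stands in for whichever vector-based distance measure is actually used by the CoMet-Universe server.

```python
import math

def normalize(counts):
    """Turn raw feature counts into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]

def cosine_distance(u, v):
    """One minus the cosine of the angle between two profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def nearest_neighbors(query, database, k):
    """Return the names of the k database profiles closest to the query."""
    q = normalize(query)
    ranked = sorted(
        database,
        key=lambda name: cosine_distance(q, normalize(database[name])),
    )
    return ranked[:k]

# Hypothetical functional profiles (counts per feature) with habitat-style names.
database = {
    "soil_A": [40, 10, 5, 45],
    "soil_B": [38, 12, 6, 44],
    "gut_A": [5, 60, 30, 5],
    "marine_A": [10, 5, 70, 15],
}
print(nearest_neighbors([42, 9, 4, 45], database, k=2))
```

    The neighborhood accuracy reported in the abstract corresponds to the fraction of returned neighbors that share the query's habitat label.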

  12. Exploring Neighborhoods in the Metagenome Universe

    PubMed Central

    Aßhauer, Kathrin P.; Klingenberg, Heiner; Lingner, Thomas; Meinicke, Peter

    2014-01-01

    The variety of metagenomes in current databases provides a rapidly growing source of information for comparative studies. However, the quantity and quality of supplementary metadata is still lagging behind. It is therefore important to be able to identify related metagenomes by means of the available sequence data alone. We have studied efficient sequence-based methods for large-scale identification of similar metagenomes within a database retrieval context. In a broad comparison of different profiling methods we found that vector-based distance measures are well suited for the detection of metagenomic neighbors. Our evaluation on more than 1700 publicly available metagenomes indicates that for a query metagenome from a particular habitat on average nine out of ten nearest neighbors represent the same habitat category independent of the utilized profiling method or distance measure. While for well-defined labels a neighborhood accuracy of 100% can be achieved, in general the neighbor detection is severely affected by a natural overlap of manually annotated categories. In addition, we present results of a novel visualization method that is able to reflect the similarity of metagenomes in a 2D scatter plot. The visualization method shows a similarly high accuracy in the reduced space as compared with the high-dimensional profile space. Our study suggests that for inspection of metagenome neighborhoods the profiling methods and distance measures can be chosen to provide a convenient interpretation of results in terms of the underlying features. Furthermore, supplementary metadata of metagenome samples in the future needs to comply with readily available ontologies for fine-grained and standardized annotation. To make profile-based k-nearest-neighbor search and the 2D-visualization of the metagenome universe available to the research community, we included the proposed methods in our CoMet-Universe server for comparative metagenome analysis. PMID:25026170

  13. Unbiased approaches to biomarker discovery in neurodegenerative diseases

    PubMed Central

    Chen-Plotkin, Alice S.

    2014-01-01

    Neurodegenerative diseases such as Alzheimer’s disease, Parkinson’s disease, amyotrophic lateral sclerosis, and frontotemporal dementia have several important features in common. They are progressive, they affect a relatively inaccessible organ, and we have no disease-modifying therapies for them. For these brain-based diseases, current diagnosis and evaluation of disease severity rely almost entirely on clinical examination, which may only be a rough approximation of disease state. Thus, the development of biomarkers – objective, relatively easily measured and precise indicators of pathogenic processes – could improve patient care and accelerate therapeutic discovery. Yet existing, rigorously tested neurodegenerative disease biomarkers are few, and even fewer biomarkers have translated into clinical use. To find new biomarkers for these diseases, an unbiased, high-throughput screening approach may be needed. In this review, I will describe the potential utility of such an approach to biomarker discovery, using Parkinson’s disease as a case example. PMID:25442938

  14. HORSE SPECIES SYMPOSIUM: Canine intestinal microbiology and metagenomics: From phylogeny to function.

    PubMed

    Guard, B C; Suchodolski, J S

    2016-06-01

    Recent molecular studies have revealed a complex microbiota in the dog intestine. Convincing evidence has been reported linking changes in microbial communities to acute and chronic gastrointestinal inflammation, especially in canine inflammatory bowel disease (IBD). The most common microbial changes observed in intestinal inflammation are decreases in the bacterial phyla Firmicutes (i.e., Lachnospiraceae, Ruminococcaceae, and ) and Bacteroidetes, with concurrent increases in Proteobacteria (i.e., ). Due to the important role of microbial-derived metabolites for host health, it is important to elucidate the metabolic consequences of gastrointestinal dysbiosis and physiological pathways implicated in specific disease phenotypes. Metagenomic studies have used shotgun sequencing of DNA as well as phylogenetic investigation of communities by reconstruction of unobserved states (PICRUSt) to characterize functional changes in the bacterial metagenome in gastrointestinal disease. Furthermore, wide-scale and untargeted measurements of metabolic products derived by the host and the microbiota in intestinal samples allow a better understanding of the functional alterations that occur in gastrointestinal disease. For example, changes in bile acid metabolism and tryptophan catabolism recently have been reported in humans and dogs. Also, metabolites associated with the pentose phosphate pathway were significantly altered in chronic gastrointestinal inflammation and indicate the presence of oxidative stress in dogs with IBD. This review focuses on the advancements made in canine metagenomics and metabolomics and their implications in understanding gastrointestinal disease as well as the development of better treatment approaches.

  15. A potential source for cellulolytic enzyme discovery and environmental aspects revealed through metagenomics of Brazilian mangroves

    PubMed Central

    2013-01-01

    The mangroves are among the most productive and biologically important environments. The possible presence of cellulolytic enzymes and microorganisms useful for biomass degradation as well as taxonomic and functional aspects of two Brazilian mangroves were evaluated using cultivation and metagenomic approaches. From a total of 296 microorganisms with visual differences in colony morphology and growth (including bacteria, yeast and filamentous fungus), 179 (60.5%) and 117 (39.5%) were isolated from the Rio de Janeiro (RJ) and Bahia (BA) samples, respectively. RJ metagenome showed the higher number of microbial isolates, which is consistent with its most conserved state and higher diversity. The metagenomic sequencing data showed similar predominant bacterial phyla in the BA and RJ mangroves with an abundance of Proteobacteria (57.8% and 44.6%), Firmicutes (11% and 12.3%) and Actinobacteria (8.4% and 7.5%). A higher number of enzymes involved in the degradation of polycyclic aromatic compounds were found in the BA mangrove. Specific sequences involved in the cellulolytic degradation, belonging to cellulases, hemicellulases, carbohydrate binding domains, dockerins and cohesins were identified, and it was possible to isolate cultivable fungi and bacteria related to biomass decomposition and with potential applications for the production of biofuels. These results showed that the mangroves possess all fundamental molecular tools required for building the cellulosome, which is required for the efficient degradation of cellulose material and sugar release. PMID:24160319

  16. Single-Cell-Genomics-Facilitated Read Binning of Candidate Phylum EM19 Genomes from Geothermal Spring Metagenomes

    PubMed Central

    Becraft, Eric D.; Dodsworth, Jeremy A.; Murugapiran, Senthil K.; Ohlsson, J. Ingemar; Briggs, Brandon R.; Kanbar, Jad; De Vlaminck, Iwijn; Quake, Stephen R.; Dong, Hailiang; Hedlund, Brian P.

    2015-01-01

    The vast majority of microbial life remains uncatalogued due to the inability to cultivate these organisms in the laboratory. This “microbial dark matter” represents a substantial portion of the tree of life and of the populations that contribute to chemical cycling in many ecosystems. In this work, we leveraged an existing single-cell genomic data set representing the candidate bacterial phylum “Calescamantes” (EM19) to calibrate machine learning algorithms and define metagenomic bins directly from pyrosequencing reads derived from Great Boiling Spring in the U.S. Great Basin. Compared to other assembly-based methods, taxonomic binning with a read-based machine learning approach yielded final assemblies with the highest predicted genome completeness of any method tested. Read-first binning subsequently was used to extract Calescamantes bins from all metagenomes with abundant Calescamantes populations, including metagenomes from Octopus Spring and Bison Pool in Yellowstone National Park and Gongxiaoshe Spring in Yunnan Province, China. Metabolic reconstruction suggests that Calescamantes are heterotrophic, facultative anaerobes, which can utilize oxidized nitrogen sources as terminal electron acceptors for respiration in the absence of oxygen and use proteins as their primary carbon source. Despite their phylogenetic divergence, the geographically separate Calescamantes populations were highly similar in their predicted metabolic capabilities and core gene content, respiring O2, or oxidized nitrogen species for energy conservation in distant but chemically similar hot springs. PMID:26637598

  17. Fast and Sensitive Alignment of Microbial Whole Genome Sequencing Reads to Large Sequence Datasets on a Desktop PC: Application to Metagenomic Datasets and Pathogen Identification

    PubMed Central

    2014-01-01

    Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity and as a result, species or strain level identification is often inaccurate and low abundance pathogens can sometimes be missed. We have developed Taxoner, an open source, taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than the approaches that use small marker databases but is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner. PMID:25077800

  18. Raman-activated cell sorting and metagenomic sequencing revealing carbon-fixing bacteria in the ocean.

    PubMed

    Jing, Xiaoyan; Gou, Honglei; Gong, Yanhai; Su, Xiaolu; Xu, La; Ji, Yuetong; Song, Yizhi; Thompson, Ian P; Xu, Jian; Huang, Wei E

    2018-05-04

    It is of great significance to understand CO2 fixation in the oceans. Using single cell Raman spectra (SCRS) as biochemical profiles, Raman activated cell ejection (RACE) was able to link phenotypes and genotypes of cells. Here we show that mini-metagenomic sequences from RACE can be used as a reference to reconstruct nearly complete genomes of key functional bacteria by binning shotgun metagenomic sequencing data. By applying this approach to 13C-bicarbonate spiked seawater from the euphotic zone of the Yellow Sea of China, the dominant bacteria Synechococcus spp. and Pelagibacter spp. were revealed, and both of them contain carotenoid and were able to incorporate 13C into the cells at the same time. Genetic analysis of the reconstructed genomes suggests that both Synechococcus spp. and Pelagibacter spp. contained all genes necessary for carotenoid synthesis, light energy harvesting and CO2 fixation. Interestingly, the reconstructed genome indicates that Pelagibacter spp. harbored intact sets of genes for β-carotene (precursor of retinal), proteorhodopsin synthesis and anaplerotic CO2 fixation. This novel approach shines light on the role of marine "microbial dark matter" in global carbon cycling, by linking yet-to-be-cultured Synechococcus spp. and Pelagibacter spp. to carbon fixation and flow activities in situ. This article is protected by copyright. All rights reserved. © 2018 Society for Applied Microbiology and John Wiley & Sons Ltd.

  19. Fast and sensitive alignment of microbial whole genome sequencing reads to large sequence datasets on a desktop PC: application to metagenomic datasets and pathogen identification.

    PubMed

    Pongor, Lőrinc S; Vera, Roberto; Ligeti, Balázs

    2014-01-01

    Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity and as a result, species or strain level identification is often inaccurate and low abundance pathogens can sometimes be missed. We have developed Taxoner, an open source, taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than the approaches that use small marker databases but is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner.

  20. Compartmentalized metabolic network reconstruction of microbial communities to determine the effect of agricultural intervention on soils

    PubMed Central

    Álvarez-Yela, Astrid Catalina; Gómez-Cano, Fabio; Zambrano, María Mercedes; Husserl, Johana; Danies, Giovanna; Restrepo, Silvia; González-Barrios, Andrés Fernando

    2017-01-01

    Soil microbial communities are responsible for a wide range of ecological processes and have an important economic impact in agriculture. Determining the metabolic processes performed by microbial communities is crucial for understanding and managing ecosystem properties. Metagenomic approaches allow the elucidation of the main metabolic processes that determine the performance of microbial communities under different environmental conditions and perturbations. Here we present the first compartmentalized metabolic reconstruction at a metagenomics scale of a microbial ecosystem. This systematic approach conceives a meta-organism without boundaries between individual organisms and allows the in silico evaluation of the effect of agricultural intervention on soils at a metagenomics level. To characterize the microbial ecosystems, topological properties, taxonomic and metabolic profiles, as well as a Flux Balance Analysis (FBA) were considered. Furthermore, topological and optimization algorithms were implemented to carry out the curation of the models, to ensure the continuity of the fluxes between the metabolic pathways, and to confirm the metabolite exchange between subcellular compartments. The proposed models provide specific information about ecosystems that are generally overlooked in non-compartmentalized or non-curated networks, like the influence of transport reactions in the metabolic processes, especially the important effect on mitochondrial processes, as well as provide more accurate results of the fluxes used to optimize the metabolic processes within the microbial community. PMID:28767679

  1. Classifying short genomic fragments from novel lineages using composition and homology

    PubMed Central

    2011-01-01

    Background: The assignment of taxonomic attributions to DNA fragments recovered directly from the environment is a vital step in metagenomic data analysis. Assignments can be made using rank-specific classifiers, which assign reads to taxonomic labels from a predetermined level such as named species or strain, or rank-flexible classifiers, which choose an appropriate taxonomic rank for each sequence in a data set. The choice of rank typically depends on the optimal model for a given sequence and on the breadth of taxonomic groups seen in a set of close-to-optimal models. Homology-based (e.g., LCA) and composition-based (e.g., PhyloPythia, TACOA) rank-flexible classifiers have been proposed, but there is at present no hybrid approach that utilizes both homology and composition. Results: We first develop a hybrid, rank-specific classifier based on BLAST and Naïve Bayes (NB) that has comparable accuracy and a faster running time than the current best approach, PhymmBL. By substituting LCA for BLAST or allowing the inclusion of suboptimal NB models, we obtain a rank-flexible classifier. This hybrid classifier outperforms established rank-flexible approaches on simulated metagenomic fragments of length 200 bp to 1000 bp and is able to assign taxonomic attributions to a subset of sequences with few misclassifications. We then demonstrate the performance of different classifiers on an enhanced biological phosphorus removal metagenome, illustrating the advantages of rank-flexible classifiers when representative genomes are absent from the set of reference genomes. Application to a glacier ice metagenome demonstrates that similar taxonomic profiles are obtained across a set of classifiers which are increasingly conservative in their classification. Conclusions: Our NB-based classification scheme is faster than the current best composition-based algorithm, Phymm, while providing equally accurate predictions. The rank-flexible variant of NB, which we term ε-NB, is complementary to LCA and can be combined with it to yield conservative prediction sets of very high confidence. The simple parameterization of LCA and ε-NB allows for tuning of the balance between more predictions and increased precision, allowing the user to account for the sensitivity of downstream analyses to misclassified or unclassified sequences. PMID:21827705
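
    The rank-flexible behaviour of a lowest-common-ancestor (LCA) classifier can be illustrated with a short sketch. This shows only the generic LCA idea, not the paper's ε-NB method or any particular tool's implementation: a fragment is assigned to the deepest taxon shared by the lineages of all its close-to-optimal hits. The lineages and the function name are invented for the example.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of root-to-leaf lineages.

    Each lineage is a list ordered from the highest rank to the lowest.
    An empty result means the hits share no taxon at all.
    """
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared

# Hypothetical close-to-optimal hits for one metagenomic fragment.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Salmonella"],
    ["Bacteria", "Proteobacteria", "Betaproteobacteria", "Burkholderia"],
]
print(lowest_common_ancestor(hits))      # assignment at phylum level
print(lowest_common_ancestor(hits[:2]))  # tighter hit set, deeper assignment
```

    Widening the set of accepted hits pushes the assignment toward higher ranks, which is the trade-off between more predictions and increased precision mentioned in the abstract.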

  2. Lignolytic-consortium omics analyses reveal novel genomes and pathways involved in lignin modification and valorization.

    PubMed

    Moraes, Eduardo C; Alvarez, Thabata M; Persinoti, Gabriela F; Tomazetto, Geizecler; Brenelli, Livia B; Paixão, Douglas A A; Ematsu, Gabriela C; Aricetti, Juliana A; Caldana, Camila; Dixon, Neil; Bugg, Timothy D H; Squina, Fabio M

    2018-01-01

    Lignin is a heterogeneous polymer representing a renewable source of aromatic and phenolic bio-derived products for the chemical industry. However, the inherent structural complexity and recalcitrance of lignin make its conversion into valuable chemicals a challenge. Natural microbial communities produce biocatalysts derived from a large number of microorganisms, including those considered unculturable, which operate synergistically to perform a variety of bioconversion processes. Thus, metagenomic approaches are a powerful tool to reveal novel optimized metabolic pathways for lignin conversion and valorization. The lignin-degrading consortium (LigMet) was obtained from a sugarcane plantation soil sample. The LigMet taxonomical analyses (based on 16S rRNA) indicated prevalence of Proteobacteria, Actinobacteria and Firmicutes members, including the Alcaligenaceae and Micrococcaceae families, which were enriched in the LigMet compared to sugarcane soil. Analysis of global DNA sequencing revealed around 240,000 gene models, and 65 draft bacterial genomes were predicted. Along with depicting several peroxidases, dye-decolorizing peroxidases, laccases, carbohydrate esterases, and lignocellulosic auxiliary (redox) activities, the major pathways related to aromatic degradation were identified, including benzoate (or methylbenzoate) degradation to catechol (or methylcatechol), catechol ortho-cleavage, catechol meta-cleavage, and phthalate degradation. A novel Paenarthrobacter strain harboring eight gene clusters related to aromatic degradation was isolated from LigMet and was able to grow on lignin as the major carbon source. Furthermore, a recombinant pathway for vanillin production was designed based on novel gene sequences coding for a feruloyl-CoA synthetase and an enoyl-CoA hydratase/aldolase retrieved from the metagenomic data set.
The enrichment protocol described in the present study was successful for a microbial consortium establishment towards the lignin and aromatic metabolism, providing pathways and enzyme sets for synthetic biology engineering approaches. This work represents a pioneering study on lignin conversion and valorization strategies based on metagenomics, revealing several novel lignin conversion enzymes, aromatic-degrading bacterial genomes, and a novel bacterial strain of potential biotechnological interest. The validation of a biosynthetic route for vanillin synthesis confirmed the applicability of the targeted metagenome discovery approach for lignin valorization strategies.

  3. Antibiotic Resistome: Improving Detection and Quantification Accuracy for Comparative Metagenomics.

    PubMed

    Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania

    2016-04-01

    The unprecedented rise of life-threatening antibiotic resistance (AR), combined with the unparalleled advances in DNA sequencing of genomes and metagenomes, has pushed the need for in silico detection of the resistance potential of clinical and environmental metagenomic samples through the quantification of AR genes (i.e., genes conferring antibiotic resistance). Therefore, determining an optimal methodology to quantitatively and accurately assess AR genes in a given environment is pivotal. Here, we optimized and improved existing AR detection methodologies from metagenomic datasets to properly consider AR-generating mutations in antibiotic target genes. Through comparative metagenomic analysis of previously published AR gene abundance in three publicly available metagenomes, we illustrate how mutation-generated resistance genes are either falsely assigned or neglected, which alters the detection and quantitation of the antibiotic resistome. In addition, we inspected factors influencing the outcome of AR gene quantification using metagenome simulation experiments, and identified that genome size, AR gene length, total number of metagenomics reads and selected sequencing platforms had pronounced effects on the level of detected AR. In conclusion, our proposed improvements in the current methodologies for accurate AR detection and resistome assessment show reliable results when tested on real and simulated metagenomic datasets.
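
    Because the abstract reports that AR gene length and the total number of reads strongly affect the detected AR level, raw read counts are not directly comparable across genes or samples. The sketch below shows one common normalization, reads per kilobase of gene per million reads (RPKM); it is offered as an assumption for illustration and is not stated to be the authors' method. The function name is hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_reads: int) -> float:
    """Reads per kilobase of gene per million sequenced reads.

    Dividing by gene length corrects for longer genes recruiting more reads;
    dividing by total reads corrects for sequencing depth.
    """
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_reads)
```

    For example, a 1,000 bp gene with 50 mapped reads and a 2,000 bp gene with 100 mapped reads in the same one-million-read sample receive the same normalized value, whereas their raw counts differ twofold.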

  4. Evolutionary Patterns and Processes: Lessons from Ancient DNA.

    PubMed

    Leonardi, Michela; Librado, Pablo; Der Sarkissian, Clio; Schubert, Mikkel; Alfarhan, Ahmed H; Alquraishi, Saleh A; Al-Rasheid, Khaled A S; Gamba, Cristina; Willerslev, Eske; Orlando, Ludovic

    2017-01-01

    Ever since its emergence in 1984, the field of ancient DNA has struggled to overcome the challenges related to the decay of DNA molecules in the fossil record. With the recent development of high-throughput DNA sequencing technologies and molecular techniques tailored to ultra-damaged templates, it has now come of age, merging together approaches in phylogenomics, population genomics, epigenomics, and metagenomics. Leveraging on complete temporal sample series, ancient DNA provides direct access to the most important dimension in evolution—time, allowing a wealth of fundamental evolutionary processes to be addressed at unprecedented resolution. This review taps into the most recent findings in ancient DNA research to present analyses of ancient genomic and metagenomic data.

  5. Evolutionary Patterns and Processes: Lessons from Ancient DNA

    PubMed Central

    Leonardi, Michela; Librado, Pablo; Der Sarkissian, Clio; Schubert, Mikkel; Alfarhan, Ahmed H.; Alquraishi, Saleh A.; Al-Rasheid, Khaled A. S.; Gamba, Cristina; Willerslev, Eske

    2017-01-01

    Ever since its emergence in 1984, the field of ancient DNA has struggled to overcome the challenges related to the decay of DNA molecules in the fossil record. With the recent development of high-throughput DNA sequencing technologies and molecular techniques tailored to ultra-damaged templates, it has now come of age, merging together approaches in phylogenomics, population genomics, epigenomics, and metagenomics. Leveraging on complete temporal sample series, ancient DNA provides direct access to the most important dimension in evolution: time, allowing a wealth of fundamental evolutionary processes to be addressed at unprecedented resolution. This review taps into the most recent findings in ancient DNA research to present analyses of ancient genomic and metagenomic data. PMID:28173586

  6. Metagenome changes in the biogas producing community during anaerobic digestion of rice straw.

    PubMed

    Pore, Soham D; Shetty, Deepa; Arora, Preeti; Maheshwari, Sneha; Dhakephalkar, Prashant K

    2016-08-01

    The present investigation was undertaken to study the microbial community succession in a sour and a healthy digester. An Ion Torrent next-generation sequencing (NGS)-based metagenomic approach indicated abundance of hydrolytic bacteria and exclusion of methanogens and syntrophic bacteria in the sour digester. Functional gene analysis revealed higher abundance of enzymes involved in acidogenesis and lower abundance of enzymes associated with methanogenesis, such as methyl coenzyme M reductase, F420-dependent reductase and formylmethanofuran dehydrogenase, in the sour digester. Increased abundance of methanogens (Methanomicrobia) and of genes involved in methanogenesis was observed in the restored/healthy digester, highlighting revival of the pH-sensitive methanogenic community. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Construction of a metagenomic DNA library of sponge symbionts and screening of antibacterial metabolites

    NASA Astrophysics Data System (ADS)

    Chen, Juan; Zhu, Tianjiao; Li, Dehai; Cui, Chengbin; Fang, Yuchun; Liu, Hongbing; Liu, Peipei; Gu, Qianqun; Zhu, Weiming

    2006-04-01

    To study the bioactive metabolites produced by sponge-derived uncultured symbionts, a metagenomic DNA library of the symbionts of the sponge Gelliodes gracilis was constructed. The average size of DNA inserts in the library was 20 kb. This library was screened for antibiotic activity using a paper disc assay. Two clones displayed antibacterial activity against Micrococcus tetragenus. The metabolites of these two clones were analyzed by HPLC. The results showed that their metabolites were quite different from those of the host E. coli DH5α and of the host containing vector pHZ132. This study may present a new approach to exploring bioactive metabolites of sponge symbionts.

  8. Effective Analysis of NGS Metagenomic Data with Ultra-Fast Clustering Algorithms (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Li, Weizhong

    2018-02-12

    San Diego Supercomputer Center's Weizhong Li on "Effective Analysis of NGS Metagenomic Data with Ultra-fast Clustering Algorithms" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  9. Communication: Improved ab initio molecular dynamics by minimally biasing with experimental data

    NASA Astrophysics Data System (ADS)

    White, Andrew D.; Knight, Chris; Hocky, Glen M.; Voth, Gregory A.

    2017-01-01

    Accounting for electrons and nuclei simultaneously is a powerful capability of ab initio molecular dynamics (AIMD). However, AIMD is often unable to accurately reproduce properties of systems such as water due to inaccuracies in the underlying electronic density functionals. This shortcoming is often addressed by added empirical corrections and/or increasing the simulation temperature. We present here a maximum-entropy approach to directly incorporate limited experimental data via a minimal bias. Biased AIMD simulations of water and an excess proton in water are shown to give significantly improved properties both for observables which were biased to match experimental data and for unbiased observables. This approach also yields new physical insight into inaccuracies in the underlying density functional theory as utilized in the unbiased AIMD.
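
    The "minimal bias" mentioned above can be written out for the simplest case. The expression below is the standard maximum-entropy result for a single observable and is given here as an assumption, since the abstract does not state the functional form used; the symbols U, f, f_exp, lambda and beta are introduced for illustration.

```latex
% Assumed setting: one observable f(x) with experimental target f_exp,
% unbiased potential energy U(x), inverse temperature beta.
\begin{equation}
  U'(x) = U(x) + \lambda\, f(x),
  \qquad
  P'(x) \propto P(x)\, e^{-\beta \lambda f(x)},
\end{equation}
% with the multiplier lambda fixed by the constraint
\begin{equation}
  \langle f(x) \rangle_{U'} = f_{\mathrm{exp}} .
\end{equation}
```

    Among all biased distributions that reproduce the experimental average, this linear form is the one closest (in relative entropy) to the unbiased distribution, which is the sense in which the bias is minimal.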

  10. Communication: Improved ab initio molecular dynamics by minimally biasing with experimental data.

    PubMed

    White, Andrew D; Knight, Chris; Hocky, Glen M; Voth, Gregory A

    2017-01-28

    Accounting for electrons and nuclei simultaneously is a powerful capability of ab initio molecular dynamics (AIMD). However, AIMD is often unable to accurately reproduce properties of systems such as water due to inaccuracies in the underlying electronic density functionals. This shortcoming is often addressed by added empirical corrections and/or increasing the simulation temperature. We present here a maximum-entropy approach to directly incorporate limited experimental data via a minimal bias. Biased AIMD simulations of water and an excess proton in water are shown to give significantly improved properties both for observables which were biased to match experimental data and for unbiased observables. This approach also yields new physical insight into inaccuracies in the underlying density functional theory as utilized in the unbiased AIMD.

  11. EBI metagenomics--a new resource for the analysis and archiving of metagenomic data.

    PubMed

    Hunter, Sarah; Corbett, Matthew; Denise, Hubert; Fraser, Matthew; Gonzalez-Beltran, Alejandra; Hunter, Christopher; Jones, Philip; Leinonen, Rasko; McAnulla, Craig; Maguire, Eamonn; Maslen, John; Mitchell, Alex; Nuka, Gift; Oisel, Arnaud; Pesseat, Sebastien; Radhakrishnan, Rajesh; Rocca-Serra, Philippe; Scheremetjew, Maxim; Sterk, Peter; Vaughan, Daniel; Cochrane, Guy; Field, Dawn; Sansone, Susanna-Assunta

    2014-01-01

    Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.

  12. Metagenomic and functional analyses of the consequences of reduction of bacterial diversity on soil functions and bioremediation in diesel-contaminated microcosms.

    PubMed

    Jung, Jaejoon; Philippot, Laurent; Park, Woojun

    2016-03-14

    The relationship between microbial biodiversity and soil function is an important issue in ecology, yet most studies have been performed in pristine ecosystems. Here, we assess the role of microbial diversity in ecological function and remediation strategies in diesel-contaminated soils. Soil microbial diversity was manipulated using a removal by dilution approach and microbial functions were determined using both metagenomic analyses and enzymatic assays. A shift from Proteobacteria- to Actinobacteria-dominant communities was observed when species diversity was reduced. Metagenomic analysis showed that a large proportion of functional gene categories were significantly altered by the reduction in biodiversity. The abundance of genes related to the nitrogen cycle was significantly reduced in the low-diversity community, impairing denitrification. In contrast, the efficiency of diesel biodegradation was increased in the low-diversity community and was further enhanced by addition of red clay as a stimulating agent. Our results suggest that the relationship between microbial diversity and ecological function involves trade-offs among ecological processes, and should not be generalized as a positive, neutral, or negative relationship.

  13. Potential of fecal microbiota for early-stage detection of colorectal cancer

    PubMed Central

    Zeller, Georg; Tap, Julien; Voigt, Anita Y; Sunagawa, Shinichi; Kultima, Jens Roat; Costea, Paul I; Amiot, Aurélien; Böhm, Jürgen; Brunetti, Francesco; Habermann, Nina; Hercog, Rajna; Koch, Moritz; Luciani, Alain; Mende, Daniel R; Schneider, Martin A; Schrotz-King, Petra; Tournigand, Christophe; Tran Van Nhieu, Jeanne; Yamada, Takuji; Zimmermann, Jürgen; Benes, Vladimir; Kloor, Matthias; Ulrich, Cornelia M; von Knebel Doeberitz, Magnus; Sobhani, Iradj; Bork, Peer

    2014-01-01

    Several bacterial species have been implicated in the development of colorectal carcinoma (CRC), but CRC-associated changes of fecal microbiota and their potential for cancer screening remain to be explored. Here, we used metagenomic sequencing of fecal samples to identify taxonomic markers that distinguished CRC patients from tumor-free controls in a study population of 156 participants. Accuracy of metagenomic CRC detection was similar to the standard fecal occult blood test (FOBT) and when both approaches were combined, sensitivity improved > 45% relative to the FOBT, while maintaining its specificity. Accuracy of metagenomic CRC detection did not differ significantly between early- and late-stage cancer and could be validated in independent patient and control populations (N = 335) from different countries. CRC-associated changes in the fecal microbiome at least partially reflected microbial community composition at the tumor itself, indicating that observed gene pool differences may reveal tumor-related host–microbe interactions. Indeed, we deduced a metabolic shift from fiber degradation in controls to utilization of host carbohydrates and amino acids in CRC patients, accompanied by an increase of lipopolysaccharide metabolism. PMID:25432777
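
    The accuracy terms used in this abstract (sensitivity, specificity, and a relative improvement in sensitivity) can be made concrete with a short sketch. The function names and the toy labels in the usage note are hypothetical and do not reproduce the study's data or its method for combining the two tests.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired true and predicted labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def relative_improvement(baseline: float, new: float) -> float:
    """Fractional change of `new` relative to `baseline` (0.5 means +50%)."""
    return (new - baseline) / baseline
```

    As a usage example with made-up labels, four cancer cases of which two are detected give a sensitivity of 0.5; raising that to 0.75 at unchanged specificity corresponds to a 50% relative improvement, the kind of figure the abstract reports for the combined test.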

  14. Metagenomic and functional analyses of the consequences of reduction of bacterial diversity on soil functions and bioremediation in diesel-contaminated microcosms

    PubMed Central

    Jung, Jaejoon; Philippot, Laurent; Park, Woojun

    2016-01-01

    The relationship between microbial biodiversity and soil function is an important issue in ecology, yet most studies have been performed in pristine ecosystems. Here, we assess the role of microbial diversity in ecological function and remediation strategies in diesel-contaminated soils. Soil microbial diversity was manipulated using a removal by dilution approach and microbial functions were determined using both metagenomic analyses and enzymatic assays. A shift from Proteobacteria- to Actinobacteria-dominant communities was observed when species diversity was reduced. Metagenomic analysis showed that a large proportion of functional gene categories were significantly altered by the reduction in biodiversity. The abundance of genes related to the nitrogen cycle was significantly reduced in the low-diversity community, impairing denitrification. In contrast, the efficiency of diesel biodegradation was increased in the low-diversity community and was further enhanced by addition of red clay as a stimulating agent. Our results suggest that the relationship between microbial diversity and ecological function involves trade-offs among ecological processes, and should not be generalized as a positive, neutral, or negative relationship. PMID:26972977

  15. Identification of a Novel Human Papillomavirus by Metagenomic Analysis of Samples from Patients with Febrile Respiratory Illness

    PubMed Central

    Mokili, John L.; Dutilh, Bas E.; Lim, Yan Wei; Schneider, Bradley S.; Taylor, Travis; Haynes, Matthew R.; Metzgar, David; Myers, Christopher A.; Blair, Patrick J.; Nosrat, Bahador; Wolfe, Nathan D.; Rohwer, Forest

    2013-01-01

    As part of a virus discovery investigation using a metagenomic approach, a highly divergent novel Human papillomavirus type was identified in pooled convenience nasal/oropharyngeal swab samples collected from patients with febrile respiratory illness. Phylogenetic analysis of the whole genome and the L1 gene reveals that the new HPV identified in this study clusters with previously described gamma papillomaviruses, sharing only 61.1% (whole genome) and 63.1% (L1) sequence identity with its closest relative in the Papillomavirus episteme (PAVE) database. This new virus was named HPV_SD2 pending official classification. The complete genome of HPV-SD2 is 7,299 bp long (36.3% G/C) and contains 7 open reading frames (L2, L1, E6, E7, E1, E2 and E4) and a non-coding long control region (LCR) between L1 and E6. The metagenomic procedures, coupled with the bioinformatic methods described herein are well suited to detect small circular genomes such as those of human papillomaviruses. PMID:23554892
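
    Two of the quantities reported above, G/C content and percent sequence identity, are simple to compute. The sketch below assumes that identity is counted over alignment columns in which neither sequence has a gap; identity conventions differ between tools, and the abstract does not say which one was used. Function names are hypothetical.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```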

  16. Metagenomic insights into ultraviolet disinfection effects on antibiotic resistome in biologically treated wastewater.

    PubMed

    Hu, Qing; Zhang, Xu-Xiang; Jia, Shuyu; Huang, Kailong; Tang, Junying; Shi, Peng; Ye, Lin; Ren, Hongqiang

    2016-09-15

    High-throughput sequencing-based metagenomic approaches were used to comprehensively investigate ultraviolet effects on the microbial community structure, and diversity and abundance of antibiotic resistance genes (ARGs) and mobile genetic elements (MGEs) in biologically treated wastewater. After ultraviolet radiation, some dominant genera, like Aeromonas and Halomonas, in the wastewater almost disappeared, while the relative abundance of some minor genera including Pseudomonas and Bacillus increased dozens of times. Metagenomic analysis showed that 159 ARGs within 14 types were detectable in the samples, and the radiation at 500 mJ/cm² obviously increased their total relative abundance from 31.68 ppm to 190.78 ppm, which was supported by quantitative real time PCR. As the dominant persistent ARGs, multidrug resistance genes carried by Pseudomonas and bacitracin resistance gene bacA carried by Bacillus mainly contributed to the ARGs abundance increase. Bacterial community shift and MGEs replication induced by the radiation might drive the resistome alteration. The findings may shed new light on the mechanism behind the ultraviolet radiation effects on antibiotic resistance in wastewater. Copyright © 2016 Elsevier Ltd. All rights reserved.
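
    The relative-abundance figures in this abstract can be put in perspective with a small worked calculation. The sketch assumes that "ppm" means ARG-like reads per million sequenced reads, which the abstract does not define; the function names are hypothetical.

```python
def ppm(hit_reads: int, total_reads: int) -> float:
    """Hits per million reads (assumed meaning of 'ppm' here)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return hit_reads * 1e6 / total_reads


def fold_change(before: float, after: float) -> float:
    """Ratio of the later value to the earlier one."""
    return after / before


# Values quoted in the abstract: 31.68 ppm before and 190.78 ppm after UV.
uv_fold = fold_change(31.68, 190.78)
```

    On the reported values, the total relative abundance of ARGs rose roughly sixfold after irradiation. Because this is a relative measure, it is consistent with the abstract's suggestion that a shift in community composition, rather than an absolute gain in resistance genes, may drive the change.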

  17. Translational metagenomics and the human resistome: confronting the menace of the new millennium.

    PubMed

    Willmann, Matthias; Peter, Silke

    2017-01-01

    The increasing threat of antimicrobial resistance poses one of the greatest challenges to modern medicine. The collection of all antimicrobial resistance genes carried by various microorganisms in the human body is called the human resistome and represents the source of resistance in pathogens that can eventually cause life-threatening and untreatable infections. A deep understanding of the human resistome and its multilateral interaction with various environments is necessary for developing proper measures that can efficiently reduce the spread of resistance. However, the human resistome and its evolution still remain, for the most part, a mystery to researchers. Metagenomics, particularly in combination with next-generation-sequencing technology, provides a powerful methodological approach for studying the human microbiome as well as the pathogenome, the virolume and especially the resistome. We summarize below current knowledge on how the human resistome is shaped and discuss how metagenomics can be employed to improve our understanding of these complex processes, particularly as regards a rapid translation of new findings into clinical diagnostics, infection control and public health.

  18. Product-induced gene expression, a product-responsive reporter assay used to screen metagenomic libraries for enzyme-encoding genes.

    PubMed

    Uchiyama, Taku; Miyazaki, Kentaro

    2010-11-01

    A reporter assay-based screening method for enzymes, which we named product-induced gene expression (PIGEX), was developed and used to screen a metagenomic library for amidases. A benzoate-responsive transcriptional activator, BenR, was placed upstream of the gene encoding green fluorescent protein and used as a sensor. Escherichia coli sensor cells carrying the benR-gfp gene cassette fluoresced in response to benzoate concentrations as low as 10 μM but were completely unresponsive to the substrate benzamide. An E. coli metagenomic library consisting of 96,000 clones was grown in 96-well format in LB medium containing benzamide. The library cells were then cocultivated with sensor cells. Eleven amidase genes were recovered from 143 fluorescent wells; eight of these genes were homologous to known bacterial amidase genes while three were novel genes. In addition to their activity toward benzamide, the enzymes were active toward various substrates, including d- and l-amino acid amides, and displayed enantioselectivity. Thus, we demonstrated that PIGEX is an effective approach for screening novel enzymes based on product detection.

  19. Metagenomics of prebiotic and probiotic supplemented broilers gastrointestinal tract microbiome

    USDA-ARS's Scientific Manuscript database

    Phylogenetic investigation of communities by reconstruction of unobserved states (PICRUSt) is a recently developed computational approach for prediction of functional composition of a microbiome comparing marker gene data with a reference genome database. The procedure established significant link ...

  20. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Sakakibara, Yasumbumi

    2018-02-13

    Keio University's Yasumbumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  1. MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sakakibara, Yasumbumi

    2011-10-13

    Keio University's Yasumbumi Sakakibara on "MetaVelvet: An Extension of Velvet Assembler to de novo Metagenome Assembly from Short Sequence Reads" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  2. BioCreative Workshops for DOE Genome Sciences: Text Mining for Metagenomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wu, Cathy H.; Hirschman, Lynette

    The objective of this project was to host BioCreative workshops to define and develop text mining tasks to meet the needs of the Genome Sciences community, focusing on metadata information extraction in metagenomics. Following the successful introduction of metagenomics at the BioCreative IV workshop, members of the metagenomics and BioCreative communities continued discussion to identify candidate topics for a BioCreative metagenomics track for BioCreative V. Of particular interest was the capture of environmental and isolation source information from text. The outcome was to form a “community of interest” around work on the interactive EXTRACT system, which supported interactive tagging of environmental and species data. This experiment is included in the BioCreative V virtual issue of Database. In addition, there was broad participation by members of the metagenomics community in the panels held at BioCreative V, leading to valuable exchanges between the text mining developers and members of the metagenomics research community. These exchanges are reflected in a number of the overview and perspective pieces also being captured in the BioCreative V virtual issue. Overall, this conversation has exposed the metagenomics researchers to the possibilities of text mining, and educated the text mining developers to the specific needs of the metagenomics community.

  3. Feline fecal virome reveals novel and prevalent enteric viruses

    PubMed Central

    Ng, Terry Fei Fan; Mesquita, João Rodrigo; Nascimento, Maria São José; Kondov, Nikola O.; Wong, Walt; Reuter, Gábor; Knowles, Nick J.; Vega, Everardo; Esona, Mathew D.; Deng, Xutao; Vinjé, Jan; Delwart, Eric

    2014-01-01

    Humans keep more than 80 million cats worldwide, ensuring frequent contacts with their viruses. Despite such interactions the enteric virome of cats remains poorly understood. We analyzed a fecal sample from a single healthy cat from Portugal using viral metagenomics and detected five eukaryotic viral genomes. These viruses included a novel picornavirus (proposed genus “Sakobuvirus”) and bocavirus (feline bocavirus 2), a variant of feline astrovirus 2 and sequence fragments of a highly divergent feline rotavirus and picobirnavirus. Feline sakobuvirus A represents the prototype species of a proposed new genus in the Picornaviridae family, distantly related to human salivirus and kobuvirus. Feline astroviruses (mamastrovirus 2) are the closest relatives of the classic human astroviruses (mamastrovirus 1), suggestive of past cross-species transmission. Presence of these viruses by PCR among Portuguese cats was detected in 13% (rotavirus), 7% (astrovirus), 6% (bocavirus), 4% (sakobuvirus), and 4% (picobirnavirus) of 55 feline fecal samples. Co-infections were frequent with 40% (4/10) of cats shedding more than one of these viruses. Our study provides an initial unbiased description of the feline fecal virome indicating a high level of asymptomatic infections. Availability of the genome sequences of these viruses will facilitate future tropism and disease association studies. PMID:24793097

  4. Effect of short-term room temperature storage on the microbial community in infant fecal samples.

    PubMed

    Guo, Yong; Li, Sheng-Hui; Kuang, Ya-Shu; He, Jian-Rong; Lu, Jin-Hua; Luo, Bei-Jun; Jiang, Feng-Ju; Liu, Yao-Zhong; Papasian, Christopher J; Xia, Hui-Min; Deng, Hong-Wen; Qiu, Xiu

    2016-05-26

    Sample storage conditions are important for unbiased analysis of microbial communities in metagenomic studies. Specifically, for infant gut microbiota studies, stool specimens are often exposed to room temperature (RT) conditions prior to analysis. This could lead to variations in structural and quantitative assessment of bacterial communities. To estimate such effects of RT storage, we collected feces from 29 healthy infants (0-3 months) and partitioned each sample into 5 portions to be stored for different lengths of time at RT before freezing at -80 °C. Alpha diversity did not differ between samples with storage time from 0 to 2 hours. The UniFrac distances and microbial composition analysis showed significant differences by testing among individuals, but not by testing between different time points at RT. Changes in the relative abundance of some specific (less common, minor) taxa were still found during storage at room temperature. Our results support previous studies in children and adults, and provided useful information for accurate characterization of infant gut microbiomes. In particular, our study furnished a solid foundation and justification for using fecal samples exposed to RT for less than 2 hours for comparative analyses between various medical conditions.
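
    Alpha diversity, compared across storage times in this abstract, summarizes within-sample diversity. The abstract does not say which index was used; the sketch below uses the Shannon index as one common choice, and the function name is hypothetical.

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[int]) -> float:
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

    A sample with four equally abundant taxa has H = ln 4, while a sample dominated by a single taxon has H = 0; comparing such values for the portions of one stool sample frozen after different times at room temperature is the kind of test the study describes.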

  5. MetaCRAST: reference-guided extraction of CRISPR spacers from unassembled metagenomes.

    PubMed

    Moller, Abraham G; Liang, Chun

    2017-01-01

    Clustered regularly interspaced short palindromic repeat (CRISPR) systems are the adaptive immune systems of bacteria and archaea against viral infection. While CRISPRs have been exploited as a tool for genetic engineering, their spacer sequences can also provide valuable insights into microbial ecology by linking environmental viruses to their microbial hosts. Despite this importance, metagenomic CRISPR detection remains a major challenge. Here we present a reference-guided CRISPR spacer detection tool, the Metagenomic CRISPR Reference-Aided Search Tool (MetaCRAST), that constrains searches based on user-specified direct repeats (DRs). These DRs could be expected from assembly or taxonomic profiles of metagenomes. We compared the performance of MetaCRAST to those of two existing metagenomic CRISPR detection tools, Crass and MinCED, using both real and simulated acid mine drainage (AMD) and enhanced biological phosphorus removal (EBPR) metagenomes. Our evaluation shows MetaCRAST improves CRISPR spacer detection in real metagenomes compared to the de novo CRISPR detection methods Crass and MinCED. Evaluation on simulated metagenomes shows it performs better than de novo tools for Illumina metagenomes and comparably for 454 metagenomes. It also has comparable performance dependence on read length and community composition, run time, and accuracy to these tools. MetaCRAST is implemented in Perl, parallelizable through the Many Core Engine (MCE), and takes metagenomic sequence reads and direct repeat queries (FASTA or FASTQ) as input. It is freely available for download at https://github.com/molleraj/MetaCRAST.
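
    The reference-guided idea, searching reads for a known direct repeat and collecting what lies between consecutive copies, can be sketched in a few lines. This is an illustration in Python with exact string matching only; it is not MetaCRAST's implementation (which is in Perl), and the function name and the spacer length bounds are assumptions.

```python
import re
from typing import List


def extract_spacers(read: str, direct_repeat: str,
                    min_len: int = 20, max_len: int = 60) -> List[str]:
    """Return sequences lying between consecutive exact copies of a direct repeat.

    Only spacers whose length falls within [min_len, max_len] are kept
    (assumed bounds, chosen for illustration).
    """
    if not direct_repeat:
        raise ValueError("direct_repeat must be non-empty")
    starts = [m.start() for m in re.finditer(re.escape(direct_repeat), read)]
    spacers: List[str] = []
    for left, right in zip(starts, starts[1:]):
        spacer = read[left + len(direct_repeat):right]
        if min_len <= len(spacer) <= max_len:
            spacers.append(spacer)
    return spacers
```

    Constraining the search to user-supplied repeats means a read needs only two repeat copies flanking a spacer to contribute, whereas de novo tools must first discover the repeat itself. A practical tool would also have to tolerate mismatches in the repeat and search the reverse complement, which this sketch omits.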

  6. Evaluation of rapid and simple techniques for the enrichment of viruses prior to metagenomic virus discovery.

    PubMed

    Hall, Richard J; Wang, Jing; Todd, Angela K; Bissielo, Ange B; Yen, Seiha; Strydom, Hugo; Moore, Nicole E; Ren, Xiaoyun; Huang, Q Sue; Carter, Philip E; Peacey, Matthew

    2014-01-01

    The discovery of new or divergent viruses using metagenomics and high-throughput sequencing has become more commonplace. The preparation of a sample is known to have an effect on the representation of virus sequences within the metagenomic dataset yet comparatively little attention has been given to this. Physical enrichment techniques are often applied to samples to increase the number of viral sequences and therefore enhance the probability of detection. With the exception of virus ecology studies, there is a paucity of information available to researchers on the type of sample preparation required for a viral metagenomic study that seeks to identify an aetiological virus in an animal or human diagnostic sample. A review of published virus discovery studies revealed the most commonly used enrichment methods, that were usually quick and simple to implement, namely low-speed centrifugation, filtration, nuclease-treatment (or combinations of these) which have been routinely used but often without justification. These were applied to a simple and well-characterised artificial sample composed of bacterial and human cells, as well as DNA (adenovirus) and RNA viruses (influenza A and human enterovirus), being either non-enveloped capsid or enveloped viruses. The effect of the enrichment method was assessed by both quantitative real-time PCR and metagenomic analysis that incorporated an amplification step. Reductions in the absolute quantities of bacteria and human cells were observed for each method as determined by qPCR, but the relative abundance of viral sequences in the metagenomic dataset remained largely unchanged. A 3-step method of centrifugation, filtration and nuclease-treatment showed the greatest increase in the proportion of viral sequences. 
This study provides a starting point for the selection of a purification method in future virus discovery studies, and highlights the need for more data to validate the effect of enrichment methods on different sample types, amplification, bioinformatics approaches and sequencing platforms. This study also highlights the potential risks that may attend selection of a virus enrichment method without any consideration for the sample type being investigated. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

  7. Metagenomic Evidence for the Presence of Comammox Nitrospira-Like Bacteria in a Drinking Water System.

    PubMed

    Pinto, Ameet J; Marcus, Daniel N; Ijaz, Umer Zeeshan; Bautista-de Lose Santos, Quyen Melina; Dick, Gregory J; Raskin, Lutgarde

    2016-01-01

    We report metagenomic evidence for the presence of a Nitrospira-like organism with the metabolic potential to perform the complete oxidation of ammonia to nitrate (i.e., it is a complete ammonia oxidizer [comammox]) in a drinking water system. This metagenome bin was discovered through shotgun DNA sequencing of samples from biologically active filters at the drinking water treatment plant in Ann Arbor, MI. Ribosomal proteins, 16S rRNA, and nxrA gene analyses confirmed that this genome is related to Nitrospira-like nitrite-oxidizing bacteria. The presence of the full suite of ammonia oxidation genes, including ammonia monooxygenase and hydroxylamine dehydrogenase, on a single ungapped scaffold within this metagenome bin suggests the presence of recently discovered comammox potential. Evaluations based on coverage and k-mer frequency distribution, use of two different genome-binning approaches, and nucleic acid and protein similarity analyses support the presence of this scaffold within the Nitrospira metagenome bin. The amoA gene found in this metagenome bin is divergent from those of canonical ammonia and methane oxidizers and clusters closely with the unusual amoA gene of comammox Nitrospira. This finding suggests that previously reported imbalances in abundances of nitrite- and ammonia-oxidizing bacteria/archaea may likely be explained by the capacity of Nitrospira-like organisms to completely oxidize ammonia. This finding might have significant implications for our understanding of microbially mediated nitrogen transformations in engineered and natural systems. IMPORTANCE Nitrification plays an important role in regulating the concentrations of inorganic nitrogen species in a range of environments, from drinking water and wastewater treatment plants to the oceans. Until recently, aerobic nitrification was considered to be a two-step process involving ammonia-oxidizing bacteria or archaea and nitrite-oxidizing bacteria. 
This process requires close cooperation between these two functional guilds for complete conversion of ammonia to nitrate, without the accumulation of nitrite or other intermediates, such as nitrous oxide, a potent greenhouse gas. The discovery of a single organism with the potential to oxidize both ammonia and nitrite adds a new dimension to the current understanding of aerobic nitrification, while presenting opportunities to rethink nitrogen management in engineered systems.

  8. Current strategies for mobilome research.

    PubMed

    Jørgensen, Tue S; Kiil, Anne S; Hansen, Martin A; Sørensen, Søren J; Hansen, Lars H

    2014-01-01

    Mobile genetic elements (MGEs) are pivotal for bacterial evolution and adaptation, allowing shuffling of genes even between distantly related bacterial species. The study of these elements is biologically interesting, as the mode of genetic propagation is kaleidoscopic, and important, as MGEs are the main vehicles of the increasing bacterial antibiotic resistance that causes thousands of human deaths each year. The study of MGEs has previously focused on plasmids from individual isolates, but the revolution in sequencing technology has allowed the study of mobile genomic elements of entire communities using metagenomic approaches. The problem in using metagenomic sequencing for the study of MGEs is that plasmids and other mobile elements comprise only a small fraction of the total genetic content and are difficult to separate from chromosomal DNA based on sequence alone. The distinction between plasmid and chromosome is important as the mobility and regulation of genes largely depend on their genetic context. Several different approaches have been proposed that specifically enrich plasmid DNA from community samples. Here, we review recent approaches used to study entire plasmid pools from complex environments, and point out possible future developments for and pitfalls of these approaches. Further, we discuss the use of the PacBio long-read sequencing technology for MGE discovery.

  9. Assembling the Marine Metagenome, One Cell at a Time

    PubMed Central

    Woyke, Tanja; Xie, Gary; Copeland, Alex; González, José M.; Han, Cliff; Kiss, Hajnalka; Saw, Jimmy H.; Senin, Pavel; Yang, Chi; Chatterji, Sourav; Cheng, Jan-Fang; Eisen, Jonathan A.; Sieracki, Michael E.; Stepanauskas, Ramunas

    2009-01-01

    The difficulty associated with the cultivation of most microorganisms and the complexity of natural microbial assemblages, such as marine plankton or human microbiome, hinder genome reconstruction of representative taxa using cultivation or metagenomic approaches. Here we used an alternative, single cell sequencing approach to obtain high-quality genome assemblies of two uncultured, numerically significant marine microorganisms. We employed fluorescence-activated cell sorting and multiple displacement amplification to obtain hundreds of micrograms of genomic DNA from individual, uncultured cells of two marine flavobacteria from the Gulf of Maine that were phylogenetically distant from existing cultured strains. Shotgun sequencing and genome finishing yielded 1.9 Mbp in 17 contigs and 1.5 Mbp in 21 contigs for the two flavobacteria, with estimated genome recoveries of about 91% and 78%, respectively. Only 0.24% of the assembling sequences were contaminants and were removed from further analysis using rigorous quality control. In contrast to all cultured strains of marine flavobacteria, the two single cell genomes were excellent Global Ocean Sampling (GOS) metagenome fragment recruiters, demonstrating their numerical significance in the ocean. The geographic distribution of GOS recruits along the Northwest Atlantic coast coincided with ocean surface currents. Metabolic reconstruction indicated diverse potential energy sources, including biopolymer degradation, proteorhodopsin photometabolism, and hydrogen oxidation. Compared to cultured relatives, the two uncultured flavobacteria have small genome sizes, few non-coding nucleotides, and few paralogous genes, suggesting adaptations to narrow ecological niches. These features may have contributed to the abundance of the two taxa in specific regions of the ocean, and may have hindered their cultivation. 
We demonstrate the power of single cell DNA sequencing to generate reference genomes of uncultured taxa from a complex microbial community of marine bacterioplankton. A combination of single cell genomics and metagenomics enabled us to analyze the genome content, metabolic adaptations, and biogeography of these taxa. PMID:19390573

  10. PhyloPythiaS+: a self-training method for the rapid reconstruction of low-ranking taxonomic bins from metagenomes.

    PubMed

    Gregor, Ivan; Dröge, Johannes; Schirmer, Melanie; Quince, Christopher; McHardy, Alice C

    2016-01-01

    Background. Metagenomics is an approach for characterizing environmental microbial communities in situ; it allows their functional and taxonomic characterization and the recovery of sequences from uncultured taxa. This is often achieved by a combination of sequence assembly and binning, where sequences are grouped into 'bins' representing taxa of the underlying microbial community. Assignment to low-ranking taxonomic bins is an important challenge for binning methods, as is scalability to Gb-sized datasets generated with deep sequencing techniques. One of the best available methods for species bins recovery from deep-branching phyla is the expert-trained PhyloPythiaS package, where a human expert decides on the taxa to incorporate in the model and identifies 'training' sequences based on marker genes directly from the sample. Due to the manual effort involved, this approach does not scale to multiple metagenome samples and requires substantial expertise, which researchers who are new to the area do not have. Results. We have developed PhyloPythiaS+, a successor to our PhyloPythia(S) software. The new (+) component performs the work previously done by the human expert. PhyloPythiaS+ also includes a new k-mer counting algorithm, which accelerated the simultaneous counting of 4-6-mers used for taxonomic binning 100-fold and reduced the overall execution time of the software by a factor of three. Our software allows the analysis of Gb-sized metagenomes with inexpensive hardware and the recovery of species- or genus-level bins with low error rates in a fully automated fashion. PhyloPythiaS+ was compared to MEGAN, taxator-tk, Kraken and the generic PhyloPythiaS model. The results showed that PhyloPythiaS+ performs especially well for samples originating from novel environments in comparison to the other methods. Availability. PhyloPythiaS+ in a virtual machine is available for installation under Windows, Unix systems or OS X on: https://github.com/algbioi/ppsp/wiki.
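
    The simultaneous counting of 4-6-mers mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the PhyloPythiaS+ algorithm (the abstract does not describe it); the function name count_kmers, its parameters, and the choice to skip windows containing non-ACGT characters are assumptions made purely for illustration.

```python
from collections import Counter

# Assumption: only unambiguous nucleotides are counted.
VALID_BASES = set("ACGT")

def count_kmers(seq, k_min=4, k_max=6):
    """Count k-mers for every k in [k_min, k_max] in a single pass.

    Returns a dict mapping k to a Counter of k-mer occurrences.
    Windows containing characters outside ACGT are skipped.
    """
    seq = seq.upper()
    counts = {k: Counter() for k in range(k_min, k_max + 1)}
    for start in range(len(seq)):
        for k in range(k_min, k_max + 1):
            kmer = seq[start:start + k]
            # Skip truncated windows at the end and ambiguous bases.
            if len(kmer) == k and set(kmer) <= VALID_BASES:
                counts[k][kmer] += 1
    return counts

if __name__ == "__main__":
    result = count_kmers("ACGTACGT")
    print(result[4].most_common(1))
```

    Such count vectors are commonly normalised into frequency profiles before being passed to a classifier; that step is omitted here.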

  11. BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS.

    PubMed

    Fosso, Bruno; Santamaria, Monica; Marzano, Marinella; Alonso-Alemany, Daniel; Valiente, Gabriel; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-07-01

    Substantial advances in microbiology, molecular evolution and biodiversity have been made in recent years thanks to Metagenomics, which makes it possible to unveil the composition and functions of mixed microbial communities in any environmental niche. If the investigation is aimed only at the microbiome taxonomic structure, a target-based metagenomic approach, here also referred to as Meta-barcoding, is generally applied. This approach commonly involves the selective amplification of a species-specific genetic marker (DNA meta-barcode) in the whole taxonomic range of interest and the exploration of its taxon-related variants through High-Throughput Sequencing (HTS) technologies. Access to proper computational systems for the large-scale bioinformatic analysis of HTS data currently represents one of the major challenges in advanced Meta-barcoding projects. BioMaS (Bioinformatic analysis of Metagenomic AmpliconS) is a new bioinformatic pipeline designed to support biomolecular researchers involved in taxonomic studies of environmental microbial communities with a completely automated workflow comprising all the fundamental steps, from raw sequence data upload and cleaning to final taxonomic identification, that are required in an appropriately designed Meta-barcoding HTS-based experiment. In its current version, BioMaS allows the analysis of both bacterial and fungal environments starting directly from the raw sequencing data from either Roche 454 or Illumina HTS platforms, following two alternative paths, respectively. BioMaS is implemented as a public web service available at https://recasgateway.ba.infn.it/ and is also available in Galaxy at http://galaxy.cloud.ba.infn.it:8080 (only for Illumina data). BioMaS is a user-friendly pipeline for Meta-barcoding HTS data analysis specifically designed for users without particular computing skills. 
A comparative benchmark, carried out by using a simulated dataset suitably designed to broadly represent the currently known bacterial and fungal world, showed that BioMaS outperforms QIIME and MOTHUR in terms of extent and accuracy of deep taxonomic sequence assignments.

  12. Evaluation of the Cow Rumen Metagenome: Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Sczyrba, Alex

    2018-02-13

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  13. Evaluation of the Cow Rumen Metagenome: Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sczyrba, Alex

    2011-10-13

    DOE JGI's Alex Sczyrba on "Evaluation of the Cow Rumen Metagenome" and "Assembly by Single Copy Gene Analysis and Single Cell Genome Assemblies" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  14. Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Canon, Shane

    2018-01-24

    DOE JGI's Zhong Wang, chair of the High-performance Computing session, gives a brief introduction before Berkeley Lab's Shane Canon talks about "Exploiting HPC Platforms for Metagenomics: Challenges and Opportunities" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  15. COMPETITIVE METAGENOMIC DNA HYBRIDIZATION IDENTIFIES HOST-SPECIFIC GENETIC MARKERS IN HUMAN FECAL MICROBIAL COMMUNITIES

    EPA Science Inventory

    Although recent technological advances in DNA sequencing and computational biology now allow scientists to compare entire microbial genomes, the use of these approaches to discern key genomic differences between natural microbial communities remains prohibitively expensive for mo...

  16. Single-Cell-Genomics-Facilitated Read Binning of Candidate Phylum EM19 Genomes from Geothermal Spring Metagenomes.

    PubMed

    Becraft, Eric D; Dodsworth, Jeremy A; Murugapiran, Senthil K; Ohlsson, J Ingemar; Briggs, Brandon R; Kanbar, Jad; De Vlaminck, Iwijn; Quake, Stephen R; Dong, Hailiang; Hedlund, Brian P; Swingley, Wesley D

    2016-02-15

    The vast majority of microbial life remains uncatalogued due to the inability to cultivate these organisms in the laboratory. This "microbial dark matter" represents a substantial portion of the tree of life and of the populations that contribute to chemical cycling in many ecosystems. In this work, we leveraged an existing single-cell genomic data set representing the candidate bacterial phylum "Calescamantes" (EM19) to calibrate machine learning algorithms and define metagenomic bins directly from pyrosequencing reads derived from Great Boiling Spring in the U.S. Great Basin. Compared to other assembly-based methods, taxonomic binning with a read-based machine learning approach yielded final assemblies with the highest predicted genome completeness of any method tested. Read-first binning subsequently was used to extract Calescamantes bins from all metagenomes with abundant Calescamantes populations, including metagenomes from Octopus Spring and Bison Pool in Yellowstone National Park and Gongxiaoshe Spring in Yunnan Province, China. Metabolic reconstruction suggests that Calescamantes are heterotrophic, facultative anaerobes, which can utilize oxidized nitrogen sources as terminal electron acceptors for respiration in the absence of oxygen and use proteins as their primary carbon source. Despite their phylogenetic divergence, the geographically separate Calescamantes populations were highly similar in their predicted metabolic capabilities and core gene content, respiring O2, or oxidized nitrogen species for energy conservation in distant but chemically similar hot springs. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  17. Metavir 2: new tools for viral metagenome comparison and assembled virome analysis

    PubMed Central

    2014-01-01

    Background Metagenomics, based on culture-independent sequencing, is a well-fitted approach to provide insights into the composition, structure and dynamics of environmental viral communities. Following recent advances in sequencing technologies, new challenges arise for existing bioinformatic tools dedicated to viral metagenome (i.e. virome) analysis as (i) the number of viromes is rapidly growing and (ii) large genomic fragments can now be obtained by assembling the huge amount of sequence data generated for each metagenome. Results To face these challenges, a new version of Metavir was developed. First, all Metavir tools have been adapted to support comparative analysis of viromes in order to improve the analysis of multiple datasets. In addition to the sequence comparison previously provided, viromes can now be compared through their k-mer frequencies, their taxonomic compositions, recruitment plots and phylogenetic trees containing sequences from different datasets. Second, a new section has been specifically designed to handle assembled viromes made of thousands of large genomic fragments (i.e. contigs). This section includes an annotation pipeline for uploaded viral contigs (gene prediction, similarity search against reference viral genomes and protein domains) and an extensive comparison between contigs and reference genomes. Contigs and their annotations can be explored on the website through specifically developed dynamic genomic maps and interactive networks. Conclusions The new features of Metavir 2 allow users to explore and analyze viromes composed of raw reads or assembled fragments through a set of adapted tools and a user-friendly interface. PMID:24646187

  18. Diversity of thermophiles in a Malaysian hot spring determined using 16S rRNA and shotgun metagenome sequencing.

    PubMed

    Chan, Chia Sing; Chan, Kok-Gan; Tay, Yea-Ling; Chua, Yi-Heng; Goh, Kian Mau

    2015-01-01

    The Sungai Klah (SK) hot spring is the second hottest geothermal spring in Malaysia. This hot spring is a shallow, 150-m-long, fast-flowing stream, with temperatures varying from 50 to 110°C and a pH range of 7.0-9.0. Hidden within a wooded area, the SK hot spring is continually fed by plant litter, resulting in a relatively high degree of total organic content (TOC). In this study, a sample taken from the middle of the stream was analyzed at the 16S rRNA V3-V4 region by amplicon metagenome sequencing. Over 35 phyla were detected by analyzing the 16S rRNA data. Firmicutes and Proteobacteria represented approximately 57% of the microbiome. Approximately 70% of the detected thermophiles were strict anaerobes; however, Hydrogenobacter spp., obligate chemolithotrophic thermophiles, represented one of the major taxa. Several thermophilic photosynthetic microorganisms and acidothermophiles were also detected. Most of the phyla identified by 16S rRNA were also found using the shotgun metagenome approaches. The carbon, sulfur, and nitrogen metabolism within the SK hot spring community were evaluated by shotgun metagenome sequencing, and the data revealed diversity in terms of metabolic activity and dynamics. This hot spring has a rich diversified phylogenetic community partly due to its natural environment (plant litter, high TOC, and a shallow stream) and geochemical parameters (broad temperature and pH range). It is speculated that symbiotic relationships occur between the members of the community.

  19. Amplicon-based metagenomics identified candidate organisms in soils that caused yield decline in strawberry

    PubMed Central

    Xu, Xiangming; Passey, Thomas; Wei, Feng; Saville, Robert; Harrison, Richard J.

    2015-01-01

    A phenomenon of yield decline due to weak plant growth in strawberry was recently observed in non-chemo-fumigated soils, which was not associated with the soil fungal pathogen Verticillium dahliae, the main target of fumigation. Amplicon-based metagenomics was used to profile soil microbiota in order to identify microbial organisms that may have caused the yield decline. A total of 36 soil samples were obtained in 2013 and 2014 from four sites for metagenomic studies; two of the four sites had a yield-decline problem, the other two did not. More than 2000 fungal or bacterial operational taxonomic units (OTUs) were found in these samples. Relative abundance of individual OTUs was statistically compared for differences between samples from sites with or without yield decline. A total of 721 individual comparisons were statistically significant – involving 366 unique bacterial and 44 unique fungal OTUs. Based on further selection criteria, we focused on 34 bacterial and 17 fungal OTUs and found that yield decline probably resulted from one or more of the following four factors: (1) low abundance of Bacillus and Pseudomonas populations, which are well known for their ability to suppress pathogen development and/or promote plant growth; (2) lack of the nematophagous fungus (Paecilomyces species); (3) a high level of two non-specific fungal root rot pathogens; and (4) wet soil conditions. This study demonstrated the usefulness of an amplicon-based metagenomics approach to profile soil microbiota and to detect differential abundance in microbes. PMID:26504572

  20. An Insight into Phage Diversity at Environmental Habitats using Comparative Metagenomics Approach.

    PubMed

    Parmar, Krupa; Dafale, Nishant; Pal, Rajesh; Tikariha, Hitesh; Purohit, Hemant

    2018-02-01

    Bacteriophages play a significant role in driving microbial diversity; however, little is known about the diversity of phages in different ecosystems. A dynamic predator-prey mechanism called "kill the winner" suggests the elimination of the most active bacterial populations by phages. Thus, interaction between phage and host has an effect on the composition of microbial communities in ecosystems. In this study, secondary phage metagenome data from four aquatic habitats (wastewater treatment plant [WWTP], fresh water, marine, and hot water spring) were analyzed using MG-RAST and STAMP tools to explore the diversity of the viruses. Differential relative abundance of phage families, namely Siphoviridae (34%) and Myoviridae (26%) in WWTP, Myoviridae (30%) and Podoviridae (23%) in fresh water, and Myoviridae (41%) and Podoviridae (8%) in marine, was found to be a discriminating factor among the four habitats, while Rudiviridae (9%), Globuloviridae (8%), and Lipothrixviridae (1%) were exclusively observed in the hot water spring. Subsequently, at the genus level, Bpp-1-like virus, Chlorovirus, and T4-like virus were found to be abundant in WWTP, fresh water, and marine habitats, respectively. PCA analysis revealed a completely disparate composition of phage in the hot water spring compared with the other three ecosystems. Similar analysis of relative abundance of functional features corroborated observations from taxa analysis. Functional features corresponding to phage packaging machinery, replication, integration and excision, and gene transfer discriminated among the four habitats. The comparative metagenomics approach exhibited genetically distinct phage communities among the four habitats. Results revealed that selective distribution of phage communities would help in understanding the role of phages in food chains, nutrient cycling, and microbial ecology. 
Study of specific phages would also help in controlling environmental pathogens including MDR bacterial populations using phage therapy approach by selective mining and isolation of phages against specific pathogens persisting in a given environment.

  1. Use of Metagenomic Shotgun Sequencing Technology To Detect Foodborne Pathogens within the Microbiome of the Beef Production Chain.

    PubMed

    Yang, Xiang; Noyes, Noelle R; Doster, Enrique; Martin, Jennifer N; Linke, Lyndsey M; Magnuson, Roberta J; Yang, Hua; Geornaras, Ifigenia; Woerner, Dale R; Jones, Kenneth L; Ruiz, Jaime; Boucher, Christina; Morley, Paul S; Belk, Keith E

    2016-04-01

    Foodborne illnesses associated with pathogenic bacteria are a global public health and economic challenge. The diversity of microorganisms (pathogenic and nonpathogenic) that exists within the food and meat industries complicates efforts to understand pathogen ecology. Further, little is known about the interaction of pathogens within the microbiome throughout the meat production chain. Here, a metagenomic approach and shotgun sequencing technology were used as tools to detect pathogenic bacteria in environmental samples collected from the same groups of cattle at different longitudinal processing steps of the beef production chain: cattle entry to feedlot, exit from feedlot, cattle transport trucks, abattoir holding pens, and the end of the fabrication system. The log read counts classified as pathogens per million reads for Salmonella enterica, Listeria monocytogenes, Escherichia coli, Staphylococcus aureus, Clostridium spp. (C. botulinum and C. perfringens), and Campylobacter spp. (C. jejuni, C. coli, and C. fetus) decreased over subsequential processing steps. Furthermore, the normalized read counts for S. enterica, E. coli, and C. botulinum were greater in the final product than at the feedlots, indicating that the proportion of these bacteria increased (the effect on absolute numbers was unknown) within the remaining microbiome. From an ecological perspective, data indicated that shotgun metagenomics can be used to evaluate not only the microbiome but also shifts in pathogen populations during beef production. Nonetheless, there were several challenges in this analysis approach, one of the main ones being the identification of the specific pathogen from which the sequence reads originated, which makes this approach impractical for use in pathogen identification for regulatory and confirmation purposes. Copyright © 2016 Yang et al.
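
    The "log read counts classified as pathogens per million reads" reported above is a depth-normalised measure. The following Python sketch shows one plausible way to compute such a value. The abstract does not state the log base or how zero counts were handled, so the base-10 logarithm, the pseudocount of 1, and the function name log_reads_per_million are all assumptions.

```python
import math

def log_reads_per_million(pathogen_reads, total_reads, pseudocount=1.0):
    """Return log10 of pathogen-classified reads per million total reads.

    Assumptions (not taken from the source): base-10 logarithm and a
    pseudocount of 1 so that samples with zero pathogen reads map to 0.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Scale to a common sequencing depth of one million reads.
    reads_per_million = pathogen_reads * 1_000_000 / total_reads
    return math.log10(reads_per_million + pseudocount)

if __name__ == "__main__":
    # Hypothetical counts, not data from the study.
    print(log_reads_per_million(pathogen_reads=250, total_reads=20_000_000))
```

    Because the value is scaled by sequencing depth, it tracks the proportion of pathogen reads within the sampled microbiome and says nothing about absolute numbers, which matches the caveat in the abstract.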

  2. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    DOE PAGES

    Howe, Adina; Chain, Patrick S. G.

    2015-07-09

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Cloud Compute and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.

  3. Identifying biologically relevant differences between metagenomic communities.

    PubMed

    Parks, Donovan H; Beiko, Robert G

    2010-03-15

    Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A. phosphatis strains in these related communities, including phosphate metabolism, secretion and metal transport. Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP CONTACT: beiko@cs.dal.ca Supplementary data are available at Bioinformatics online.
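
    The point that a P-value alone is insufficient can be made concrete by reporting an effect size with a confidence interval alongside it. The sketch below computes the difference between two proportions with a simple Wald interval. It is an illustration only and not STAMP's implementation; the function name diff_in_proportions, the Wald approximation, and the example counts are assumptions.

```python
import math

def diff_in_proportions(x1, n1, x2, n2, z=1.96):
    """Effect size (difference in proportions) with a Wald confidence interval.

    x1, x2: number of sequences assigned to a feature in samples 1 and 2.
    n1, n2: total number of sequences in samples 1 and 2.
    z: normal quantile; 1.96 gives an approximate 95% interval.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    diff = p1 - p2
    # Standard error of the difference under the Wald approximation.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, (diff - z * se, diff + z * se)

if __name__ == "__main__":
    # Hypothetical counts for one functional subsystem in two metagenomes.
    effect, (low, high) = diff_in_proportions(200, 1000, 100, 1000)
    print(f"difference = {effect:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

    With very large sequence counts, a tiny difference can be statistically significant while the interval shows it is too small to matter biologically, which is the distinction the abstract draws.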

  4. Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Howe, Adina; Chain, Patrick S. G.

    Metagenomic investigations hold great promise for informing the genetics, physiology, and ecology of environmental microorganisms. Current challenges for metagenomic analysis are related to our ability to connect the dots between sequencing reads, their population of origin, and their encoding functions. Assembly-based methods reduce dataset size by extending overlapping reads into larger contiguous sequences (contigs), providing contextual information for genetic sequences that does not rely on existing references. These methods, however, tend to be computationally intensive and are again challenged by sequencing errors as well as by genomic repeats. While numerous tools have been developed based on these methodological concepts, they present confounding choices and training requirements to metagenomic investigators. To help with accessibility to assembly tools, this review also includes an IPython Notebook metagenomic assembly tutorial. This tutorial has instructions for execution on any operating system using Amazon Elastic Cloud Compute and guides users through downloading, assembly, and mapping reads to contigs of a mock microbiome metagenome. Despite its challenges, metagenomic analysis has already revealed novel insights into many environments on Earth. As software, training, and data continue to emerge, metagenomic data access and its discoveries will continue to grow.

  5. MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.

    PubMed

    Reddy, Rachamalla Maheedhar; Mohammed, Monzoorul Haque; Mande, Sharmila S

    2014-01-01

    A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Metagenomics as a Tool for Enzyme Discovery: Hydrolytic Enzymes from Marine-Related Metagenomes.

    PubMed

    Popovic, Ana; Tchigvintsev, Anatoly; Tran, Hai; Chernikova, Tatyana N; Golyshina, Olga V; Yakimov, Michail M; Golyshin, Peter N; Yakunin, Alexander F

    2015-01-01

    This chapter discusses metagenomics and its application for enzyme discovery, with a focus on hydrolytic enzymes from marine metagenomic libraries. With less than one percent of microorganisms in the environment being culturable, metagenomics, or the collective study of community genetics, has opened up a rich pool of uncharacterized metabolic pathways, enzymes, and adaptations. This great untapped pool of genes provides the particularly exciting potential to mine for new biochemical activities or novel enzymes with activities tailored to peculiar sets of environmental conditions. Metagenomes also represent a huge reservoir of novel enzymes for applications in biocatalysis, biofuels, and bioremediation. Here we present the results of enzyme discovery for four enzyme activities, of particular industrial or environmental interest, including esterase/lipase, glycosyl hydrolase, protease and dehalogenase.

  7. Metagenomic Analysis of the Ferret Fecal Viral Flora

    PubMed Central

    Smits, Saskia L.; Raj, V. Stalin; Oduber, Minoushka D.; Schapendonk, Claudia M. E.; Bodewes, Rogier; Provacia, Lisette; Stittelaar, Koert J.; Osterhaus, Albert D. M. E.; Haagmans, Bart L.

    2013-01-01

    Ferrets are widely used as a small animal model for a number of viral infections, including influenza A virus and SARS coronavirus. To further analyze the microbiological status of ferrets, their fecal viral flora was studied using a metagenomics approach. Novel viruses from the families Picorna-, Papilloma-, and Anelloviridae as well as known viruses from the families Astro-, Corona-, Parvo-, and Hepeviridae were identified in different ferret cohorts. Ferret kobu- and hepatitis E virus were mainly present in human household ferrets, whereas coronaviruses were found both in household as well as farm ferrets. Our studies illuminate the viral diversity found in ferrets and provide tools to prescreen for newly identified viruses that potentially could influence disease outcome of experimental virus infections in ferrets. PMID:23977082

  8. An efficient algorithm for accurate computation of the Dirichlet-multinomial log-likelihood function.

    PubMed

    Yu, Peng; Shaw, Chad A

    2014-06-01

    The Dirichlet-multinomial (DMN) distribution is a fundamental model for multicategory count data with overdispersion. This distribution has many uses in bioinformatics including applications to metagenomics data, transcriptomics and alternative splicing. The DMN distribution reduces to the multinomial distribution when the overdispersion parameter ψ is 0. Unfortunately, numerical computation of the DMN log-likelihood function by conventional methods results in instability in the neighborhood of ψ = 0. An alternative formulation circumvents this instability, but it leads to long runtimes that make it impractical for large count data common in bioinformatics. We have developed a new method for computation of the DMN log-likelihood to solve the instability problem without incurring long runtimes. The new approach is composed of a novel formula and an algorithm to extend its applicability. Our numerical experiments show that this new method improves both the accuracy of log-likelihood evaluation and the runtime by several orders of magnitude, especially in high-count data situations that are common in deep sequencing data. Using real metagenomic data, our method achieves manyfold runtime improvement. Our method increases the feasibility of using the DMN distribution to model many high-throughput problems in bioinformatics. We have included in our work an R package giving access to this method and a vignette applying this approach to metagenomic data. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
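
    For orientation, one conventional way to evaluate a DMN log-likelihood uses log-gamma functions, as in the minimal sketch below. It is not the authors' new algorithm. The parameterization by concentration parameters `alpha` (rather than the overdispersion parameter ψ used in the abstract) is an assumption made for illustration. Differences of large log-gamma values like these can lose precision as the concentration parameters grow, that is, as the distribution approaches its multinomial limit.

```python
import math

def dmn_log_likelihood(counts, alpha):
    """Log-probability of one count vector under a Dirichlet-multinomial.

    counts: non-negative integer counts per category
    alpha:  positive concentration parameters (assumed parameterization)
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a = sum(alpha)
    # log of the multinomial coefficient n! / prod(x_k!)
    ll = math.lgamma(n + 1) - sum(math.lgamma(x + 1) for x in counts)
    # log Gamma(A) - log Gamma(A + n)
    ll += math.lgamma(a) - math.lgamma(a + n)
    # sum over categories of log Gamma(alpha_k + x_k) - log Gamma(alpha_k)
    ll += sum(math.lgamma(al + x) - math.lgamma(al)
              for x, al in zip(counts, alpha))
    return ll
```

    With two categories and `alpha = (1, 1)` the model reduces to a uniform beta-binomial, which gives a quick sanity check: every outcome with n = 5 has probability 1/6.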

  9. Spherical: an iterative workflow for assembling metagenomic datasets.

    PubMed

    Hitch, Thomas C A; Creevey, Christopher J

    2018-01-24

    The consensus emerging from the study of microbiomes is that they are far more complex than previously thought, requiring better assemblies and increasingly deeper sequencing. However, current metagenomic assembly techniques regularly fail to incorporate all, or even the majority in some cases, of the sequence information generated for many microbiomes, negating this effort. This can especially bias the information gathered and the perceived importance of the minor taxa in a microbiome. We propose a simple but effective approach, implemented in Python, to address this problem. Based on an iterative methodology, our workflow (called Spherical) carries out successive rounds of assemblies with the sequencing reads not yet utilised. This approach also allows the user to reduce the resources required for very large datasets, by assembling random subsets of the whole in a "divide and conquer" manner. We demonstrate the accuracy of Spherical using simulated data based on completely sequenced genomes and the effectiveness of the workflow at retrieving lost information for taxa in three published metagenomics studies of varying sizes. Our results show that Spherical increased the amount of reads utilized in the assembly by up to 109% compared to the base assembly. The additional contigs assembled by the Spherical workflow resulted in significant (P < 0.05) changes in the predicted taxonomic profile of all datasets analysed. Spherical is implemented in Python 2.7 and freely available for use under the MIT license. Source code and documentation are hosted publicly at: https://github.com/thh32/Spherical .
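
    The iterative idea (assemble, set aside the reads that were used, and assemble the remainder again) can be sketched as a short loop. This is a simplified illustration and not the Spherical code: `assemble` and `is_used` are hypothetical callables standing in for an assembler and a read-mapping step, and the stopping rules are assumptions.

```python
def iterative_assembly(reads, assemble, is_used, max_rounds=5):
    """Run successive assembly rounds on the reads not yet utilised.

    assemble(reads)        -> list of contigs  (hypothetical callable)
    is_used(read, contigs) -> bool             (hypothetical callable,
                              e.g. "this read maps to one of the contigs")
    Returns (all contigs from every round, reads still unused).
    """
    remaining = list(reads)
    all_contigs = []
    for _ in range(max_rounds):
        if not remaining:
            break
        contigs = assemble(remaining)
        if not contigs:
            break  # the assembler produced nothing new
        all_contigs.extend(contigs)
        still_unused = [r for r in remaining if not is_used(r, contigs)]
        if len(still_unused) == len(remaining):
            break  # no read was incorporated, so another round cannot help
        remaining = still_unused
    return all_contigs, remaining
```

    Capping the number of rounds and stopping when a round incorporates no reads keeps the loop from running indefinitely on reads that never assemble.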

  10. RubisCO Gene Clusters Found in a Metagenome Microarray from Acid Mine Drainage

    PubMed Central

    Guo, Xue; Yin, Huaqun; Cong, Jing; Dai, Zhimin; Liang, Yili

    2013-01-01

    The enzyme responsible for carbon dioxide fixation in the Calvin cycle, ribulose-1,5-bisphosphate carboxylase/oxygenase (RubisCO), is widely used as a phylogenetic marker to analyze the distribution and activity of autotrophic bacteria. However, such an approach provides no indication as to the significance of genomic content and organization. Horizontal transfers of RubisCO genes occurring in eubacteria and plastids may seriously affect the credibility of this approach. Here, we present a new method to analyze the diversity and genomic content of RubisCO genes in acid mine drainage (AMD). A metagenome microarray containing 7,776 large-insertion fosmids was constructed to quickly screen genome fragments containing RubisCO form I large-subunit genes (cbbL). Forty-six cbbL-containing fosmids were detected, and six fosmids were fully sequenced. To evaluate the reliability of the metagenome microarray and understand the microbial community in AMD, the diversities of cbbL and the 16S rRNA gene were analyzed. Fosmid sequences revealed that the form I RubisCO gene cluster could be subdivided into form IA and IB RubisCO gene clusters in AMD, because of significant divergences in molecular phylogenetics and conserved genomic organization. Interestingly, the form I RubisCO gene cluster coexisted with the form II RubisCO gene cluster in one fosmid genomic fragment. Phylogenetic analyses revealed that horizontal transfers of RubisCO genes may occur widely in AMD, which makes the evolutionary history of RubisCO difficult to reconcile with organismal phylogeny. PMID:23335778

  11. Estimating Unbiased Land Cover Change Areas In The Colombian Amazon Using Landsat Time Series And Statistical Inference Methods

    NASA Astrophysics Data System (ADS)

    Arevalo, P. A.; Olofsson, P.; Woodcock, C. E.

    2017-12-01

    Unbiased estimation of the areas of conversion between land categories ("activity data") and their uncertainty is crucial for providing more robust calculations of carbon emissions to the atmosphere, as well as their removals. This is particularly important for the REDD+ mechanism of UNFCCC where an economic compensation is tied to the magnitude and direction of such fluxes. Dense time series of Landsat data and statistical protocols are becoming an integral part of forest monitoring efforts, but there are relatively few studies in the tropics focused on using these methods to advance operational MRV systems (Monitoring, Reporting and Verification). We present the results of a prototype methodology for continuous monitoring and unbiased estimation of activity data that is compliant with the IPCC Approach 3 for representation of land. We used a break detection algorithm (Continuous Change Detection and Classification, CCDC) to fit pixel-level temporal segments to time series of Landsat data in the Colombian Amazon. The segments were classified using a Random Forest classifier to obtain annual maps of land categories between 2001 and 2016. Using these maps, a biannual stratified sampling approach was implemented and unbiased stratified estimators constructed to calculate area estimates with confidence intervals for each of the stable and change classes. Our results provide evidence of a decrease in primary forest as a result of conversion to pastures, as well as increase in secondary forest as pastures are abandoned and the forest allowed to regenerate. Estimating areas of other land transitions proved challenging because of their very small mapped areas compared to stable classes like forest, which corresponds to almost 90% of the study area. Implications on remote sensing data processing, sample allocation and uncertainty reduction are also discussed.
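
    As a rough illustration of a stratified area estimate with a confidence interval, the sketch below applies the textbook stratified random sampling estimator of a proportion. The abstract does not give the authors' formulas, so the choice of estimator, the use of mapped area proportions as stratum weights, and the normal-approximation interval are all assumptions.

```python
import math

def stratified_area_estimate(strata, total_area, z=1.96):
    """Estimate the area of one class from a stratified reference sample.

    strata: list of (weight, n_samples, n_in_class) tuples, where weight is
            the stratum's share of the mapped area (weights sum to 1) and
            n_samples >= 2 in every stratum
    Returns (estimated area, half-width of the confidence interval).
    """
    p_hat = 0.0
    variance = 0.0
    for weight, n, hits in strata:
        p_h = hits / n                      # class proportion in this stratum
        p_hat += weight * p_h
        variance += weight ** 2 * p_h * (1.0 - p_h) / (n - 1)
    half_width = z * math.sqrt(variance) * total_area
    return p_hat * total_area, half_width
```

    A small stratum with many reference samples contributes little variance, which is one reason sample allocation matters when change classes cover a tiny share of the map.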

  12. Rapid Detection of Trichodysplasia Spinulosa-Associated Polyomavirus in Skin Biopsy Specimen

    PubMed Central

    Urbano, Paulo Roberto P.; Pannuti, Cláudio Sérgio; Pierrotti, Ligia C.; David-Neto, Elias

    2014-01-01

    Trichodysplasia spinulosa-associated polyomavirus (TSV) is responsible for a rare skin cancer. Using metagenomic approaches, we determined the complete genome sequence of a TSV first detected in Brazil in spicules of an immunocompromised patient suspected to have trichodysplasia spinulosa. PMID:25059864

  13. DEVELOPMENT OF MICROBIAL METAGENOMIC MARKERS FOR ENVIRONMENTAL MONITORING AND RISK ASSESSMENT

    EPA Science Inventory

    The microbiological water quality standards established by EPA depend on culturing fecal indicator bacteria to predict the risks associated with water usage. For decades this has been the favored approach to microbiological monitoring in spite of the fact that culture-based meth...

  14. SARS-like WIV1-CoV poised for human emergence

    PubMed Central

    Menachery, Vineet D.; Yount, Boyd L.; Sims, Amy C.; Debbink, Kari; Agnihothram, Sudhakar S.; Gralinski, Lisa E.; Graham, Rachel L.; Scobey, Trevor; Plante, Jessica A.; Royal, Scott R.; Swanstrom, Jesica; Sheahan, Timothy P.; Pickles, Raymond J.; Corti, Davide; Randell, Scott H.; Lanzavecchia, Antonio; Marasco, Wayne A.; Baric, Ralph S.

    2016-01-01

    Outbreaks from zoonotic sources represent a threat to both human disease as well as the global economy. Despite a wealth of metagenomics studies, methods to leverage these datasets to identify future threats are underdeveloped. In this study, we describe an approach that combines existing metagenomics data with reverse genetics to engineer reagents to evaluate emergence and pathogenic potential of circulating zoonotic viruses. Focusing on the severe acute respiratory syndrome (SARS)-like viruses, the results indicate that the WIV1-coronavirus (CoV) cluster has the ability to directly infect and may undergo limited transmission in human populations. However, in vivo attenuation suggests additional adaptation is required for epidemic disease. Importantly, available SARS monoclonal antibodies offered success in limiting viral infection absent from available vaccine approaches. Together, the data highlight the utility of a platform to identify and prioritize prepandemic strains harbored in animal reservoirs and document the threat posed by WIV1-CoV for emergence in human populations. PMID:26976607

  15. MG-Digger: An Automated Pipeline to Search for Giant Virus-Related Sequences in Metagenomes

    PubMed Central

    Verneau, Jonathan; Levasseur, Anthony; Raoult, Didier; La Scola, Bernard; Colson, Philippe

    2016-01-01

    The number of metagenomic studies conducted each year is growing dramatically. Storage and analysis of such big data is difficult and time-consuming. Interestingly, analysis shows that environmental and human metagenomes include a significant amount of non-annotated sequences, representing a ‘dark matter.’ We established a bioinformatics pipeline that automatically detects metagenome reads matching query sequences from a given set and applied this tool to the detection of sequences matching large and giant DNA viral members of the proposed order Megavirales or virophages. A total of 1,045 environmental and human metagenomes (≈ 1 Terabase) were collected, processed, and stored on our bioinformatics server. In addition, nucleotide and protein sequences from 93 Megavirales representatives, including 19 giant viruses of amoeba, and 5 virophages, were collected. The pipeline was built from scripts written in Python and named MG-Digger. Metagenomes previously found to contain megavirus-like sequences were tested as controls. MG-Digger was able to annotate hundreds of metagenome sequences as best matching those of giant viruses. These sequences were most often found to be similar to phycodnavirus or mimivirus sequences, but included reads related to recently available pandoraviruses, Pithovirus sibericum, and faustoviruses. Compared to other tools, MG-Digger combined stand-alone use on Linux or Windows operating systems through a user-friendly interface, implementation of ready-to-use customized metagenome databases and query sequence databases, adjustable parameters for BLAST searches, and creation of output files containing selected reads with best match identification. Compared to Metavir 2, a reference tool in viral metagenome analysis, MG-Digger detected 8% more true positive Megavirales-related reads in a control metagenome. The present work shows that massive, automated and recurrent analyses of metagenomes are effective in improving knowledge about the presence and prevalence of giant viruses in the environment and the human body. PMID:27065984

  16. Circulating tumor cell detection: A direct comparison between negative and unbiased enrichment in lung cancer.

    PubMed

    Xu, Yan; Liu, Biao; Ding, Fengan; Zhou, Xiaodie; Tu, Pin; Yu, Bo; He, Yan; Huang, Peilin

    2017-06-01

    Circulating tumor cells (CTCs), isolated as a 'liquid biopsy', may provide important diagnostic and prognostic information. Therefore, rapid, reliable and unbiased detection of CTCs is required for routine clinical analyses. It was demonstrated that negative enrichment, an epithelial marker-independent technique for isolating CTCs, exhibits a better efficiency in the detection of CTCs compared with positive enrichment techniques that only use specific anti-epithelial cell adhesion molecules. However, negative enrichment techniques incur significant cell loss during the isolation procedure, and as it is a method that uses only one type of antibody, it is inherently biased. The detection procedure and identification of cell types also rely on skilled and experienced technicians. In the present study, the detection sensitivity of negative enrichment and a previously described unbiased detection method was compared. The results revealed that unbiased detection methods may efficiently detect >90% of cancer cells in blood samples containing CTCs. By contrast, only 40-60% of CTCs were detected by negative enrichment. Additionally, CTCs were identified in >65% of patients with stage I/II lung cancer. This simple yet efficient approach may achieve a high level of sensitivity. It demonstrates a potential for the large-scale clinical implementation of CTC-based diagnostic and prognostic strategies.

  17. Building unbiased estimators from non-gaussian likelihoods with application to shear estimation

    DOE PAGES

    Madhavacheril, Mathew S.; McDonald, Patrick; Sehgal, Neelima; ...

    2015-01-15

    We develop a general framework for generating estimators of a given quantity which are unbiased to a given order in the difference between the true value of the underlying quantity and the fiducial position in theory space around which we expand the likelihood. We apply this formalism to rederive the optimal quadratic estimator and show how the replacement of the second derivative matrix with the Fisher matrix is a generic way of creating an unbiased estimator (assuming choice of the fiducial model is independent of data). Next we apply the approach to estimation of shear lensing, closely following the work of Bernstein and Armstrong (2014). Our first order estimator reduces to their estimator in the limit of zero shear, but it also naturally allows for the case of non-constant shear and the easy calculation of correlation functions or power spectra using standard methods. Both our first-order estimator and Bernstein and Armstrong’s estimator exhibit a bias which is quadratic in true shear. Our third-order estimator is, at least in the realm of the toy problem of Bernstein and Armstrong, unbiased to 0.1% in relative shear errors Δg/g for shears up to |g| = 0.2.

  18. Building unbiased estimators from non-Gaussian likelihoods with application to shear estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Madhavacheril, Mathew S.; Sehgal, Neelima; McDonald, Patrick

    2015-01-01

    We develop a general framework for generating estimators of a given quantity which are unbiased to a given order in the difference between the true value of the underlying quantity and the fiducial position in theory space around which we expand the likelihood. We apply this formalism to rederive the optimal quadratic estimator and show how the replacement of the second derivative matrix with the Fisher matrix is a generic way of creating an unbiased estimator (assuming choice of the fiducial model is independent of data). Next we apply the approach to estimation of shear lensing, closely following the work of Bernstein and Armstrong (2014). Our first order estimator reduces to their estimator in the limit of zero shear, but it also naturally allows for the case of non-constant shear and the easy calculation of correlation functions or power spectra using standard methods. Both our first-order estimator and Bernstein and Armstrong's estimator exhibit a bias which is quadratic in true shear. Our third-order estimator is, at least in the realm of the toy problem of Bernstein and Armstrong, unbiased to 0.1% in relative shear errors Δg/g for shears up to |g|=0.2.

  19. Unbiased reduced density matrices and electronic properties from full configuration interaction quantum Monte Carlo.

    PubMed

    Overy, Catherine; Booth, George H; Blunt, N S; Shepherd, James J; Cleland, Deidre; Alavi, Ali

    2014-12-28

    Properties that are necessarily formulated within pure (symmetric) expectation values are difficult to calculate for projector quantum Monte Carlo approaches, but are critical in order to compute many of the important observable properties of electronic systems. Here, we investigate an approach for the sampling of unbiased reduced density matrices within the full configuration interaction quantum Monte Carlo dynamic, which requires only small computational overheads. This is achieved via an independent replica population of walkers in the dynamic, sampled alongside the original population. The resulting reduced density matrices are free from systematic error (beyond those present via constraints on the dynamic itself) and can be used to compute a variety of expectation values and properties, with rapid convergence to an exact limit. A quasi-variational energy estimate derived from these density matrices is proposed as an accurate alternative to the projected estimator for multiconfigurational wavefunctions, while its variational property could potentially lend itself to accurate extrapolation approaches in larger systems.

  20. Unbiased reduced density matrices and electronic properties from full configuration interaction quantum Monte Carlo

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Overy, Catherine; Blunt, N. S.; Shepherd, James J.

    2014-12-28

    Properties that are necessarily formulated within pure (symmetric) expectation values are difficult to calculate for projector quantum Monte Carlo approaches, but are critical in order to compute many of the important observable properties of electronic systems. Here, we investigate an approach for the sampling of unbiased reduced density matrices within the full configuration interaction quantum Monte Carlo dynamic, which requires only small computational overheads. This is achieved via an independent replica population of walkers in the dynamic, sampled alongside the original population. The resulting reduced density matrices are free from systematic error (beyond those present via constraints on the dynamic itself) and can be used to compute a variety of expectation values and properties, with rapid convergence to an exact limit. A quasi-variational energy estimate derived from these density matrices is proposed as an accurate alternative to the projected estimator for multiconfigurational wavefunctions, while its variational property could potentially lend itself to accurate extrapolation approaches in larger systems.

  1. New Approach for Investigating Reaction Dynamics and Rates with Ab Initio Calculations.

    PubMed

    Fleming, Kelly L; Tiwary, Pratyush; Pfaendtner, Jim

    2016-01-21

    Herein, we demonstrate a convenient approach to systematically investigate chemical reaction dynamics using the metadynamics (MetaD) family of enhanced sampling methods. Using a symmetric SN2 reaction as a model system, we applied infrequent metadynamics, a theoretical framework based on acceleration factors, to quantitatively estimate the rate of reaction from biased and unbiased simulations. A systematic study of the algorithm and its application to chemical reactions was performed by sampling over 5000 independent reaction events. Additionally, we quantitatively reweighted exhaustive free-energy calculations to obtain the reaction potential-energy surface and showed that infrequent metadynamics works to effectively determine Arrhenius-like activation energies. Exact agreement with unbiased high-temperature kinetics is also shown. The feasibility of using the approach on actual ab initio molecular dynamics calculations is then presented by using Car-Parrinello MD+MetaD to sample the same reaction using only 10-20 calculations of the rare event. Owing to the ease of use and comparatively low cost of computation, the approach has extensive potential applications for catalysis, combustion, pyrolysis, and enzymology.
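
    The acceleration-factor idea can be illustrated with a short calculation: each biased time step is stretched by exp(V/kT), where V is the bias potential acting at that step, and the sum of the stretched steps estimates the unbiased time to the reaction event. The sketch below is an illustration under stated assumptions and not the authors' code: the function names are hypothetical, the bias is assumed to be given in kJ/mol, and the bias series is assumed to end at the first transition.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K), assumed unit system

def rescaled_time(bias_kj_mol, dt, temperature):
    """Unbiased-time estimate: sum of dt * exp(V_i / kT) over biased steps."""
    beta = 1.0 / (K_B * temperature)
    return sum(dt * math.exp(beta * v) for v in bias_kj_mol)

def acceleration_factor(bias_kj_mol, dt, temperature):
    """Ratio of the rescaled time to the simulated (biased) time."""
    simulated = dt * len(bias_kj_mol)
    return rescaled_time(bias_kj_mol, dt, temperature) / simulated
```

    With zero bias the factor is 1, and a constant bias of kT·ln 2 gives a factor of 2, which makes the exponential dependence on the deposited bias easy to see.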

  2. Microbial Metagenomics: Beyond the Genome

    NASA Astrophysics Data System (ADS)

    Gilbert, Jack A.; Dupont, Christopher L.

    2011-01-01

    Metagenomics literally means “beyond the genome.” Marine microbial metagenomic databases presently comprise ~400 billion base pairs of DNA, only ~3% of that found in 1 ml of seawater. Very soon a trillion-base-pair sequence run will be feasible, so it is time to reflect on what we have learned from metagenomics. We review the impact of metagenomics on our understanding of marine microbial communities. We consider the studies facilitated by data generated through the Global Ocean Sampling expedition, as well as the revolution wrought at the individual laboratory level through next generation sequencing technologies. We review recent studies and discoveries since 2008, provide a discussion of bioinformatic analyses, including conceptual pipelines and sequence annotation and predict the future of metagenomics, with suggestions of collaborative community studies tailored toward answering some of the fundamental questions in marine microbial ecology.

  3. Tracking microbial colonization in fecal microbiota transplantation experiments via genome-resolved metagenomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, Sonny T. M.; Kahn, Stacy A.; Delmont, Tom O.

    Fecal microbiota transplantation (FMT) is an effective treatment for recurrent Clostridium difficile infection and shows promise for treating other medical conditions associated with intestinal dysbioses. However, we lack a sufficient understanding of which microbial populations successfully colonize the recipient gut, and the widely used approaches to study the microbial ecology of FMT experiments fail to provide enough resolution to identify populations that are likely responsible for FMT-derived benefits. Here, we used shotgun metagenomics together with assembly and binning strategies to reconstruct metagenome-assembled genomes (MAGs) from fecal samples of a single FMT donor. We then used metagenomic mapping to track the occurrence and distribution patterns of donor MAGs in two FMT recipients. Our analyses revealed that 22% of the 92 highly complete bacterial MAGs that we identified from the donor successfully colonized and remained abundant in two recipients for at least 8 weeks. Most MAGs with a high colonization rate belonged to the order Bacteroidales. The vast majority of those that lacked evidence of colonization belonged to the order Clostridiales, and colonization success was negatively correlated with the number of genes related to sporulation. Our analysis of 151 publicly available gut metagenomes showed that the donor MAGs that colonized both recipients were prevalent, and the ones that colonized neither were rare across the participants of the Human Microbiome Project. Although our dataset showed a link between taxonomy and the colonization ability of a given MAG, we also identified MAGs that belong to the same taxon with different colonization properties, highlighting the importance of an appropriate level of resolution to explore the functional basis of colonization and to identify targets for cultivation, hypothesis generation, and testing in model systems. Lastly, the analytical strategy adopted in our study can provide genomic insights into bacterial populations that may be critical to the efficacy of FMT due to their success in gut colonization and metabolic properties, and guide cultivation efforts to investigate mechanistic underpinnings of this procedure beyond associations.

  4. Comparative (Meta)genomic Analysis and Ecological Profiling of Human Gut-Specific Bacteriophage φB124-14

    PubMed Central

    Ogilvie, Lesley A.; Caplin, Jonathan; Dedi, Cinzia; Diston, David; Cheek, Elizabeth; Bowler, Lucas; Taylor, Huw; Ebdon, James; Jones, Brian V.

    2012-01-01

    Bacteriophage associated with the human gut microbiome are likely to have an important impact on community structure and function, and provide a wealth of biotechnological opportunities. Despite this, knowledge of the ecology and composition of bacteriophage in the gut bacterial community remains poor, with few well characterized gut-associated phage genomes currently available. Here we describe the identification and in-depth (meta)genomic, proteomic, and ecological analysis of a human gut-specific bacteriophage (designated φB124-14). In doing so we illuminate a fraction of the biological dark matter extant in this ecosystem and its surrounding eco-genomic landscape, identifying a novel and uncharted bacteriophage gene-space in this community. φB124-14 infects only a subset of closely related gut-associated Bacteroides fragilis strains, and the circular genome encodes functions previously found to be rare in viral genomes and human gut viral metagenome sequences, including those which potentially confer advantages upon phage and/or host bacteria. Comparative genomic analyses revealed φB124-14 is most closely related to φB40-8, the only other publicly available Bacteroides sp. phage genome, whilst comparative metagenomic analysis of both phage failed to identify any homologous sequences in 136 non-human gut metagenomic datasets searched, supporting the human gut-specific nature of this phage. Moreover, a potential geographic variation in the carriage of these and related phage was revealed by analysis of their distribution and prevalence within 151 human gut microbiomes and viromes from Europe, America and Japan. Finally, ecological profiling of φB124-14 and φB40-8, using both gene-centric alignment-driven phylogenetic analyses, as well as alignment-free gene-independent approaches was undertaken. This not only verified the human gut-specific nature of both phage, but also indicated that these phage populate a distinct and unexplored ecological landscape within the human gut microbiome. PMID:22558115

  5. Computational prediction of CRISPR cassettes in gut metagenome samples from Chinese type-2 diabetic patients and healthy controls.

    PubMed

    Mangericao, Tatiana C; Peng, Zhanhao; Zhang, Xuegong

    2016-01-11

    CRISPR has become a hot topic as a powerful technique for genome editing in humans and other higher organisms. The original CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats coupled with CRISPR-associated proteins) is an important adaptive defence system for prokaryotes that provides resistance against invading elements such as viruses and plasmids. A CRISPR cassette contains short nucleotide sequences called spacers. These unique regions retain a history of the interactions between prokaryotes and their invaders in individual strains and ecosystems. One important ecosystem in the human body is the human gut, a rich habitat populated by a great diversity of microorganisms. Gut microbiomes are important for human physiology and health. Metagenome sequencing has been widely applied for studying the gut microbiomes. Most efforts in metagenome studies have focused on profiling taxa compositions and gene catalogues and identifying their associations with human health. Less attention has been paid to the analysis of the ecosystems of microbiomes themselves, especially their CRISPR composition. We conducted a preliminary analysis of CRISPR sequences in a human gut metagenomic data set from Chinese type-2 diabetes patients and healthy controls. Applying an available CRISPR-identification algorithm, PILER-CR, we identified 3169 CRISPR cassettes in the data, from which we constructed a set of 1302 unique repeat sequences and 36,709 spacers. A more extensive analysis was made for the CRISPR repeats: these repeats were submitted to a more comprehensive clustering and classification using the web server tool CRISPRmap. All repeats were compared with known CRISPRs in the database CRISPRdb. A total of 784 repeats had matches in the database, and the remaining 518 repeats from our set are potentially novel ones. The computational analysis of CRISPR composition based on contigs of metagenome sequencing data is feasible. It provides an efficient approach for finding potential novel CRISPR arrays and for analysing the ecosystem and history of human microbiomes.
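
    To make the repeat and spacer structure of a cassette concrete, the toy sketch below pulls out the sequences lying between consecutive copies of a known repeat. It is not PILER-CR and makes strong simplifying assumptions: the repeat is already known, every copy matches exactly, and the function name and example sequences are hypothetical.

```python
def extract_spacers(cassette, repeat):
    """Return the sequences between consecutive exact copies of `repeat`."""
    # Locate every non-overlapping occurrence of the repeat.
    positions = []
    start = cassette.find(repeat)
    while start != -1:
        positions.append(start)
        start = cassette.find(repeat, start + len(repeat))
    # A spacer is whatever lies between the end of one repeat and the next.
    spacers = []
    for left, right in zip(positions, positions[1:]):
        spacers.append(cassette[left + len(repeat):right])
    return spacers
```

    Real repeat copies often differ by a few bases, so a practical tool needs approximate matching rather than the exact string search used here.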

  6. Comparative metagenomics of microbial communities inhabiting deep-sea hydrothermal vent chimneys with contrasting chemistries

    PubMed Central

    Xie, Wei; Wang, Fengping; Guo, Lei; Chen, Zeling; Sievert, Stefan M; Meng, Jun; Huang, Guangrui; Li, Yuxin; Yan, Qingyu; Wu, Shan; Wang, Xin; Chen, Shangwu; He, Guangyuan; Xiao, Xiang; Xu, Anlong

    2011-01-01

    Deep-sea hydrothermal vent chimneys harbor a high diversity of largely unknown microorganisms. Although the phylogenetic diversity of these microorganisms has been described previously, the adaptation and metabolic potential of the microbial communities is only beginning to be revealed. A pyrosequencing approach was used to directly obtain sequences from a fosmid library constructed from a black smoker chimney 4143-1 in the Mothra hydrothermal vent field at the Juan de Fuca Ridge. A total of 308 034 reads with an average sequence length of 227 bp were generated. Comparative genomic analyses of metagenomes from a variety of environments by two-way clustering of samples and functional gene categories demonstrated that the 4143-1 metagenome clustered most closely with that from a carbonate chimney from Lost City. Both are highly enriched in genes for mismatch repair and homologous recombination, suggesting that the microbial communities have evolved extensive DNA repair systems to cope with the extreme conditions that have potential deleterious effects on the genomes. As previously reported for the Lost City microbiome, the metagenome of chimney 4143-1 exhibited a high proportion of transposases, implying that horizontal gene transfer may be a common occurrence in the deep-sea vent chimney biosphere. In addition, genes for chemotaxis and flagellar assembly were highly enriched in the chimney metagenomes, reflecting the adaptation of the organisms to the highly dynamic conditions present within the chimney walls. Reconstruction of the metabolic pathways revealed that the microbial community in the wall of chimney 4143-1 was mainly fueled by sulfur oxidation, putatively coupled to nitrate reduction to perform inorganic carbon fixation through the Calvin–Benson–Bassham cycle. 
On the basis of the genomic organization of the key genes of the carbon fixation and sulfur oxidation pathways contained in the large genomic fragments, both obligate and facultative autotrophs appear to be present and contribute to biomass production. PMID:20927138

  7. Tracking microbial colonization in fecal microbiota transplantation experiments via genome-resolved metagenomics

    DOE PAGES

    Lee, Sonny T. M.; Kahn, Stacy A.; Delmont, Tom O.; ...

    2017-05-04

    Fecal microbiota transplantation (FMT) is an effective treatment for recurrent Clostridium difficile infection and shows promise for treating other medical conditions associated with intestinal dysbioses. However, we lack a sufficient understanding of which microbial populations successfully colonize the recipient gut, and the widely used approaches to study the microbial ecology of FMT experiments fail to provide enough resolution to identify populations that are likely responsible for FMT-derived benefits. Here, we used shotgun metagenomics together with assembly and binning strategies to reconstruct metagenome-assembled genomes (MAGs) from fecal samples of a single FMT donor. We then used metagenomic mapping to track the occurrence and distribution patterns of donor MAGs in two FMT recipients. Our analyses revealed that 22% of the 92 highly complete bacterial MAGs that we identified from the donor successfully colonized and remained abundant in two recipients for at least 8 weeks. Most MAGs with a high colonization rate belonged to the order Bacteroidales. The vast majority of those that lacked evidence of colonization belonged to the order Clostridiales, and colonization success was negatively correlated with the number of genes related to sporulation. Our analysis of 151 publicly available gut metagenomes showed that the donor MAGs that colonized both recipients were prevalent, and the ones that colonized neither were rare across the participants of the Human Microbiome Project. Although our dataset showed a link between taxonomy and the colonization ability of a given MAG, we also identified MAGs that belong to the same taxon with different colonization properties, highlighting the importance of an appropriate level of resolution to explore the functional basis of colonization and to identify targets for cultivation, hypothesis generation, and testing in model systems. Lastly, the analytical strategy adopted in our study can provide genomic insights into bacterial populations that may be critical to the efficacy of FMT due to their success in gut colonization and metabolic properties, and guide cultivation efforts to investigate mechanistic underpinnings of this procedure beyond associations.

  8. MALINA: a web service for visual analytics of human gut microbiota whole-genome metagenomic reads.

    PubMed

    Tyakht, Alexander V; Popenko, Anna S; Belenikin, Maxim S; Altukhov, Ilya A; Pavlenko, Alexander V; Kostryukova, Elena S; Selezneva, Oksana V; Larin, Andrei K; Karpova, Irina Y; Alexeev, Dmitry G

    2012-12-07

    MALINA is a web service for bioinformatic analysis of whole-genome metagenomic data obtained from human gut microbiota sequencing. As input data, it accepts metagenomic reads of various sequencing technologies, including long reads (such as Sanger and 454 sequencing) and next-generation reads (including SOLiD and Illumina). To the authors' knowledge, it is the first metagenomic web service capable of processing SOLiD color-space reads. The web service allows phylogenetic and functional profiling of metagenomic samples using the coverage depth resulting from the alignment of the reads to the catalogue of reference sequences, which are built into the pipeline and contain prevalent microbial genomes and genes of human gut microbiota. The obtained metagenomic composition vectors are processed by the statistical analysis and visualization module containing methods for clustering, dimension reduction and group comparison. Additionally, the MALINA database includes vectors of bacterial and functional composition for human gut microbiota samples from a large number of existing studies, allowing their comparative analysis together with user samples, namely datasets from the Russian Metagenome project, MetaHIT and the Human Microbiome Project (downloaded from http://hmpdacc.org). MALINA is made freely available on the web at http://malina.metagenome.ru. The website is implemented in JavaScript (using Ext JS), Microsoft .NET Framework, MS SQL and Python, with all major browsers supported.
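
    A composition vector built from coverage depth can be sketched as below. This is a minimal illustration under stated assumptions, not MALINA's actual pipeline: coverage depth is taken here to be total aligned bases divided by reference length, the vector is normalized to sum to 1, and all names and numbers are hypothetical.

```python
def coverage_depth(aligned_read_lengths, reference_length):
    """Mean coverage depth: total aligned bases divided by the reference length."""
    return sum(aligned_read_lengths) / reference_length


def composition_vector(depths):
    """Scale per-reference depths so they sum to 1 (relative abundance, by assumption)."""
    total = sum(depths.values())
    if total == 0:
        return {name: 0.0 for name in depths}
    return {name: depth / total for name, depth in depths.items()}


# Toy alignment summary, invented for illustration:
# reference name -> (lengths of reads aligned to it, reference length in bp).
alignments = {
    "genome_A": ([100, 100, 100], 1000),
    "genome_B": ([50, 50], 1000),
}
depths = {name: coverage_depth(reads, length)
          for name, (reads, length) in alignments.items()}
vector = composition_vector(depths)
```

    Dividing by reference length keeps long genomes from dominating the vector simply because they attract more reads.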

  9. Mixed model approaches for diallel analysis based on a bio-model.

    PubMed

    Zhu, J; Weir, B S

    1996-12-01

    A MINQUE(1) procedure, which is the minimum norm quadratic unbiased estimation (MINQUE) method with 1 for all the prior values, is suggested for estimating variance and covariance components in a bio-model for diallel crosses. Unbiasedness and efficiency of estimation were compared for MINQUE(1), restricted maximum likelihood (REML) and MINQUE theta, which uses parameter values as the prior values. MINQUE(1) is almost as efficient as MINQUE theta for unbiased estimation of genetic variance and covariance components. The bio-model is efficient and robust for estimating variance and covariance components for maternal and paternal effects as well as for nuclear effects. A procedure of adjusted unbiased prediction (AUP) is proposed for predicting random genetic effects in the bio-model. The jack-knife procedure is suggested for estimation of sampling variances of estimated variance and covariance components and of predicted genetic effects. Worked examples are given for estimation of variance and covariance components and for prediction of genetic merits.

  10. Autocorrelation analysis for the unbiased determination of power-law exponents in single-quantum-dot blinking.

    PubMed

    Houel, Julien; Doan, Quang T; Cajgfinger, Thomas; Ledoux, Gilles; Amans, David; Aubret, Antoine; Dominjon, Agnès; Ferriol, Sylvain; Barbier, Rémi; Nasilowski, Michel; Lhuillier, Emmanuel; Dubertret, Benoît; Dujardin, Christophe; Kulzer, Florian

    2015-01-27

    We present an unbiased and robust analysis method for power-law blinking statistics in the photoluminescence of single nanoemitters, allowing us to extract both the bright- and dark-state power-law exponents from the emitters' intensity autocorrelation functions. As opposed to the widely used threshold method, our technique therefore does not require discriminating the emission levels of bright and dark states in the experimental intensity timetraces. We rely on the simultaneous recording of 450 emission timetraces of single CdSe/CdS core/shell quantum dots at a frame rate of 250 Hz with single photon sensitivity. Under these conditions, our approach can determine ON and OFF power-law exponents with a precision of 3% from a comparison to numerical simulations, even for shot-noise-dominated emission signals with an average intensity below 1 photon per frame and per quantum dot. These capabilities pave the way for the unbiased, threshold-free determination of blinking power-law exponents at the microsecond time scale.
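
    The central quantity here, the intensity autocorrelation of an emission timetrace, can be computed without thresholding the trace into bright and dark states. The sketch below assumes the common normalization g2(lag) = <I(t) I(t+lag)> / <I>^2 and a hypothetical function name; it does not reproduce the authors' procedure for extracting power-law exponents from the autocorrelation.

```python
def intensity_autocorrelation(trace, max_lag):
    """Normalized autocorrelation g2(lag) = <I(t) * I(t + lag)> / <I>^2 for lag = 1..max_lag."""
    n = len(trace)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the trace length")
    mean = sum(trace) / n
    if mean == 0:
        raise ValueError("trace has zero mean intensity")
    g2 = []
    for lag in range(1, max_lag + 1):
        pairs = n - lag  # number of (t, t + lag) pairs available at this lag
        product_mean = sum(trace[i] * trace[i + lag] for i in range(pairs)) / pairs
        g2.append(product_mean / (mean * mean))
    return g2


# Toy traces, invented for illustration: photon counts per frame.
steady = [2] * 10        # constant emission, no blinking
blinking = [0, 2] * 5    # strict on/off alternation every frame
g2_steady = intensity_autocorrelation(steady, 3)
g2_blinking = intensity_autocorrelation(blinking, 2)
```

    A constant trace gives g2 = 1 at every lag, while any on/off switching shows up as lag-dependent structure, and no intensity threshold is involved at any point.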

  11. Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

    PubMed

    Sim, Mikang; Kim, Jaebum

    2015-02-01

    The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
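
    The two steps named in this abstract, (i) clustering reads by the protein they map to and (ii) deriving a consensus contig per cluster, can be sketched as follows. This is a simplified stand-in rather than the authors' method: a plain per-column majority vote replaces their probabilistic choice, read placements are assumed to be given as nucleotide offsets, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_reads(hits):
    """Group (read, protein_id, offset) tuples by the protein each read mapped to."""
    clusters = defaultdict(list)
    for read, protein_id, offset in hits:
        clusters[protein_id].append((read, offset))
    return dict(clusters)


def consensus(cluster):
    """Majority-vote base per covered column; assumes the reads leave no coverage gaps."""
    columns = defaultdict(Counter)
    for read, offset in cluster:
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    return "".join(columns[pos].most_common(1)[0][0] for pos in sorted(columns))


# Toy mapping results, invented for illustration: (read, protein hit, offset).
hits = [
    ("ACGTA", "protX", 0),
    ("GTACC", "protX", 2),
    ("CGTAC", "protX", 1),
    ("TTTT", "protY", 0),
]
clusters = cluster_reads(hits)
contigs = {protein: consensus(reads) for protein, reads in clusters.items()}
```

    Grouping by protein hit first means each consensus is built only from reads that plausibly come from the same gene, which is the reduction in complexity the abstract relies on.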

  12. Unbiased split variable selection for random survival forests using maximally selected rank statistics.

    PubMed

    Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

    2017-04-15

    The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.

  13. Taxonomic and functional profiles of soil samples from Atlantic forest and Caatinga biomes in northeastern Brazil

    PubMed Central

    Pacchioni, Ralfo G; Carvalho, Fabíola M; Thompson, Claudia E; Faustino, André L F; Nicolini, Fernanda; Pereira, Tatiana S; Silva, Rita C B; Cantão, Mauricio E; Gerber, Alexandra; Vasconcelos, Ana T R; Agnez-Lima, Lucymara F

    2014-01-01

    Although microorganisms play crucial roles in ecosystems, metagenomic analyses of soil samples are quite scarce, especially in the Southern Hemisphere. In this work, the microbial diversity of soil samples from an Atlantic Forest and Caatinga was analyzed using a metagenomic approach. Proteobacteria and Actinobacteria were the dominant phyla in both samples. Among these, a significant proportion of stress-resistant bacteria associated with organic matter degradation was found. Sequences related to metabolism of amino acids, nitrogen, and DNA and stress resistance were more frequent in Caatinga soil, while the forest sample showed the highest occurrence of hits annotated in phosphorus metabolism, defense mechanisms, and aromatic compound degradation subsystems. The principal component analysis (PCA) showed that our samples are close to the desert metagenomes in relation to taxonomy, but are more similar to rhizosphere microbiota in relation to the functional profiles. The data indicate that soil characteristics affect the taxonomic and functional distribution; these characteristics include low nutrient content, high drainage (both are sandy soils), vegetation, and exposure to stress. In both samples, a rapid turnover of organic matter with low greenhouse gas emission was suggested by the functional profiles obtained, reinforcing the importance of preserving natural areas. PMID:24706600

  14. An Integrated Metagenomics/Metaproteomics Investigation of the Microbial Communities and Enzymes in Solid-state Fermentation of Pu-erh tea

    PubMed Central

    Zhao, Ming; Zhang, Dong-lian; Su, Xiao-qin; Duan, Shuang-mei; Wan, Jin-qiong; Yuan, Wen-xia; Liu, Ben-ying; Ma, Yan; Pan, Ying-hong

    2015-01-01

    Microbial enzymes during solid-state fermentation (SSF), which play important roles in the food, chemical, pharmaceutical and environmental fields, remain relatively unknown. In this work, the microbial communities and enzymes in SSF of Pu-erh tea, a well-known traditional Chinese tea, were investigated by an integrated metagenomics/metaproteomics approach. The dominant bacteria and fungi were identified as Proteobacteria (48.42%) and Aspergillus (94.98%), through pyrosequencing-based analyses of the bacterial 16S and fungal 18S rRNA genes, respectively. In total, 335 proteins with at least two unique peptides were identified and classified into 28 Biological Processes and 35 Molecular Function categories using a metaproteomics analysis. The integration of metagenomics and metaproteomics data demonstrated that Aspergillus was the dominant fungus and the major host of the identified proteins (50.45%). Enzymes involved in the degradation of the plant cell wall were identified and associated with the soft-rotting of tea leaves. Peroxiredoxins, catalase and peroxidases were associated with the oxidation of catechins. In conclusion, this work greatly advances our understanding of the SSF of Pu-erh tea and provides a powerful tool for studying SSF mechanisms, especially in relation to the microbial communities present. PMID:25974221

  15. PlasFlow: predicting plasmid sequences in metagenomic data using genome signatures

    PubMed Central

    Lipinski, Leszek; Dziembowski, Andrzej

    2018-01-01

    Plasmids are mobile genetic elements that play an important role in the environmental adaptation of microorganisms. Although plasmids are usually analyzed in cultured microorganisms, there is a need for methods that allow for the analysis of pools of plasmids (plasmidomes) in environmental samples. To that end, several molecular biology and bioinformatics methods have been developed; however, they are limited to environments with low diversity and cannot recover large plasmids. Here, we present PlasFlow, a novel tool based on genomic signatures that employs a neural network approach for identification of bacterial plasmid sequences in environmental samples. PlasFlow can recover plasmid sequences from assembled metagenomes without any prior knowledge of the taxonomical or functional composition of samples with an accuracy up to 96%. It can also recover sequences of both circular and linear plasmids and can perform initial taxonomical classification of sequences. Compared to other currently available tools, PlasFlow demonstrated significantly better performance on test datasets. Analysis of two samples from heavy metal-contaminated microbial mats revealed that plasmids may constitute an important fraction of their metagenomes and carry genes involved in heavy-metal homeostasis, proving the pivotal role of plasmids in microorganism adaptation to environmental conditions. PMID:29346586
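
    A genomic signature of the kind mentioned here is commonly represented as a vector of normalized k-mer frequencies. The sketch below shows that representation only; the function name, the choice of k, and the handling of ambiguous bases are assumptions for illustration and are not taken from PlasFlow, whose exact features and neural network are not reproduced.

```python
from collections import Counter
from itertools import product


def kmer_signature(sequence, k=3):
    """Return normalized frequencies of all 4**k DNA k-mers, in lexicographic order."""
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing characters other than A, C, G, T (such as N) are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Toy contig, invented for illustration.
signature = kmer_signature("ACGTACGT", k=2)
```

    A fixed-length vector like this can be computed for any contig regardless of its length, which is what makes it usable as classifier input without prior taxonomic or functional annotation.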

  16. Characterisation of the canine faecal virome in healthy dogs and dogs with acute diarrhoea using shotgun metagenomics.

    PubMed

    Moreno, Paloma S; Wagner, Josef; Mansfield, Caroline S; Stevens, Matthew; Gilkerson, James R; Kirkwood, Carl D

    2017-01-01

    The virome has been increasingly investigated in numerous animal species and in different sites of the body, facilitating the identification and discovery of a variety of viruses. In spite of this, the faecal virome of healthy dogs has not been investigated. In this study we describe the faecal virome of healthy dogs and dogs with acute diarrhoea in Australia, using a shotgun metagenomic approach. Viral sequences from a range of different virus families, including both RNA and DNA families, and known pathogens implicated in enteric disease were documented. Twelve viral families were identified, of which four were bacteriophages. Eight eukaryotic viral families were detected: Astroviridae, Coronaviridae, Reoviridae, Picornaviridae, Caliciviridae, Parvoviridae, Adenoviridae and Papillomaviridae. Families Astroviridae, Picornaviridae and Caliciviridae were found only in dogs with acute diarrhoea, with Astroviridae being the most common family identified in this group. Due to its prevalence, characterisation of the complete genome of a canine astrovirus was performed. These studies indicate that metagenomic analyses are useful for the investigation of viral populations in the faeces of dogs. Further studies to elucidate the epidemiological and biological relevance of these findings are warranted.

  17. Characterisation of the canine faecal virome in healthy dogs and dogs with acute diarrhoea using shotgun metagenomics

    PubMed Central

    Wagner, Josef; Mansfield, Caroline S.; Stevens, Matthew; Gilkerson, James R.; Kirkwood, Carl D.

    2017-01-01

    The virome has been increasingly investigated in numerous animal species and in different sites of the body, facilitating the identification and discovery of a variety of viruses. In spite of this, the faecal virome of healthy dogs has not been investigated. In this study we describe the faecal virome of healthy dogs and dogs with acute diarrhoea in Australia, using a shotgun metagenomic approach. Viral sequences from a range of different virus families, including both RNA and DNA families, and known pathogens implicated in enteric disease were documented. Twelve viral families were identified, of which four were bacteriophages. Eight eukaryotic viral families were detected: Astroviridae, Coronaviridae, Reoviridae, Picornaviridae, Caliciviridae, Parvoviridae, Adenoviridae and Papillomaviridae. Families Astroviridae, Picornaviridae and Caliciviridae were found only in dogs with acute diarrhoea, with Astroviridae being the most common family identified in this group. Due to its prevalence, characterisation of the complete genome of a canine astrovirus was performed. These studies indicate that metagenomic analyses are useful for the investigation of viral populations in the faeces of dogs. Further studies to elucidate the epidemiological and biological relevance of these findings are warranted. PMID:28570584

  18. Viral Metagenomics on Blood-Feeding Arthropods as a Tool for Human Disease Surveillance

    PubMed Central

    Brinkmann, Annika; Nitsche, Andreas; Kohl, Claudia

    2016-01-01

    Surveillance and monitoring of viral pathogens circulating in humans and wildlife, together with the identification of emerging infectious diseases (EIDs), are critical for the prediction of future disease outbreaks and epidemics at an early stage. It is advisable to sample a broad range of vertebrates and invertebrates at different temporospatial levels on a regular basis to detect possible candidate viruses at their natural source. However, virus surveillance systems can be costly in terms of finances and resources and inadequate for sampling sufficient numbers of different host species over space and time. Recent publications have presented the concept of a new virus surveillance system, coining the terms “flying biological syringes”, “xenosurveillance” and “vector-enabled metagenomics”. According to these novel and promising surveillance approaches, viral metagenomics on engorged mosquitoes might reflect the viral diversity of numerous mammals, birds and humans, combined in the mosquitoes’ blood meal during feeding on the host. In this review article, we summarize the literature on vector-enabled metagenomics (VEM) techniques and their application in disease surveillance in humans. Furthermore, we highlight the combination of VEM and “invertebrate-derived DNA” (iDNA) analysis to identify the host DNA within the mosquito midgut. PMID:27775568

  19. A Metagenomic Advance for the Cloning and Characterization of a Cellulase from Red Rice Crop Residues.

    PubMed

    Meneses, Carlos; Silva, Bruna; Medeiros, Betsy; Serrato, Rodrigo; Johnston-Monje, David

    2016-06-25

    Many naturally-occurring cellulolytic microorganisms are not readily cultivable, demanding a culture-independent approach in order to study their cellulolytic genes. Metagenomics involves the isolation of DNA from environmental sources and can be used to identify enzymes with biotechnological potential from uncultured microbes. In this study, a gene encoding an endoglucanase was cloned from red rice crop residues using a metagenomic strategy. The amino acid identity between this gene and its closest published counterparts is lower than 70%. The endoglucanase was named EglaRR01 and was biochemically characterized. This recombinant protein showed activity on carboxymethylcellulose, indicating that EglaRR01 is an endoactive lytic enzyme. The enzymatic activity was optimal at a pH of 6.8 and at a temperature of 30 °C. Ethanol production from this recombinant enzyme was also analyzed on EglaRR01 crop residues, and resulted in conversion of cellulose from red rice into simple sugars which were further fermented by Saccharomyces cerevisiae to produce ethanol after seven days. Ethanol yield in this study was approximately 8 g/L. The gene found herein shows strong potential for use in ethanol production from cellulosic biomass (second generation ethanol).

  20. Metagenomics reveals flavour metabolic network of cereal vinegar microbiota.

    PubMed

    Wu, Lin-Huan; Lu, Zhen-Ming; Zhang, Xiao-Juan; Wang, Zong-Min; Yu, Yong-Jian; Shi, Jin-Song; Xu, Zheng-Hong

    2017-04-01

    The multispecies microbial community formed through centuries of repeated batch acetic acid fermentation (AAF) is crucial for the flavour quality of traditional vinegar produced from cereals. However, the metabolism by which the multispecies microbial community generates and/or formulates the essential flavours is hardly understood. Here we used a metagenomic approach to clarify the in situ metabolic network of key microbes responsible for flavour synthesis of a typical cereal vinegar, Zhenjiang aromatic vinegar, produced by solid-state fermentation. First, we identified 3 organic acids, 7 amino acids, and 20 volatiles as dominant vinegar metabolites. Second, we revealed the taxonomic and functional composition of the microbiota by metagenomic shotgun sequencing. A total of 86 201 predicted protein-coding genes from 35 phyla (951 genera) were involved in Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of Metabolism (42.3%), Genetic Information Processing (28.3%), and Environmental Information Processing (10.1%). Furthermore, a metabolic network for substrate breakdown and dominant flavour formation in vinegar microbiota was constructed, and microbial distribution discrepancy in different metabolic pathways was charted. This study helps elucidate the different metabolic roles of microbes during flavour formation in vinegar microbiota. Copyright © 2016 Elsevier Ltd. All rights reserved.

  1. Genomic and metagenomic technologies to explore the antibiotic resistance mobilome.

    PubMed

    Martínez, José L; Coque, Teresa M; Lanza, Val F; de la Cruz, Fernando; Baquero, Fernando

    2017-01-01

    Antibiotic resistance is a relevant problem for human health that requires global approaches to establish a deep understanding of the processes of acquisition, stabilization, and spread of resistance among human bacterial pathogens. Since natural (nonclinical) ecosystems are reservoirs of resistance genes, a health-integrated study of the epidemiology of antibiotic resistance requires the exploration of such ecosystems with the aim of determining the role they may play in the selection, evolution, and spread of antibiotic resistance genes, involving the so-called resistance mobilome. High-throughput sequencing techniques allow an unprecedented opportunity to describe the genetic composition of a given microbiome without the need to subculture the organisms present inside. However, bioinformatic methods for analyzing this bulk of data, mainly with respect to binning each resistance gene with the organism hosting it, are still in their infancy. Here, we discuss how current genomic methodologies can serve to analyze the resistance mobilome and its linkage with different bacterial genomes and metagenomes. In addition, we describe the drawbacks of current methodologies for analyzing the resistance mobilome, mainly in cases of complex microbiotas, and discuss the possibility of implementing novel tools to improve our current metagenomic toolbox. © 2016 New York Academy of Sciences.

  2. Comparative Viral Metagenomics of Environmental Samples from Korea

    PubMed Central

    Kim, Min-Soo; Whon, Tae Woong

    2013-01-01

    The introduction of metagenomics into the field of virology has facilitated the exploration of viral communities in various natural habitats. Understanding the viral ecology of a variety of sample types throughout the biosphere is important per se, but it also has potential applications in clinical and diagnostic virology. However, the procedures used by viral metagenomics may produce technical errors, such as amplification bias, while public viral databases are very limited, which may hamper the determination of the viral diversity in samples. This review considers the current state of viral metagenomics, based on examples from Korean viral metagenomic studies-i.e., rice paddy soil, fermented foods, human gut, seawater, and the near-surface atmosphere. Viral metagenomics has become widespread due to various methodological developments, and much attention has been focused on studies that consider the intrinsic role of viruses that interact with their hosts. PMID:24124407

  3. Metagenomic applications in environmental monitoring and bioremediation.

    PubMed

    Techtmann, Stephen M; Hazen, Terry C

    2016-10-01

    With the rapid advances in sequencing technology, the cost of sequencing has dramatically dropped and the scale of sequencing projects has increased accordingly. This has provided the opportunity for the routine use of sequencing techniques in the monitoring of environmental microbes. While metagenomic applications have been routinely applied to better understand the ecology and diversity of microbes, their use in environmental monitoring and bioremediation is increasingly common. In this review we seek to provide an overview of some of the metagenomic techniques used in environmental systems biology, addressing their application and limitation. We will also provide several recent examples of the application of metagenomics to bioremediation. We discuss examples where microbial communities have been used to predict the presence and extent of contamination, examples of how metagenomics can be used to characterize the process of natural attenuation by unculturable microbes, as well as examples detailing the use of metagenomics to understand the impact of biostimulation on microbial communities.

  4. EBI metagenomics in 2016 - an expanding and evolving resource for the analysis and archiving of metagenomic data

    PubMed Central

    Mitchell, Alex; Bucchini, Francois; Cochrane, Guy; Denise, Hubert; Hoopen, Petra ten; Fraser, Matthew; Pesseat, Sebastien; Potter, Simon; Scheremetjew, Maxim; Sterk, Peter; Finn, Robert D.

    2016-01-01

    EBI metagenomics (https://www.ebi.ac.uk/metagenomics/) is a freely available hub for the analysis and archiving of metagenomic and metatranscriptomic data. Over the last 2 years, the resource has undergone rapid growth, with an increase of over five-fold in the number of processed samples and consequently represents one of the largest resources of analysed shotgun metagenomes. Here, we report the status of the resource in 2016 and give an overview of new developments. In particular, we describe updates to data content, a complete overhaul of the analysis pipeline, streamlining of data presentation via the website and the development of a new web based tool to compare functional analyses of sequence runs within a study. We also highlight two of the higher profile projects that have been analysed using the resource in the last year: the oceanographic projects Ocean Sampling Day and Tara Oceans. PMID:26582919

  5. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island

    PubMed Central

    Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A.; Shouche, Yogesh S.; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role and contribute largely to the global biogeochemical cycles. This study aims to explore microbial diversity from one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with the culturable approach. Additionally, an imputed metagenomic approach was carried out to understand the functional aspects of the microbial community, especially for microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using the culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of the continental shelf. The microbial community structure throughout the samples was dominated by phylum Proteobacteria and harbored various bacterioplanktons as well. Significant differences were observed in bacterial diversity within a short region of the continental shelf (1–40 meters), i.e. between upper continental shelf samples (UCS) with lesser depths (i.e. 1–20 meters) and lower continental shelf samples (LCS) with greater depths (i.e. 25–40 meters). By using the imputed metagenomic approach, this study also discusses several adaptive mechanisms which enable microbes to survive in nutritionally deprived conditions, and also helps to understand the influence of nutrition availability on bacterial diversity. PMID:26066038

  6. Insights into Diversity and Imputed Metabolic Potential of Bacterial Communities in the Continental Shelf of Agatti Island.

    PubMed

    Kumbhare, Shreyas V; Dhotre, Dhiraj P; Dhar, Sunil Kumar; Jani, Kunal; Apte, Deepak A; Shouche, Yogesh S; Sharma, Avinash

    2015-01-01

    Marine microbes play a key role in, and contribute substantially to, global biogeochemical cycles. This study explores microbial diversity in one such ecological hotspot, the continental shelf of Agatti Island. Sediment samples from various depths of the continental shelf were analyzed for bacterial diversity using deep sequencing technology along with a culturable approach. Additionally, an imputed metagenomic approach was used to understand the functional aspects of the microbial community, especially microbial genes important in nutrient uptake, survival and biogeochemical cycling in the marine environment. Using the culturable approach, 28 bacterial strains representing 9 genera were isolated from various depths of the continental shelf. The microbial community structure throughout the samples was dominated by the phylum Proteobacteria and harbored various bacterioplankton as well. Significant differences in bacterial diversity were observed within a short region of the continental shelf (1-40 meters), i.e. between upper continental shelf (UCS) samples from shallower depths (1-20 meters) and lower continental shelf (LCS) samples from greater depths (25-40 meters). Using the imputed metagenomic approach, this study also discusses several adaptive mechanisms that enable microbes to survive in nutritionally deprived conditions and helps to explain the influence of nutrient availability on bacterial diversity.

  7. Current strategies for mobilome research

    PubMed Central

    Jørgensen, Tue S.; Kiil, Anne S.; Hansen, Martin A.; Sørensen, Søren J.; Hansen, Lars H.

    2015-01-01

    Mobile genetic elements (MGEs) are pivotal for bacterial evolution and adaptation, allowing shuffling of genes even between distantly related bacterial species. The study of these elements is biologically interesting as the mode of genetic propagation is kaleidoscopic and important, as MGEs are the main vehicles of the increasing bacterial antibiotic resistance that causes thousands of human deaths each year. The study of MGEs has previously focused on plasmids from individual isolates, but the revolution in sequencing technology has allowed the study of mobile genomic elements of entire communities using metagenomic approaches. The problem in using metagenomic sequencing for the study of MGEs is that plasmids and other mobile elements only comprise a small fraction of the total genetic content that are difficult to separate from chromosomal DNA based on sequence alone. The distinction between plasmid and chromosome is important as the mobility and regulation of genes largely depend on their genetic context. Several different approaches have been proposed that specifically enrich plasmid DNA from community samples. Here, we review recent approaches used to study entire plasmid pools from complex environments, and point out possible future developments for and pitfalls of these approaches. Further, we discuss the use of the PacBio long-read sequencing technology for MGE discovery. PMID:25657641

  8. Best practices for traffic impact studies : final report.

    DOT National Transportation Integrated Search

    2006-06-01

    For many years there have been concerns that some traffic engineers may approach traffic impact studies with an eye toward assisting developers expedite their development approval rather than delivering an unbiased evaluation of the impact of the...

  9. Functional annotation of chemical libraries across diverse biological processes.

    PubMed

    Piotrowski, Jeff S; Li, Sheena C; Deshpande, Raamesh; Simpkins, Scott W; Nelson, Justin; Yashiroda, Yoko; Barber, Jacqueline M; Safizadeh, Hamid; Wilson, Erin; Okada, Hiroki; Gebre, Abraham A; Kubo, Karen; Torres, Nikko P; LeBlanc, Marissa A; Andrusiak, Kerry; Okamoto, Reika; Yoshimura, Mami; DeRango-Adem, Eva; van Leeuwen, Jolanda; Shirahige, Katsuhiko; Baryshnikova, Anastasia; Brown, Grant W; Hirano, Hiroyuki; Costanzo, Michael; Andrews, Brenda; Ohya, Yoshikazu; Osada, Hiroyuki; Yoshida, Minoru; Myers, Chad L; Boone, Charles

    2017-09-01

    Chemical-genetic approaches offer the potential for unbiased functional annotation of chemical libraries. Mutations can alter the response of cells in the presence of a compound, revealing chemical-genetic interactions that can elucidate a compound's mode of action. We developed a highly parallel, unbiased yeast chemical-genetic screening system involving three key components. First, in a drug-sensitive genetic background, we constructed an optimized diagnostic mutant collection that is predictive for all major yeast biological processes. Second, we implemented a multiplexed (768-plex) barcode-sequencing protocol, enabling the assembly of thousands of chemical-genetic profiles. Finally, based on comparison of the chemical-genetic profiles with a compendium of genome-wide genetic interaction profiles, we predicted compound functionality. Applying this high-throughput approach, we screened seven different compound libraries and annotated their functional diversity. We further validated biological process predictions, prioritized a diverse set of compounds, and identified compounds that appear to have dual modes of action.

  10. Metagenomic systems biology of the human gut microbiome reveals topological shifts associated with obesity and inflammatory bowel disease.

    PubMed

    Greenblum, Sharon; Turnbaugh, Peter J; Borenstein, Elhanan

    2012-01-10

    The human microbiome plays a key role in a wide range of host-related processes and has a profound effect on human health. Comparative analyses of the human microbiome have revealed substantial variation in species and gene composition associated with a variety of disease states but may fall short of providing a comprehensive understanding of the impact of this variation on the community and on the host. Here, we introduce a metagenomic systems biology computational framework, integrating metagenomic data with an in silico systems-level analysis of metabolic networks. Focusing on the gut microbiome, we analyze fecal metagenomic data from 124 unrelated individuals, as well as six monozygotic twin pairs and their mothers, and generate community-level metabolic networks of the microbiome. Placing variations in gene abundance in the context of these networks, we identify both gene-level and network-level topological differences associated with obesity and inflammatory bowel disease (IBD). We show that genes associated with either of these host states tend to be located at the periphery of the metabolic network and are enriched for topologically derived metabolic "inputs." These findings may indicate that lean and obese microbiomes differ primarily in their interface with the host and in the way they interact with host metabolism. We further demonstrate that obese microbiomes are less modular, a hallmark of adaptation to low-diversity environments. We additionally link these topological variations to community species composition. The system-level approach presented here lays the foundation for a unique framework for studying the human microbiome, its organization, and its impact on human health.

  11. An Artificial Functional Family Filter in Homolog Searching in Next-generation Sequencing Metagenomics

    PubMed Central

    Du, Ruofei; Mercante, Donald; Fang, Zhide

    2013-01-01

    In functional metagenomics, BLAST homology search is a common method to classify metagenomic reads into protein/domain sequence families such as Clusters of Orthologous Groups of proteins (COGs) in order to quantify the abundance of each COG in the community. The resulting functional profile of the community is then used in downstream analysis to correlate the change in abundance to environmental perturbation, clinical variation, and so on. However, the short read length of next-generation sequencing technologies poses a barrier to this approach, essentially because the significance of similarity cannot be discerned by searching with short reads. Consequently, artificial functional families are produced, and those with a large number of assigned reads dramatically decrease the accuracy of the functional profile. There is no method available to address this problem, and we intended to fill this gap in this paper. We revealed that BLAST similarity scores of homologues for short reads from the coding sequences of COG protein members are distributed differently from the scores of those derived elsewhere. We showed that, by choosing an appropriate score cut-off, we are able to filter out most artificial families while preserving sufficient information to build the functional profile. We also showed that, by combined application of BLAST and RPS-BLAST, some artificial families with large read counts can be further identified after the score cut-off filtration. Evaluated on three experimental metagenomic datasets with different coverages, we found that the proposed method is robust against read coverage and consistently outperforms the other E-value cut-off methods currently used in the literature. PMID:23516532
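
    The score cut-off filtering described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the tuple layout of the hits, the example scores and the cut-off value of 40.0 are all assumptions made for illustration, and the sketch assumes one best hit per read.

```python
from collections import Counter


def functional_profile(hits, score_cutoff):
    """Count reads per COG, keeping only hits whose score meets the cut-off.

    hits: iterable of (read_id, cog_id, bit_score) tuples (hypothetical layout);
    each read is assumed to appear once, with its best hit.
    """
    profile = Counter()
    for _read_id, cog_id, bit_score in hits:
        if bit_score >= score_cutoff:
            profile[cog_id] += 1
    return profile


# Made-up example hits; COG0003 is supported only by a low-scoring read.
hits = [
    ("read1", "COG0001", 62.0),
    ("read2", "COG0001", 58.5),
    ("read3", "COG0002", 31.2),
    ("read4", "COG0003", 29.8),
    ("read5", "COG0002", 55.1),
]

profile = functional_profile(hits, score_cutoff=40.0)
for cog_id, count in sorted(profile.items()):
    print(cog_id, count)
```

    In this toy input, families supported only by low-scoring reads drop out of the profile, while families with at least one high-scoring read are retained with a reduced count. How the cut-off itself should be chosen is the subject of the paper and is not reproduced here.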

  12. Exploring the Impacts of Anthropogenic Disturbance on Seawater and Sediment Microbial Communities in Korean Coastal Waters Using Metagenomics Analysis

    PubMed Central

    Won, Nam-Il; Kim, Ki-Hwan; Kang, Ji Hyoun; Park, Sang Rul; Lee, Hyuk Je

    2017-01-01

    The coastal ecosystems are considered as one of the most dynamic and vulnerable environments under various anthropogenic developments and the effects of climate change. Variations in the composition and diversity of microbial communities may be a good indicator for determining whether the marine ecosystems are affected by complex forcing stressors. DNA sequence-based metagenomics has recently emerged as a promising tool for analyzing the structure and diversity of microbial communities based on environmental DNA (eDNA). However, few studies have so far been performed using this approach to assess the impacts of human activities on the microbial communities in marine systems. In this study, using metagenomic DNA sequencing (16S ribosomal RNA gene), we analyzed and compared seawater and sediment communities between sand mining and control (natural) sites in southern coastal waters of Korea to assess whether anthropogenic activities have significantly affected the microbial communities. The sand mining sites harbored considerably lower levels of microbial diversities in the surface seawater community during spring compared with control sites. Moreover, the sand mining areas had distinct microbial taxonomic group compositions, particularly during spring season. The microbial groups detected solely in the sediment load/dredging areas (e.g., Marinobacter, Alcanivorax, Novosphingobium) are known to be involved in degradation of toxic chemicals such as hydrocarbon, oil, and aromatic compounds, and they also contain potential pathogens. This study highlights the versatility of metagenomics in monitoring and diagnosing the impacts of human disturbance on the environmental health of marine ecosystems from eDNA. PMID:28134828

  13. Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data.

    PubMed

    Liu, Zhenqiu; Hsiao, William; Cantarel, Brandi L; Drábek, Elliott Franco; Fraser-Liggett, Claire

    2011-12-01

    Direct sequencing of microbes in human ecosystems (the human microbiome) has complemented single genome cultivation and sequencing to understand and explore the impact of commensal microbes on human health. As sequencing technologies improve and costs decline, the sophistication of data has outgrown available computational methods. While several existing machine learning methods have been adapted for analyzing microbiome data recently, there is not yet an efficient and dedicated algorithm available for multiclass classification of human microbiota. By combining instance-based and model-based learning, we propose a novel sparse distance-based learning method for simultaneous class prediction and feature (variable or taxa, which is used interchangeably) selection from multiple treatment populations on the basis of 16S rRNA sequence count data. Our proposed method simultaneously minimizes the intraclass distance and maximizes the interclass distance with many fewer estimated parameters than other methods. It is very efficient for problems with small sample sizes and unbalanced classes, which are common in metagenomic studies. We implemented this method in a MATLAB toolbox called MetaDistance. We also propose several approaches for data normalization and variance stabilization transformation in MetaDistance. We validate this method on several real and simulated 16S rRNA datasets to show that it outperforms existing methods for classifying metagenomic data. This article is the first to address simultaneous multifeature selection and class prediction with metagenomic count data. The MATLAB toolbox is freely available online at http://metadistance.igs.umaryland.edu/. Contact: zliu@umm.edu. Supplementary data are available at Bioinformatics online.

  14. Metagenomic Insights into the RDX-Degrading Potential of the Ovine Rumen Microbiome

    PubMed Central

    Li, Robert W.; Giarrizzo, Juan Gabriel; Wu, Sitao; Li, Weizhong; Duringer, Jennifer M.; Craig, A. Morrie

    2014-01-01

    The manufacturing processes of royal demolition explosive (RDX), or hexahydro-1,3,5-trinitro-1,3,5-triazine, have resulted in serious water contamination. As a potential carcinogen, RDX can cause a broad range of harmful effects to humans and animals. The ovine rumen is capable of rapid degradation of nitroaromatic compounds, including RDX. While ruminal RDX-degrading bacteria have been identified, the genes and pathways responsible for RDX degradation in the rumen have yet to be characterized. In this study, we characterized the metabolic potential of the ovine rumen using metagenomic approaches. Sequences homologous to at least five RDX-degrading genes cloned from environmental samples (diaA, xenA, xenB, xplA, and xplB) were present in the ovine rumen microbiome. Among them, diaA was the most abundant, likely reflective of the predominance of the genus Clostridium in the ovine rumen. At least ten genera known to harbor RDX-degrading microorganisms were detectable. Metagenomic sequences were also annotated using public databases, such as Pfam, COG, and KEGG. Five of the six Pfam protein families known to be responsible for RDX degradation in environmental samples were identified in the ovine rumen. However, increased substrate availability did not appear to enhance the proliferation of RDX-degrading bacteria and alter the microbial composition of the ovine rumen. This implies that the RDX-degrading capacity of the ovine rumen microbiome is likely regulated at the transcription level. Our results provide metagenomic insights into the RDX-degrading potential of the ovine rumen, and they will facilitate the development of novel and economic bioremediation strategies. PMID:25383623

  15. Cloning, Expression and Characteristics of a Novel Alkalistable and Thermostable Xylanase Encoding Gene (Mxyl) Retrieved from Compost-Soil Metagenome

    PubMed Central

    Verma, Digvijay; Kawarabayasi, Yutaka; Miyazaki, Kentaro; Satyanarayana, Tulasi

    2013-01-01

    Background: Alkalistable and thermostable xylanases are in high demand for pulp bleaching in the paper industry and for generating xylooligosaccharides by hydrolyzing the xylan component of agro-residues. Compost-soil samples, one of the hot environments, are expected to be a rich source of microbes with thermostable enzymes. Methodology/Principal Findings: Metagenomic DNA from hot environmental samples could be a rich source of novel biocatalysts. While screening a metagenomic library constructed in the p18GFP vector from DNA extracted from the compost-soil, a clone (TSDV-MX1) was detected that exhibited a clear zone of xylan hydrolysis on an RBB xylan plate. Sequencing of the 6.321 kb DNA insert and its BLAST analysis detected the presence of a xylanase gene that comprised 1077 bp. The deduced protein sequence (358 amino acids) displayed homology with glycosyl hydrolase (GH) family 11 xylanases. The gene was subcloned into the pET28a vector and expressed in E. coli BL21 (DE3). The recombinant xylanase (rMxyl) exhibited activity over a broad range of pH and temperature with optima at pH 9.0 and 80°C. The recombinant xylanase is highly thermostable, having a T1/2 of 2 h at 80°C and 15 min at 90°C. Conclusion/Significance: This is the first report on the retrieval, through a metagenomic approach, of a xylanase gene that encodes an enzyme with alkalistability and thermostability. The recombinant xylanase has potential application in the paper and pulp industry in pulp bleaching and in generating xylooligosaccharides from the abundantly available agro-residues. PMID:23382818
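
    As a rough worked example, the reported half-lives can be converted into inactivation rate constants. This assumes simple first-order thermal inactivation, which the abstract does not state; the symbols A(t), A_0 and k are introduced here for illustration only.

```latex
% Assumption: first-order inactivation, residual activity A(t) after time t
A(t) = A_0 \, e^{-kt}, \qquad k = \frac{\ln 2}{T_{1/2}}

% Reported half-lives: 2 h at 80 °C and 15 min (0.25 h) at 90 °C
k_{80} = \frac{\ln 2}{2\ \mathrm{h}} \approx 0.35\ \mathrm{h^{-1}}, \qquad
k_{90} = \frac{\ln 2}{0.25\ \mathrm{h}} \approx 2.8\ \mathrm{h^{-1}}
```

    Under that assumption, raising the temperature from 80°C to 90°C increases the inactivation rate about eight-fold, which is simply the ratio of the two half-lives.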

  16. Metagenomic Insights into the Fibrolytic Microbiome in Yak Rumen

    PubMed Central

    Song, Lei; Liu, Di; Liu, Li; Chen, Furong; Wang, Min; Li, Jiabao; Zeng, Xiaowei; Dong, Zhiyang; Hu, Songnian; Li, Lingyan; Xu, Jian; Huang, Li; Dong, Xiuzhu

    2012-01-01

    The rumen hosts one of the most efficient microbial systems for degrading plant cell walls, yet the predominant cellulolytic proteins and fibrolytic mechanism(s) remain elusive. Here we investigated the cellulolytic microbiome of the yak rumen by using a combination of metagenome-based and bacterial artificial chromosome (BAC)-based functional screening approaches. In total, 223 fibrolytic BAC clones were pyrosequenced and 10,070 ORFs were identified. Among them 150 were annotated as the glycoside hydrolase (GH) genes for fibrolytic proteins, and the majority (69%) of them were clustered or linked with genes encoding related functions. Among the 35 fibrolytic contigs of >10 Kb in length, 25 were derived from Bacteroidetes and four from Firmicutes. Coverage analysis indicated that the fibrolytic genes on most Bacteroidetes-contigs were abundantly represented in the metagenomic sequences, and they were frequently linked with genes encoding SusC/SusD-type outer-membrane proteins. GH5, GH9, and GH10 cellulase/hemicellulase genes were predominant, but no GH48 exocellulase gene was found. Most (85%) of the cellulase and hemicellulase proteins possessed a signal peptide; only a few carried carbohydrate-binding modules, and no cellulosomal domains were detected. These findings suggest that the SusC/SusD-involving mechanism, instead of one based on cellulosomes or the free-enzyme system, serves a major role in lignocellulose degradation in yak rumen. Genes encoding an endoglucanase of a novel GH5 subfamily occurred frequently in the metagenome, and the recombinant proteins encoded by the genes displayed moderate Avicelase in addition to endoglucanase activities, suggesting their important contribution to lignocellulose degradation in the exocellulase-scarce rumen. PMID:22808161

  17. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes

    PubMed Central

    Parks, Donovan H.; Imelfort, Michael; Skennerton, Connor T.; Hugenholtz, Philip; Tyson, Gene W.

    2015-01-01

    Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of “marker” genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. PMID:25977477
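
    The idea of estimating completeness and contamination from marker genes can be sketched in a deliberately simplified form. This is not CheckM's algorithm, which uses lineage-specific, collocated marker sets; the function name, the marker identifiers and the counts below are hypothetical, and the markers are assumed to be single-copy.

```python
def estimate_quality(marker_counts, marker_set):
    """Return (completeness %, contamination %) from single-copy marker counts.

    marker_counts: dict mapping marker id -> number of copies found in the genome.
    marker_set: list of markers expected exactly once (hypothetical set).
    Completeness is the share of markers found at least once; contamination
    is the number of extra copies relative to the number of markers.
    """
    n = len(marker_set)
    if n == 0:
        raise ValueError("marker_set must not be empty")
    present = sum(1 for m in marker_set if marker_counts.get(m, 0) >= 1)
    extra = sum(max(marker_counts.get(m, 0) - 1, 0) for m in marker_set)
    return 100.0 * present / n, 100.0 * extra / n


# Made-up example: four expected markers, one missing, one duplicated.
markers = ["m1", "m2", "m3", "m4"]
counts = {"m1": 1, "m2": 2, "m3": 1}

completeness, contamination = estimate_quality(counts, markers)
print(f"completeness={completeness:.1f}% contamination={contamination:.1f}%")
```

    A missing marker lowers completeness and a duplicated marker raises contamination. Treating every marker independently, as this sketch does, is the kind of simplification the abstract's broader, position-specific and collocation-aware marker sets are meant to improve on.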

  18. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

    PubMed

    Parks, Donovan H; Imelfort, Michael; Skennerton, Connor T; Hugenholtz, Philip; Tyson, Gene W

    2015-07-01

    Large-scale recovery of genomes from isolates, single cells, and metagenomic data has been made possible by advances in computational methods and substantial reductions in sequencing costs. Although this increasing breadth of draft genomes is providing key information regarding the evolutionary and functional diversity of microbial life, it has become impractical to finish all available reference genomes. Making robust biological inferences from draft genomes requires accurate estimates of their completeness and contamination. Current methods for assessing genome quality are ad hoc and generally make use of a limited number of "marker" genes conserved across all bacterial or archaeal genomes. Here we introduce CheckM, an automated method for assessing the quality of a genome using a broader set of marker genes specific to the position of a genome within a reference genome tree and information about the collocation of these genes. We demonstrate the effectiveness of CheckM using synthetic data and a wide range of isolate-, single-cell-, and metagenome-derived genomes. CheckM is shown to provide accurate estimates of genome completeness and contamination and to outperform existing approaches. Using CheckM, we identify a diverse range of errors currently impacting publicly available isolate genomes and demonstrate that genomes obtained from single cells and metagenomic data vary substantially in quality. In order to facilitate the use of draft genomes, we propose an objective measure of genome quality that can be used to select genomes suitable for specific gene- and genome-centric analyses of microbial communities. © 2015 Parks et al.; Published by Cold Spring Harbor Laboratory Press.

  19. Metagenomic mining pectinolytic microbes and enzymes from an apple pomace-adapted compost microbial community.

    PubMed

    Zhou, Man; Guo, Peng; Wang, Tao; Gao, Lina; Yin, Huijun; Cai, Cheng; Gu, Jie; Lü, Xin

    2017-01-01

    Degradation of pectin in lignocellulosic materials is one of the key steps for biofuel production. Biological hydrolysis of pectin, i.e., degradation by pectinolytic microbes and enzymes, is an attractive paradigm because of its obvious advantages, such as environmentally friendly procedures, low energy demand for lignin removal, and the possibility of being integrated into a consolidated process. In this study, a metagenomics sequence-guided strategy coupled with an enrichment culture technique was used to facilitate targeted discovery of pectinolytic microbes and enzymes. An apple pomace-adapted compost (APAC) habitat was constructed to boost the enrichment of pectinolytic microorganisms. Analyses of 16S rDNA high-throughput sequencing revealed that microbial communities changed dramatically during composting, with some bacterial populations being greatly enriched. Metagenomics data showed that the apple pomace-adapted compost microbial community (APACMC) was dominated by Proteobacteria and Bacteroidetes. Functional analysis and carbohydrate-active enzyme profiles confirmed that APACMC had been successfully enriched for the targeted functions. Among the 1756 putative genes encoding pectinolytic enzymes, 129 were predicted as novel (with an identity <30% to any CAZy database entry) and only 1.92% were more than 75% identical with proteins in the NCBI environmental database, demonstrating that they have not been observed in previous metagenome projects. Phylogenetic analysis showed that APACMC harbored a broad range of pectinolytic bacteria and many of them were previously unrecognized. The immensely diverse pectinolytic microbes and enzymes found in our study will expand the arsenal of proficient degraders and enzymes for lignocellulosic biofuel production. Our study provides a powerful approach for targeted mining of microbes and enzymes in numerous industries.

  20. Exploring the Impacts of Anthropogenic Disturbance on Seawater and Sediment Microbial Communities in Korean Coastal Waters Using Metagenomics Analysis.

    PubMed

    Won, Nam-Il; Kim, Ki-Hwan; Kang, Ji Hyoun; Park, Sang Rul; Lee, Hyuk Je

    2017-01-27

    The coastal ecosystems are considered as one of the most dynamic and vulnerable environments under various anthropogenic developments and the effects of climate change. Variations in the composition and diversity of microbial communities may be a good indicator for determining whether the marine ecosystems are affected by complex forcing stressors. DNA sequence-based metagenomics has recently emerged as a promising tool for analyzing the structure and diversity of microbial communities based on environmental DNA (eDNA). However, few studies have so far been performed using this approach to assess the impacts of human activities on the microbial communities in marine systems. In this study, using metagenomic DNA sequencing (16S ribosomal RNA gene), we analyzed and compared seawater and sediment communities between sand mining and control (natural) sites in southern coastal waters of Korea to assess whether anthropogenic activities have significantly affected the microbial communities. The sand mining sites harbored considerably lower levels of microbial diversities in the surface seawater community during spring compared with control sites. Moreover, the sand mining areas had distinct microbial taxonomic group compositions, particularly during spring season. The microbial groups detected solely in the sediment load/dredging areas (e.g., Marinobacter, Alcanivorax, Novosphingobium) are known to be involved in degradation of toxic chemicals such as hydrocarbon, oil, and aromatic compounds, and they also contain potential pathogens. This study highlights the versatility of metagenomics in monitoring and diagnosing the impacts of human disturbance on the environmental health of marine ecosystems from eDNA.

  1. Comparative metagenomic, phylogenetic and physiological analyses of soil microbial communities across nitrogen gradients.

    PubMed

    Fierer, Noah; Lauber, Christian L; Ramirez, Kelly S; Zaneveld, Jesse; Bradford, Mark A; Knight, Rob

    2012-05-01

    Terrestrial ecosystems are receiving elevated inputs of nitrogen (N) from anthropogenic sources and understanding how these increases in N availability affect soil microbial communities is critical for predicting the associated effects on belowground ecosystems. We used a suite of approaches to analyze the structure and functional characteristics of soil microbial communities from replicated plots in two long-term N fertilization experiments located in contrasting systems. Pyrosequencing-based analyses of 16S rRNA genes revealed no significant effects of N fertilization on bacterial diversity, but significant effects on community composition at both sites; copiotrophic taxa (including members of the Proteobacteria and Bacteroidetes phyla) typically increased in relative abundance in the high N plots, with oligotrophic taxa (mainly Acidobacteria) exhibiting the opposite pattern. Consistent with the phylogenetic shifts under N fertilization, shotgun metagenomic sequencing revealed increases in the relative abundances of genes associated with DNA/RNA replication, electron transport and protein metabolism, increases that could be resolved even with the shallow shotgun metagenomic sequencing conducted here (average of 75 000 reads per sample). We also observed shifts in the catabolic capabilities of the communities across the N gradients that were significantly correlated with the phylogenetic and metagenomic responses, indicating possible linkages between the structure and functioning of soil microbial communities. Overall, our results suggest that N fertilization may, directly or indirectly, induce a shift in the predominant microbial life-history strategies, favoring a more active, copiotrophic microbial community, a pattern that parallels the often observed replacement of K-selected with r-selected plant species with elevated N.

  2. Diversity of thermophiles in a Malaysian hot spring determined using 16S rRNA and shotgun metagenome sequencing

    PubMed Central

    Chan, Chia Sing; Chan, Kok-Gan; Tay, Yea-Ling; Chua, Yi-Heng; Goh, Kian Mau

    2015-01-01

    The Sungai Klah (SK) hot spring is the second hottest geothermal spring in Malaysia. This hot spring is a shallow, 150-m-long, fast-flowing stream, with temperatures varying from 50 to 110°C and a pH range of 7.0–9.0. Hidden within a wooded area, the SK hot spring is continually fed by plant litter, resulting in a relatively high degree of total organic content (TOC). In this study, a sample taken from the middle of the stream was analyzed at the 16S rRNA V3-V4 region by amplicon metagenome sequencing. Over 35 phyla were detected by analyzing the 16S rRNA data. Firmicutes and Proteobacteria represented approximately 57% of the microbiome. Approximately 70% of the detected thermophiles were strict anaerobes; however, Hydrogenobacter spp., obligate chemolithotrophic thermophiles, represented one of the major taxa. Several thermophilic photosynthetic microorganisms and acidothermophiles were also detected. Most of the phyla identified by 16S rRNA were also found using the shotgun metagenome approaches. The carbon, sulfur, and nitrogen metabolism within the SK hot spring community were evaluated by shotgun metagenome sequencing, and the data revealed diversity in terms of metabolic activity and dynamics. This hot spring has a rich diversified phylogenetic community partly due to its natural environment (plant litter, high TOC, and a shallow stream) and geochemical parameters (broad temperature and pH range). It is speculated that symbiotic relationships occur between the members of the community. PMID:25798135

  3. EBI metagenomics—a new resource for the analysis and archiving of metagenomic data

    PubMed Central

    Hunter, Sarah; Corbett, Matthew; Denise, Hubert; Fraser, Matthew; Gonzalez-Beltran, Alejandra; Hunter, Christopher; Jones, Philip; Leinonen, Rasko; McAnulla, Craig; Maguire, Eamonn; Maslen, John; Mitchell, Alex; Nuka, Gift; Oisel, Arnaud; Pesseat, Sebastien; Radhakrishnan, Rajesh; Rocca-Serra, Philippe; Scheremetjew, Maxim; Sterk, Peter; Vaughan, Daniel; Cochrane, Guy; Field, Dawn; Sansone, Susanna-Assunta

    2014-01-01

    Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive. PMID:24165880

  4. ELIXIR pilot action: Marine metagenomics - towards a domain specific set of sustainable services.

    PubMed

    Robertsen, Espen Mikal; Denise, Hubert; Mitchell, Alex; Finn, Robert D; Bongo, Lars Ailo; Willassen, Nils Peder

    2017-01-01

    Metagenomics, the study of genetic material recovered directly from environmental samples, has the potential to provide insight into the structure and function of heterogeneous microbial communities. There has been an increased use of metagenomics to discover and understand the diverse biosynthetic capacities of marine microbes, thereby allowing them to be exploited for industrial, food, and health care products. This ELIXIR pilot action was motivated by the need to establish dedicated data resources and harmonized metagenomics pipelines for the marine domain, in order to enhance the exploration and exploitation of marine genetic resources. In this paper, we summarize some of the results from the ELIXIR pilot action "Marine metagenomics - towards user centric services".

  5. A unique circovirus-like genome detected in pig feces

    USDA-ARS's Scientific Manuscript database

    Using a metagenomic approach and molecular cloning methods, we identified, cloned, and sequenced the complete genome of a novel circular DNA virus, porcine stool-associated virus (PoSCV4), from pig feces. Phylogenetic analysis of the deduced replication initiator protein showed that PoSCV4 is most r...

  6. Rapid detection of trichodysplasia spinulosa-associated polyomavirus in skin biopsy specimen.

    PubMed

    Urbano, Paulo Roberto P; Pannuti, Cláudio Sérgio; Pierrotti, Ligia C; David-Neto, Elias; Romano, Camila Malta

    2014-07-24

    Trichodysplasia spinulosa-associated polyomavirus (TSV) is responsible for a rare skin cancer. Using metagenomic approaches, we determined the complete genome sequence of a TSV first detected in Brazil in spicules of an immunocompromised patient suspected to have trichodysplasia spinulosa. Copyright © 2014 Urbano et al.

  7. The metatranscriptome of the rhesus macaque: investigating potential causes of idiopathic chronic diarrhea

    USDA-ARS's Scientific Manuscript database

    The study of the gut microbiome—the collection of microbes within the intestinal tract and the genes they express—is growing in popularity as associations are found between diet, gut microbiome activity, and host health and disease. However, current metagenomic and ribosomal profiling approaches are...

  8. MOLECULAR TRACKING FECAL CONTAMINATION IN SURFACE WATERS: 16S RDNA VERSUS METAGENOMICS APPROACHES

    EPA Science Inventory

    Microbial source tracking methods need to be sensitive and exhibit temporal and geographic stability in order to provide meaningful data in field studies. The objective of this study was to use a combination of PCR-based methods to track cow fecal contamination in two watersheds....

  9. Characterization of chlorinated and chloraminated drinking water microbial communities in a distribution system simulator using pyrosequencing data analysis

    EPA Science Inventory

    The molecular analysis of drinking water microbial communities has focused primarily on 16S rRNA gene sequence analysis. Since this approach provides limited information on function potential of microbial communities, analysis of whole-metagenome pyrosequencing data was used to...

  10. CowPI: A Rumen Microbiome Focussed Version of the PICRUSt Functional Inference Software.

    PubMed

    Wilkinson, Toby J; Huws, Sharon A; Edwards, Joan E; Kingston-Smith, Alison H; Siu-Ting, Karen; Hughes, Martin; Rubino, Francesco; Friedersdorff, Maximillian; Creevey, Christopher J

    2018-01-01

    Metataxonomic 16S rDNA based studies are a commonplace and useful tool in the research of the microbiome, but they do not provide the full investigative power of metagenomics and metatranscriptomics for revealing the functional potential of microbial communities. However, the use of metagenomic and metatranscriptomic technologies is hindered by high costs and skills barrier necessary to generate and interpret the data. To address this, a tool for Phylogenetic Investigation of Communities by Reconstruction of Unobserved States (PICRUSt) was developed for inferring the functional potential of an observed microbiome profile, based on 16S data. This allows functional inferences to be made from metataxonomic 16S rDNA studies with little extra work or cost, but its accuracy relies on the availability of completely sequenced genomes of representative organisms from the community being investigated. The rumen microbiome is an example of a community traditionally underrepresented in genome and sequence databases, but recent efforts by projects such as the Global Rumen Census and Hungate 1000 have resulted in a wide sampling of 16S rDNA profiles and almost 500 fully sequenced microbial genomes from this environment. Using this information, we have developed "CowPI," a focused version of the PICRUSt tool provided for use by the wider scientific community in the study of the rumen microbiome. We evaluated the accuracy of CowPI and PICRUSt using two 16S datasets from the rumen microbiome: one generated from rDNA and the other from rRNA where corresponding metagenomic and metatranscriptomic data was also available. We show that the functional profiles predicted by CowPI better match estimates for both the meta-genomic and transcriptomic datasets than PICRUSt, and capture the higher degree of genetic variation and larger pangenomes of rumen organisms. 
    Nonetheless, whilst being closer in terms of predictive power for the rumen microbiome, there were differences when compared to both the metagenomic and metatranscriptome data and so we recommend, where possible, functional inferences from 16S data should not replace metagenomic and metatranscriptomic approaches. The tool can be accessed at http://www.cowpi.org and is provided to the wider scientific community for use in the study of the rumen microbiome.

  11. Nonpareil 3: Fast Estimation of Metagenomic Coverage and Sequence Diversity.

    PubMed

    Rodriguez-R, Luis M; Gunturu, Santosh; Tiedje, James M; Cole, James R; Konstantinidis, Konstantinos T

    2018-01-01

    Estimations of microbial community diversity based on metagenomic data sets are affected, often to an unknown degree, by biases derived from insufficient coverage and reference database-dependent estimations of diversity. For instance, the completeness of reference databases cannot be generally estimated since it depends on the extant diversity sampled to date, which, with the exception of a few habitats such as the human gut, remains severely undersampled. Further, estimation of the degree of coverage of a microbial community by a metagenomic data set is prohibitively time-consuming for large data sets, and coverage values may not be directly comparable between data sets obtained with different sequencing technologies. Here, we extend Nonpareil, a database-independent tool for the estimation of coverage in metagenomic data sets, to a high-performance computing implementation that scales up to hundreds of cores and includes, in addition, a k-mer-based estimation as sensitive as the original alignment-based version but about three hundred times as fast. Further, we propose a metric of sequence diversity (Nd) derived directly from Nonpareil curves that correlates well with alpha diversity assessed by traditional metrics. We use this metric in different experiments demonstrating the correlation with the Shannon index estimated on 16S rRNA gene profiles and show that Nd additionally reveals seasonal patterns in marine samples that are not captured by the Shannon index and more precise rankings of the magnitude of diversity of microbial communities in different habitats. Therefore, the new version of Nonpareil, called Nonpareil 3, advances the toolbox for metagenomic analyses of microbiomes.
    IMPORTANCE Estimation of the coverage provided by a metagenomic data set, i.e., what fraction of the microbial community was sampled by DNA sequencing, represents an essential first step of every culture-independent genomic study that aims to robustly assess the sequence diversity present in a sample. However, estimation of coverage remains elusive because of several technical limitations associated with high computational requirements and limiting statistical approaches to quantify diversity. Here we describe Nonpareil 3, a new bioinformatics algorithm that circumvents several of these limitations and thus can facilitate culture-independent studies in clinical or environmental settings, independent of the sequencing platform employed. In addition, we present a new metric of sequence diversity based on rarefied coverage and demonstrate its use in communities from diverse ecosystems.
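
    The Shannon index that the Nonpareil 3 abstract uses as its point of comparison can be sketched in a few lines. This is only a minimal illustration of the standard formula H' = -sum(p_i * ln p_i) applied to a table of per-taxon counts; it is not the Nd metric and not code from Nonpareil. The function name and the example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:  # taxa with zero counts contribute nothing to the sum
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical OTU count tables (illustrative numbers, not from any study).
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]

print(shannon_index(even_community))
print(shannon_index(skewed_community))
```

    An evenly distributed community scores higher than a community dominated by one taxon, which is the behaviour a diversity metric is expected to show.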

  12. First report of bacterial community from a Bat Guano using Illumina next-generation sequencing.

    PubMed

    De Mandal, Surajit; Zothansanga; Panda, Amritha Kumari; Bisht, Satpal Singh; Senthil Kumar, Nachimuthu

    2015-06-01

    The V4 hypervariable region of 16S rDNA was analyzed to identify the bacterial communities present in bat guano from the unexplored cave Pnahkyndeng, Meghalaya, Northeast India. The metagenome comprised 585,434 raw Illumina sequences with a 59.59% G+C content. A total of 416,490 preprocessed reads were clustered into 1282 OTUs (operational taxonomic units) comprising 18 bacterial phyla. The taxonomic profile showed that the guano bacterial community is dominated by Chloroflexi, Actinobacteria and Crenarchaeota, which account for 70.73% of all sequence reads and 43.83% of all OTUs. Metagenome sequence data are available at NCBI under the accession no. SRP051094. This study is the first to characterize the bat guano bacterial community using a next-generation sequencing approach.

  13. First report of bacterial community from a Bat Guano using Illumina next-generation sequencing

    PubMed Central

    De Mandal, Surajit; Zothansanga; Panda, Amritha Kumari; Bisht, Satpal Singh; Senthil Kumar, Nachimuthu

    2015-01-01

    The V4 hypervariable region of 16S rDNA was analyzed to identify the bacterial communities present in bat guano from the unexplored cave Pnahkyndeng, Meghalaya, Northeast India. The metagenome comprised 585,434 raw Illumina sequences with a 59.59% G+C content. A total of 416,490 preprocessed reads were clustered into 1282 OTUs (operational taxonomic units) comprising 18 bacterial phyla. The taxonomic profile showed that the guano bacterial community is dominated by Chloroflexi, Actinobacteria and Crenarchaeota, which account for 70.73% of all sequence reads and 43.83% of all OTUs. Metagenome sequence data are available at NCBI under the accession no. SRP051094. This study is the first to characterize the bat guano bacterial community using a next-generation sequencing approach. PMID:26484190

  14. Human Microbiome Acquisition and Bioinformatic Challenges in Metagenomic Studies

    PubMed Central

    2018-01-01

    The study of the human microbiome has become a very popular topic. Our microbial counterpart, in fact, appears to play an important role in human physiology and health maintenance. Accordingly, microbiome alterations have been reported in an increasing number of human diseases. Despite the huge amount of data produced to date, less is known on how a microbial dysbiosis effectively contributes to a specific pathology. To fill in this gap, other approaches for microbiome study, more comprehensive than 16S rRNA gene sequencing, i.e., shotgun metagenomics and metatranscriptomics, are becoming more widely used. Methods standardization and the development of specific pipelines for data analysis are required to contribute to and increase our understanding of the human microbiome relationship with health and disease status. PMID:29382070

  15. Taxonomic and functional metagenomic profiling of gastrointestinal tract microbiome of the farmed adult turbot (Scophthalmus maximus).

    PubMed

    Xing, Mengxin; Hou, Zhanhui; Yuan, Jianbo; Liu, Yuan; Qu, Yanmei; Liu, Bin

    2013-12-01

    Metagenomics combined with 16S rRNA gene sequence analyses was applied to unveil the taxonomic composition and functional diversity of the farmed adult turbot gastrointestinal (GI) microbiome. Proteobacteria and Firmicutes, which were present in both GI content and mucus, dominated the turbot GI microbiome. 16S rRNA gene sequence analyses also indicated that the turbot GI tract may harbor some bacteria which originated from associated seawater. Functional analyses indicated that the clustering-based subsystem and many metabolic subsystems were dominant in the turbot GI metagenome. Compared with other gut metagenomes, quorum sensing and biofilm formation were overabundant in the turbot GI metagenome. Genes associated with quorum sensing and biofilm formation were found in species within Vibrio, including Vibrio vulnificus, Vibrio cholerae and Vibrio parahaemolyticus. In farmed fish gut metagenomes, the stress response and protein folding subsystems were over-represented and several genes concerning antibiotic and heavy metal resistance were also detected. These data suggested that the turbot GI microbiome may be affected by human factors in aquaculture. Additionally, iron acquisition and the metabolism subsystem were more abundant in the turbot GI metagenome when compared with freshwater fish gut metagenome, suggesting that unique metabolic potential may be observed in marine animal GI microbiomes. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  16. Mutually unbiased product bases for multiple qudits

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McNulty, Daniel; Pammer, Bogdan; Weigert, Stefan

    We investigate the interplay between mutual unbiasedness and product bases for multiple qudits of possibly different dimensions. A product state of such a system is shown to be mutually unbiased to a product basis only if each of its factors is mutually unbiased to all the states which occur in the corresponding factors of the product basis. This result implies both a tight limit on the number of mutually unbiased product bases which the system can support and a complete classification of mutually unbiased product bases for multiple qubits or qutrits. In addition, only maximally entangled states can be mutually unbiased to a maximal set of mutually unbiased product bases.

  17. Structural and mechanistic analysis of a β-glycoside phosphorylase identified by screening a metagenomic library.

    PubMed

    Macdonald, Spencer S; Patel, Ankoor; Larmour, Veronica L C; Morgan-Lang, Connor; Hallam, Steven J; Mark, Brian L; Withers, Stephen G

    2018-03-02

    Glycoside phosphorylases have considerable potential as catalysts for the assembly of useful glycans for products ranging from functional foods and prebiotics to novel materials. However, the substrate diversity of currently identified phosphorylases is relatively small, limiting their practical applications. To address this limitation, we developed a high-throughput screening approach using the activated substrate 2,4-dinitrophenyl β-d-glucoside (DNPGlc) and inorganic phosphate for identifying glycoside phosphorylase activity and used it to screen a large insert metagenomic library. The initial screen, based on release of 2,4-dinitrophenyl from DNPGlc in the presence of phosphate, identified the gene bglP, encoding a retaining β-glycoside phosphorylase from the CAZy GH3 family. Kinetic and mechanistic analysis of the gene product, BglP, confirmed a double displacement ping-pong mechanism involving a covalent glycosyl-enzyme intermediate. X-ray crystallographic analysis provided insights into the phosphate-binding mode and identified a key glutamine residue in the active site important for substrate recognition. Substituting this glutamine for a serine swapped the substrate specificity from glucoside to N-acetylglucosaminide. In summary, we present a high-throughput screening approach for identifying β-glycoside phosphorylases, which was robust, simple to implement, and useful in identifying active clones within a metagenomics library. Implementation of this screen enabled discovery of a new glycoside phosphorylase class and has paved the way to devising simple ways in which enzyme specificity can be encoded and swapped, which has implications for biotechnological applications. © 2018 by The American Society for Biochemistry and Molecular Biology, Inc.

  18. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline

    PubMed Central

    2013-01-01

    We describe MetAMOS, an open source and modular metagenomic assembly and analysis pipeline. MetAMOS represents an important step towards fully automated metagenomic analysis, starting with next-generation sequencing reads and producing genomic scaffolds, open-reading frames and taxonomic or functional annotations. MetAMOS can aid in reducing assembly errors, commonly encountered when assembling metagenomic samples, and improves taxonomic assignment accuracy while also reducing computational cost. MetAMOS can be downloaded from: https://github.com/treangen/MetAMOS. PMID:23320958

  19. Metagenomes from two microbial consortia associated with Santa Barbara seep oil.

    PubMed

    Hawley, Erik R; Malfatti, Stephanie A; Pagani, Ioanna; Huntemann, Marcel; Chen, Amy; Foster, Brian; Copeland, Alexander; del Rio, Tijana Glavina; Pati, Amrita; Jansson, Janet R; Gilbert, Jack A; Tringe, Susannah Green; Lorenson, Thomas D; Hess, Matthias

    2014-12-01

    The metagenomes from two microbial consortia associated with natural oils seeping into the Pacific Ocean offshore the coast of Santa Barbara (California, USA) were determined to complement already existing metagenomes generated from microbial communities associated with hydrocarbons that pollute the marine ecosystem. This genomics resource article is the first of two publications reporting a total of four new metagenomes from oils that seep into the Santa Barbara Channel. Copyright © 2014 Elsevier B.V. All rights reserved.

  20. Identification of Rare Lewis Oligosaccharide Conformers in Aqueous Solution Using Enhanced Sampling Molecular Dynamics.

    PubMed

    Alibay, Irfan; Burusco, Kepa K; Bruce, Neil J; Bryce, Richard A

    2018-03-08

    Determining the conformations accessible to carbohydrate ligands in aqueous solution is important for understanding their biological action. In this work, we evaluate the conformational free-energy surfaces of Lewis oligosaccharides in explicit aqueous solvent using a multidimensional variant of the swarm-enhanced sampling molecular dynamics (msesMD) method; we compare with multi-microsecond unbiased MD simulations, umbrella sampling, and accelerated MD approaches. For the sialyl Lewis A tetrasaccharide, msesMD simulations in aqueous solution predict conformer landscapes in general agreement with the other biased methods and with triplicate unbiased 10 μs trajectories; these simulations find a predominance of closed conformer and a range of low-occupancy open forms. The msesMD simulations also suggest closed-to-open transitions in the tetrasaccharide are facilitated by changes in ring puckering of its GlcNAc residue away from the ⁴C₁ form, in line with previous work. For sialyl Lewis X tetrasaccharide, msesMD simulations predict a minor population of an open form in solution corresponding to a rare lectin-bound pose observed crystallographically. Overall, from comparison with biased MD calculations, we find that triplicate 10 μs unbiased MD simulations may not be enough to fully sample glycan conformations in aqueous solution. However, the computational efficiency and intuitive approach of the msesMD method suggest potential for its application in glycomics as a tool for analysis of oligosaccharide conformation.

  1. Exploring variation-aware contig graphs for (comparative) metagenomics using MaryGold

    PubMed Central

    Nijkamp, Jurgen F.; Pop, Mihai; Reinders, Marcel J. T.; de Ridder, Dick

    2013-01-01

    Motivation: Although many tools are available to study variation and its impact in single genomes, there is a lack of algorithms for finding such variation in metagenomes. This hampers the interpretation of metagenomics sequencing datasets, which are increasingly acquired in research on the (human) microbiome, in environmental studies and in the study of processes in the production of foods and beverages. Existing algorithms often depend on the use of reference genomes, which pose a problem when a metagenome of a priori unknown strain composition is studied. In this article, we develop a method to perform reference-free detection and visual exploration of genomic variation, both within a single metagenome and between metagenomes. Results: We present the MaryGold algorithm and its implementation, which efficiently detects bubble structures in contig graphs using graph decomposition. These bubbles represent variable genomic regions in closely related strains in metagenomic samples. The variation found is presented in a condensed Circos-based visualization, which allows for easy exploration and interpretation of the found variation. We validated the algorithm on two simulated datasets containing three and seven Escherichia coli genomes, respectively, and showed that finding allelic variation in these genomes improves assemblies. Additionally, we applied MaryGold to publicly available real metagenomic datasets, enabling us to find within-sample genomic variation in the metagenomes of a kimchi fermentation process, the microbiome of a premature infant and in microbial communities living on acid mine drainage. Moreover, we used MaryGold for between-sample variation detection and exploration by comparing sequencing data sampled at different time points for both of these datasets. Availability: MaryGold has been written in C++ and Python and can be downloaded from http://bioinformatics.tudelft.nl/software Contact: d.deridder@tudelft.nl PMID:24058058

  2. Loeffler 4.0: Diagnostic Metagenomics.

    PubMed

    Höper, Dirk; Wylezich, Claudia; Beer, Martin

    2017-01-01

    A new world of possibilities for "virus discovery" opened up when high-throughput sequencing became available in the last decade. Although metagenomic analysis was scientifically established before the era of high-throughput sequencing began, the availability of the first second-generation sequencers was the kick-off for diagnosticians to use sequencing for the detection of novel pathogens. Today, diagnostic metagenomics is becoming the standard procedure for the detection and genetic characterization of new viruses or novel virus variants. Here, we provide an overview of technical considerations for high-throughput sequencing-based diagnostic metagenomics together with selected examples of "virus discovery" for animal diseases or zoonoses and metagenomics for food safety or basic veterinary research. © 2017 Elsevier Inc. All rights reserved.

  3. ELIXIR pilot action: Marine metagenomics – towards a domain specific set of sustainable services

    PubMed Central

    Robertsen, Espen Mikal; Denise, Hubert; Mitchell, Alex; Finn, Robert D.; Bongo, Lars Ailo; Willassen, Nils Peder

    2017-01-01

    Metagenomics, the study of genetic material recovered directly from environmental samples, has the potential to provide insight into the structure and function of heterogeneous microbial communities. There has been an increased use of metagenomics to discover and understand the diverse biosynthetic capacities of marine microbes, thereby allowing them to be exploited for industrial, food, and health care products. This ELIXIR pilot action was motivated by the need to establish dedicated data resources and harmonized metagenomics pipelines for the marine domain, in order to enhance the exploration and exploitation of marine genetic resources. In this paper, we summarize some of the results from the ELIXIR pilot action “Marine metagenomics – towards user centric services”. PMID:28620454

  4. Introduction to Metagenomics at DOE JGI: Program Overview and Program Informatics (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Tringe, Susannah

    2018-01-15

    Susannah Tringe of the DOE Joint Genome Institute talks about the Program Overview and Program Informatics at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  5. Fast and sensitive taxonomic classification for metagenomics with Kaiju

    PubMed Central

    Menzel, Peter; Ng, Kim Lee; Krogh, Anders

    2016-01-01

    Metagenomics emerged as an important field of research not only in microbial ecology but also for human health and disease, and metagenomic studies are performed on increasingly larger scales. While recent taxonomic classification programs achieve high speed by comparing genomic k-mers, they often lack sensitivity for overcoming evolutionary divergence, so that large fractions of the metagenomic reads remain unclassified. Here we present the novel metagenome classifier Kaiju, which finds maximum (in-)exact matches on the protein-level using the Burrows–Wheeler transform. We show in a genome exclusion benchmark that Kaiju classifies reads with higher sensitivity and similar precision compared with current k-mer-based classifiers, especially in genera that are underrepresented in reference databases. We also demonstrate that Kaiju classifies up to 10 times more reads in real metagenomes. Kaiju can process millions of reads per minute and can run on a standard PC. Source code and web server are available at http://kaiju.binf.ku.dk. PMID:27071849

  6. Fast and sensitive taxonomic classification for metagenomics with Kaiju.

    PubMed

    Menzel, Peter; Ng, Kim Lee; Krogh, Anders

    2016-04-13

    Metagenomics emerged as an important field of research not only in microbial ecology but also for human health and disease, and metagenomic studies are performed on increasingly larger scales. While recent taxonomic classification programs achieve high speed by comparing genomic k-mers, they often lack sensitivity for overcoming evolutionary divergence, so that large fractions of the metagenomic reads remain unclassified. Here we present the novel metagenome classifier Kaiju, which finds maximum (in-)exact matches on the protein-level using the Burrows-Wheeler transform. We show in a genome exclusion benchmark that Kaiju classifies reads with higher sensitivity and similar precision compared with current k-mer-based classifiers, especially in genera that are underrepresented in reference databases. We also demonstrate that Kaiju classifies up to 10 times more reads in real metagenomes. Kaiju can process millions of reads per minute and can run on a standard PC. Source code and web server are available at http://kaiju.binf.ku.dk.
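
    The protein-level matching idea described in the Kaiju abstract can be illustrated with a deliberately naive sketch: translate a read in all six reading frames, split the translations at stop codons, and report the reference protein sharing the longest exact substring with any fragment. This is an assumption-laden toy, not Kaiju's implementation: Kaiju uses the Burrows-Wheeler transform for the search, whereas this sketch swaps in a quadratic dynamic-programming scan, and the function names, the `min_len` threshold and the two reference proteins are all hypothetical. Reads are assumed to be uppercase A/C/G/T.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_fragments(read):
    """Yield stop-free protein fragments from all six reading frames."""
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            for fragment in translate(strand[offset:]).split("*"):
                if fragment:
                    yield fragment


def longest_exact_match(fragment, protein):
    """Length of the longest common substring (naive DP, not a BWT search)."""
    best = 0
    prev = [0] * (len(protein) + 1)
    for a in fragment:
        cur = [0] * (len(protein) + 1)
        for j, b in enumerate(protein, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def classify(read, references, min_len=4):
    """Return (taxon, match_length); taxon is None below the hypothetical min_len."""
    best_taxon, best_len = None, 0
    for taxon, protein in references.items():
        for fragment in six_frame_fragments(read):
            m = longest_exact_match(fragment, protein)
            if m > best_len:
                best_taxon, best_len = taxon, m
    return (best_taxon, best_len) if best_len >= min_len else (None, best_len)


# Hypothetical reference proteins and a read that encodes MKTAYIAK.
references = {"taxon_A": "MKTAYIAKQR", "taxon_B": "MSGLVPRGSH"}
read = "ATGAAAACCGCGTATATTGCGAAA"
print(classify(read, references))
```

    Matching on translated sequence rather than on nucleotide k-mers is what lets this kind of classifier tolerate synonymous divergence; the sketch ignores the inexact-match mode and all of the indexing that makes the real tool fast.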

  7. Mining the metagenome of activated biomass of an industrial wastewater treatment plant by a novel method.

    PubMed

    Sharma, Nandita; Tanksale, Himgouri; Kapley, Atya; Purohit, Hemant J

    2012-12-01

    Metagenomic libraries herald the era of magnifying the microbial world, tapping into the vast metabolic potential of uncultivated microbes, and enhancing the rate of discovery of novel genes and pathways. In this paper, we describe a method that facilitates the extraction of metagenomic DNA from activated sludge of an industrial wastewater treatment plant and its use in mining the metagenome via library construction. The efficiency of this method was demonstrated by the large representation of the bacterial genome in the constructed metagenomic libraries and by the functional clones obtained. The BAC library represented 95.6 times the bacterial genome, while, the pUC library represented 41.7 times the bacterial genome. Twelve clones in the BAC library demonstrated lipolytic activity, while four clones demonstrated dioxygenase activity. Four clones in pUC library tested positive for cellulase activity. This method, using FTA cards, not only can be used for library construction, but can also store the metagenome at room temperature.

  8. Metagenomic applications in environmental monitoring and bioremediation

    DOE PAGES

    Techtmann, Stephen M.; Hazen, Terry C.

    2016-01-01

    With the rapid advances in sequencing technology, the cost of sequencing has dramatically dropped and the scale of sequencing projects has increased accordingly. This has provided the opportunity for the routine use of sequencing techniques in the monitoring of environmental microbes. While metagenomic applications have been routinely applied to better understand the ecology and diversity of microbes, their use in environmental monitoring and bioremediation is increasingly common. In this review we seek to provide an overview of some of the metagenomic techniques used in environmental systems biology, addressing their application and limitation. We will also provide several recent examples of the application of metagenomics to bioremediation. We discuss examples where microbial communities have been used to predict the presence and extent of contamination, examples of how metagenomics can be used to characterize the process of natural attenuation by unculturable microbes, as well as examples detailing the use of metagenomics to understand the impact of biostimulation on microbial communities.

  9. Elucidation of rice rhizosphere metagenome in relation to methane and nitrogen metabolism under elevated carbon dioxide and temperature using whole genome metagenomic approach.

    PubMed

    Bhattacharyya, P; Roy, K S; Das, M; Ray, S; Balachandar, D; Karthikeyan, S; Nayak, A K; Mohapatra, T

    2016-01-15

    Carbon (C) and nitrogen (N) mineralization is one of the key processes of biogeochemical cycling in terrestrial ecosystem in general and rice ecology in particular. Rice rhizosphere is a rich niche of microbial diversity influenced by change in atmospheric temperature and concentration of carbon dioxide (CO2). Structural changes in microbial communities in rhizosphere influence the nutrient cycling. In the present study, the bacterial diversity and population dynamics were studied under ambient CO2 (a-CO2) and elevated CO2+temperature (e-CO2T) in lowland rice rhizosphere using whole genome metagenomic approach. The whole genome metagenomic sequence data of lowland rice exhibited the dominance of bacterial communities including Proteobacteria, Firmicutes, Acidobacteria, Actinobacteria and Planctomycetes. Interestingly, four genera related to methane production namely, Methanobacterium, Methanosphaera, Methanothermus and Methanothermococcus were absent in a-CO2 but noticed under e-CO2T. The acetoclastic pathway was found as the predominant pathway for methanogenesis, whereas, the serine pathway was found as the principal metabolic pathway for CH4 oxidation in lowland rice. The abundances of reads of enzymes in the acetoclastic methanogenesis pathway and serine pathways of methanotrophy were much higher in e-CO2T (328 and 182, respectively) as compared with a-CO2 (118 and 98, respectively). Rice rhizosphere showed higher structural diversities and functional activities in relation to N metabolism involving nitrogen fixation, assimilatory and dissimilatory nitrate reduction and denitrification under e-CO2T than that of a-CO2. Among the three pathways of N metabolism, dissimilarity pathways were predominant in lowland rice rhizosphere and more so under e-CO2T. Consequently, under e-CO2T, CH4 emission, microbial biomass nitrogen (MBN) and dehydrogenase activities were 45%, 20% and 35% higher than a-CO2, respectively. 
    Holistically, a high bacterial diversity and abundances of C and N decomposing bacteria in lowland rice rhizosphere were found under e-CO2T, which could be explored further for their specific role in nutrient cycling, sustainable agriculture and environment management. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. Molecular cloning, expression, and characterization of four novel thermo-alkaliphilic enzymes retrieved from a metagenomic library.

    PubMed

    Maruthamuthu, Mukil; van Elsas, Jan Dirk

    2017-01-01

    Enzyme discovery is a promising approach to aid in the deconstruction of recalcitrant plant biomass in an industrial process. Novel enzymes can be readily discovered by applying metagenomics to whole microbiomes. Our goal was to select, examine, and characterize eight novel glycoside hydrolases that were previously detected in metagenomic libraries, to serve biotechnological applications with high performance. Here, eight glycosyl hydrolase family candidate genes were selected from metagenomes of wheat straw-degrading microbial consortia using molecular cloning and subsequent gene expression studies in Escherichia coli. Four of the eight enzymes had significant activities on either pNP-β-D-galactopyranoside, pNP-β-D-xylopyranoside, pNP-α-L-arabinopyranoside or pNP-α-D-glucopyranoside. These proteins, denoted as proteins 1, 2, 5 and 6, were his-tag purified and their nature and activities further characterized using molecular and activity screens with the pNP-labeled substrates. Proteins 1 and 2 showed high homologies with (1) a β-galactosidase (74%) and (2) a β-xylosidase (84%), whereas the remaining two (5 and 6) were homologous with proteins reported as a diguanylate cyclase and an aquaporin, respectively. The β-galactosidase- and β-xylosidase-like proteins 1 and 2 were confirmed as being responsible for previously found thermo-alkaliphilic glycosidase activities of extracts of E. coli carrying the respective source fosmids. Remarkably, the β-xylosidase-like protein 2 showed activities with both pNP-Xyl and pNP-Ara in the temperature range 40-50 °C and pH range 8.0-10.0. Moreover, proteins 5 and 6 showed thermotolerant α-glucosidase activity at pH 10.0. In silico structure prediction of protein 5 revealed the presence of a potential "GGDEF" catalytic site, encoding α-glucosidase activity, whereas that of protein 6 showed a "GDSL" site, encoding a 'new family' α-glucosidase activity. Using a rational screening approach, we identified and characterized four thermo-alkaliphilic glycosyl hydrolases that have the potential to serve as constituents of enzyme cocktails that produce sugars from lignocellulosic plant remains.

  11. A Simple "Boxed Molecular Kinetics" Approach To Accelerate Rare Events in the Stochastic Kinetic Master Equation.

    PubMed

    Shannon, Robin; Glowacki, David R

    2018-02-15

    The chemical master equation is a powerful theoretical tool for analyzing the kinetics of complex multiwell potential energy surfaces in a wide range of different domains of chemical kinetics spanning combustion, atmospheric chemistry, gas-surface chemistry, solution phase chemistry, and biochemistry. There are two well-established methodologies for solving the chemical master equation: a stochastic "kinetic Monte Carlo" approach and a matrix-based approach. In principle, the results yielded by both approaches are identical; the decision of which approach is better suited to a particular study depends on the details of the specific system under investigation. In this Article, we present a rigorous method for accelerating stochastic approaches by several orders of magnitude, along with a method for unbiasing the accelerated results to recover the "true" value. The approach we take in this paper is inspired by the so-called "boxed molecular dynamics" (BXD) method, which has previously only been applied to accelerate rare events in molecular dynamics simulations. Here we extend BXD to design a simple algorithmic strategy for accelerating rare events in stochastic kinetic simulations. Tests on a number of systems show that the results obtained using the BXD rare event strategy are in good agreement with unbiased results. To carry out these tests, we have implemented a kinetic Monte Carlo approach in MESMER, which is a cross-platform, open-source, and freely available master equation solver.
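    To make the stochastic "kinetic Monte Carlo" route mentioned in this abstract concrete, the following is a minimal sketch of a standard Gillespie-type simulation for a hypothetical two-well isomerization A <-> B. It is not the BXD acceleration scheme and not MESMER code; the function name, rate constants and particle counts are illustrative assumptions only.

```python
import random


def gillespie_two_state(k_ab, k_ba, n_a, n_b, t_end, seed=0):
    """Plain (unaccelerated) kinetic Monte Carlo for A <-> B.

    k_ab, k_ba : hypothetical first-order rate constants (per unit time)
    n_a, n_b   : initial particle counts in wells A and B
    Returns a list of (time, n_a, n_b) tuples, one per reaction event.
    """
    rng = random.Random(seed)
    t = 0.0
    trajectory = [(t, n_a, n_b)]
    while True:
        rate_ab = k_ab * n_a          # propensity of A -> B
        rate_ba = k_ba * n_b          # propensity of B -> A
        total = rate_ab + rate_ba
        if total == 0.0:              # nothing can react any more
            break
        t += rng.expovariate(total)   # exponentially distributed waiting time
        if t > t_end:
            break
        # Choose which reaction fires, weighted by its propensity.
        if rng.random() * total < rate_ab:
            n_a, n_b = n_a - 1, n_b + 1
        else:
            n_a, n_b = n_a + 1, n_b - 1
        trajectory.append((t, n_a, n_b))
    return trajectory


if __name__ == "__main__":
    traj = gillespie_two_state(k_ab=1.0, k_ba=0.5, n_a=100, n_b=0, t_end=5.0)
    print("events:", len(traj) - 1, "final state:", traj[-1])
```

    When one rate constant is many orders of magnitude smaller than the other, almost every step is spent on the fast process and the slow event is rarely sampled; this is the rare-event problem that the accelerated approach described in the abstract is meant to address.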

  12. Interactive metagenomic visualization in a Web browser.

    PubMed

    Ondov, Brian D; Bergman, Nicholas H; Phillippy, Adam M

    2011-09-30

    A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net.

  13. Metagenomics, metaMicrobesOnline and Kbase Data Integration (MICW - Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Dehal, Paramvir

    2018-02-06

    Berkeley Lab's Paramvir Dehal on "Managing and Storing large Datasets in MicrobesOnline, metaMicrobesOnline and the DOE Knowledgebase" at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  14. Biofilm-Growing Bacteria Involved in the Corrosion of Concrete Wastewater Pipes: Protocols for Comparative Metagenomic Analyses

    EPA Science Inventory

    Advances in high-throughput next-generation sequencing (NGS) technology for direct sequencing of environmental DNA (i.e. shotgun metagenomics) are transforming the field of microbiology. NGS technologies are now regularly being applied in comparative metagenomic studies, which pr...

  15. Introduction to Metagenomics at DOE JGI (Opening Remarks for the Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Kyrpides, Nikos [DOE JGI]

    2018-05-30

    After a quick introduction by DOE JGI Director Eddy Rubin, DOE JGI's Nikos Kyrpides delivers the opening remarks at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.

  16. Proof of Concept for an Approach to a Finer Resolution Inventory

    Treesearch

    Chris J. Cieszewski; Kim Iles; Roger C. Lowe; Michal Zasada

    2005-01-01

    This report presents a proof of concept for a statistical framework to develop a timely, accurate, and unbiased fiber supply assessment in the State of Georgia, U.S.A. The proposed approach is based on using various data sources and modeling techniques to calibrate satellite image-based statewide stand lists, which provide initial estimates for a State inventory on a...

  17. Can We Spin Straw Into Gold? An Evaluation of Immigrant Legal Status Imputation Approaches

    PubMed Central

    Van Hook, Jennifer; Bachmeier, James D.; Coffman, Donna; Harel, Ofer

    2014-01-01

    Researchers have developed logical, demographic, and statistical strategies for imputing immigrants’ legal status, but these methods have never been empirically assessed. We used Monte Carlo simulations to test whether, and under what conditions, legal status imputation approaches yield unbiased estimates of the association of unauthorized status with health insurance coverage. We tested five methods under a range of missing data scenarios. Logical and demographic imputation methods yielded biased estimates across all missing data scenarios. Statistical imputation approaches yielded unbiased estimates only when unauthorized status was jointly observed with insurance coverage; when this condition was not met, these methods overestimated insurance coverage for unauthorized relative to legal immigrants. We next showed how bias can be reduced by incorporating prior information about unauthorized immigrants. Finally, we demonstrated the utility of the best-performing statistical method for increasing power. We used it to produce state/regional estimates of insurance coverage among unauthorized immigrants in the Current Population Survey, a data source that contains no direct measures of immigrants’ legal status. We conclude that commonly employed legal status imputation approaches are likely to produce biased estimates, but data and statistical methods exist that could substantially reduce these biases. PMID:25511332

  18. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat.

    PubMed

    Zhao, Y; Mette, M F; Gowda, M; Longin, C F H; Reif, J C

    2014-06-01

    Based on data from field trials with a large collection of 135 elite winter wheat inbred lines and 1604 F1 hybrids derived from them, we compared the accuracy of prediction of marker-assisted selection and current genomic selection approaches for the model traits heading time and plant height in a cross-validation approach. For heading time, the high accuracy seen with marker-assisted selection severely dropped with genomic selection approaches RR-BLUP (ridge regression best linear unbiased prediction) and BayesCπ, whereas for plant height, accuracy was low with marker-assisted selection as well as RR-BLUP and BayesCπ. Differences in the linkage disequilibrium structure of the functional and single-nucleotide polymorphism markers relevant for the two traits were identified in a simulation study as a likely explanation for the different trends in accuracies of prediction. A new genomic selection approach, weighted best linear unbiased prediction (W-BLUP), designed to treat the effects of known functional markers more appropriately, proved to increase the accuracy of prediction for both traits and thus closes the gap between marker-assisted and genomic selection.
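    The ridge-regression step underlying RR-BLUP can be illustrated with a small sketch: marker effects are estimated by solving (X'X + lam*I) b = X'y, where a single penalty lam shrinks all marker effects equally. This is a generic illustration, not the authors' pipeline and not W-BLUP; the function name, the toy genotype matrix and the choice of lam (in RR-BLUP usually the ratio of residual to marker-effect variance) are assumptions.

```python
def ridge_marker_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y for marker effects b.

    X   : list of rows, one per line/hybrid, one column per marker
    y   : list of (centered) phenotype values
    lam : ridge penalty, assumed to be supplied by the user
    """
    n, p = len(X), len(X[0])
    # Build the penalized normal equations.
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        tail = sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (b[r] - tail) / A[r][r]
    return beta


if __name__ == "__main__":
    # Toy data: 2 lines, 2 markers (hypothetical values).
    print(ridge_marker_effects([[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], lam=1.0))
```

    Because the penalty makes the system solvable even when there are more markers than phenotyped lines, the same penalty applied to every marker also shrinks a known large-effect functional marker, which is consistent with the motivation the abstract gives for treating such markers separately.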

  19. Bridging the gap between marker-assisted and genomic selection of heading time and plant height in hybrid wheat

    PubMed Central

    Zhao, Y; Mette, M F; Gowda, M; Longin, C F H; Reif, J C

    2014-01-01

    Based on data from field trials with a large collection of 135 elite winter wheat inbred lines and 1604 F1 hybrids derived from them, we compared the accuracy of prediction of marker-assisted selection and current genomic selection approaches for the model traits heading time and plant height in a cross-validation approach. For heading time, the high accuracy seen with marker-assisted selection severely dropped with genomic selection approaches RR-BLUP (ridge regression best linear unbiased prediction) and BayesCπ, whereas for plant height, accuracy was low with marker-assisted selection as well as RR-BLUP and BayesCπ. Differences in the linkage disequilibrium structure of the functional and single-nucleotide polymorphism markers relevant for the two traits were identified in a simulation study as a likely explanation for the different trends in accuracies of prediction. A new genomic selection approach, weighted best linear unbiased prediction (W-BLUP), designed to treat the effects of known functional markers more appropriately, proved to increase the accuracy of prediction for both traits and thus closes the gap between marker-assisted and genomic selection. PMID:24518889

  20. Metagenomic-based impact study of transgenic grapevine rootstock on its associated virome and soil microbiome

    USDA-ARS's Scientific Manuscript database

    For some crops, the only possible approach to gain a specific trait requires genome modification. The development of virus-resistant transgenic plants based on the pathogen-derived resistance strategy has been a success story for over three decades. However, potential risks associated with the techn...

  1. A new approach for detecting fungal and oomycete plant pathogens in next generation sequencing metagenome data utilising electronic probes

    USDA-ARS's Scientific Manuscript database

    Early stage infections caused by fungal/oomycete spores can remain undetected until signs or symptoms develop. Serological and molecular techniques are currently used for detecting these pathogens. Next-generation sequencing (NGS) has potential as a diagnostic tool, due to the capacity to target mul...

  2. Investigating the chicken and turkey enteric microbiomes: metagenomics as a tool for virus discovery and community analysis in the poultry gut

    USDA-ARS's Scientific Manuscript database

    Gut health and the management of the gut microflora in poultry are complicated and overarching concepts that are influenced through management approaches (including the administration of antibiotic growth promoters), feed nutrient composition and utilization, early gut damage by pathogens such as en...

  3. Genetic variability of psychrotolerant Acidithiobacillus ferrivorans revealed by (meta)genomic analysis.

    PubMed

    González, Carolina; Yanquepe, María; Cardenas, Juan Pablo; Valdes, Jorge; Quatrini, Raquel; Holmes, David S; Dopson, Mark

    2014-11-01

    Acidophilic microorganisms inhabit low pH environments such as acid mine drainage that is generated when sulfide minerals are exposed to air. The genome sequence of the psychrotolerant Acidithiobacillus ferrivorans SS3 was compared to a metagenome from a low temperature acidic stream dominated by an A. ferrivorans-like strain. Stretches of genomic DNA characterized by few matches to the metagenome, termed 'metagenomic islands', encoded genes associated with metal efflux and pH homeostasis. The metagenomic islands were enriched in mobile elements such as phage proteins, transposases, integrases and in one case, predicted to be flanked by truncated tRNAs. Cus gene clusters predicted to be involved in copper efflux and further Cus-like RND systems were predicted to be located in metagenomic islands and therefore, constitute part of the flexible gene complement of the species. Phylogenetic analysis of Cus clusters showed both lineage specificity within the Acidithiobacillus genus as well as niche specificity associated with an acidic environment. The metagenomic islands also contained a predicted copper efflux P-type ATPase system and a polyphosphate kinase potentially involved in polyphosphate mediated copper resistance. This study identifies genetic variability of low temperature acidophiles that likely reflects metal resistance selective pressures in the copper rich environment. Copyright © 2014 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.

  4. Biased and unbiased perceptual decision-making on vocal emotions.

    PubMed

    Dricu, Mihai; Ceravolo, Leonardo; Grandjean, Didier; Frühholz, Sascha

    2017-11-24

    Perceptual decision-making on emotions involves gathering sensory information about the affective state of another person and forming a decision on the likelihood of a particular state. These perceptual decisions can be of varying complexity as determined by different contexts. We used functional magnetic resonance imaging and a region of interest approach to investigate the brain activation and functional connectivity behind two forms of perceptual decision-making. More complex unbiased decisions on affective voices recruited an extended bilateral network consisting of the posterior inferior frontal cortex, the orbitofrontal cortex, the amygdala, and voice-sensitive areas in the auditory cortex. Less complex biased decisions on affective voices distinctly recruited the right mid inferior frontal cortex, pointing to a functional distinction in this region following decisional requirements. Furthermore, task-induced neural connectivity revealed stronger connections between these frontal, auditory, and limbic regions during unbiased relative to biased decision-making on affective voices. Together, the data show that different types of perceptual decision-making on auditory emotions have distinct patterns of activation and functional coupling that follow the decisional strategies and cognitive mechanisms involved during these perceptual decisions.

  5. Metagenomic profiling of microbial composition and antibiotic resistance determinants in Puget Sound.

    PubMed

    Port, Jesse A; Wallace, James C; Griffith, William C; Faustman, Elaine M

    2012-01-01

    Human-health relevant impacts on marine ecosystems are increasing on both spatial and temporal scales. Traditional indicators for environmental health monitoring and microbial risk assessment have relied primarily on single species analyses and have provided only limited spatial and temporal information. More high-throughput, broad-scale approaches to evaluate these impacts are therefore needed to provide a platform for informing public health. This study uses shotgun metagenomics to survey the taxonomic composition and antibiotic resistance determinant content of surface water bacterial communities in the Puget Sound estuary. Metagenomic DNA was collected at six sites in Puget Sound in addition to one wastewater treatment plant (WWTP) that discharges into the Sound and pyrosequenced. A total of ~550 Mbp (1.4 million reads) were obtained, 22 Mbp of which could be assembled into contigs. While the taxonomic and resistance determinant profiles across the open Sound samples were similar, unique signatures were identified when comparing these profiles across the open Sound, a nearshore marina and WWTP effluent. The open Sound was dominated by α-Proteobacteria (in particular Rhodobacterales sp.), γ-Proteobacteria and Bacteroidetes while the marina and effluent had increased abundances of Actinobacteria, β-Proteobacteria and Firmicutes. There was a significant increase in the antibiotic resistance gene signal from the open Sound to marina to WWTP effluent, suggestive of a potential link to human impacts. Mobile genetic elements associated with environmental and pathogenic bacteria were also differentially abundant across the samples. This study is the first comparative metagenomic survey of Puget Sound and provides baseline data for further assessments of community composition and antibiotic resistance determinants in the environment using next generation sequencing technologies. In addition, these genomic signals of potential human impact can be used to guide initial public health monitoring as well as more targeted and functionally-based investigations.

  6. Functional Assays and Metagenomic Analyses Reveals Differences between the Microbial Communities Inhabiting the Soil Horizons of a Norway Spruce Plantation

    PubMed Central

    Uroz, Stéphane; Ioannidis, Panos; Lengelle, Juliette; Cébron, Aurélie; Morin, Emmanuelle; Buée, Marc; Martin, Francis

    2013-01-01

    In temperate ecosystems, acidic forest soils are among the most nutrient-poor terrestrial environments. In this context, the long-term differentiation of the forest soils into horizons may impact the assembly and the functions of the soil microbial communities. To gain a more comprehensive understanding of the ecology and functional potentials of these microbial communities, a suite of analyses including comparative metagenomics was applied to independent soil samples from a spruce plantation (Breuil-Chenue, France). The objective was to assess whether the decreasing nutrient bioavailability and the pH variations that naturally occur between the organic and mineral horizons affect the soil microbial functional biodiversity. The 14 Gbp of pyrosequencing and Illumina sequences generated in this study revealed complex microbial communities dominated by bacteria. Detailed analyses showed that the organic soil horizon was significantly enriched in sequences related to Bacteria, Chordata, Arthropoda and Ascomycota. In contrast, the mineral horizon was significantly enriched in sequences related to Archaea. Our analyses also highlighted that the microbial communities inhabiting the two soil horizons differed significantly in their functional potentials according to functional assays and MG-RAST analyses, suggesting a functional specialisation of these microbial communities. Consistent with this specialisation, our shotgun metagenomic approach revealed a significant increase in the relative abundance of sequences related to glycoside hydrolases in the organic horizon compared to the mineral horizon, which was significantly enriched in glycoside transferases. This functional stratification according to the soil horizon was also confirmed by a significant correlation between the functional assays performed in this study and the functional metagenomic analyses. Together, our results suggest that the soil stratification and particularly the soil resource availability impact the functional diversity and to a lesser extent the taxonomic diversity of the bacterial communities. PMID:23418476

  7. Metagenomic Characterization of the Human Intestinal Microbiota in Fecal Samples from STEC-Infected Patients

    PubMed Central

    Gigliucci, Federica; von Meijenfeldt, F. A. Bastiaan; Knijn, Arnold; Michelacci, Valeria; Scavia, Gaia; Minelli, Fabio; Dutilh, Bas E.; Ahmad, Hamideh M.; Raangs, Gerwin C.; Friedrich, Alex W.; Rossen, John W. A.; Morabito, Stefano

    2018-01-01

    The human intestinal microbiota is a homeostatic ecosystem with a remarkable impact on human health, and the disruption of this equilibrium leads to an increased susceptibility to infection by numerous pathogens. In this study, we used shotgun metagenomic sequencing and two different bioinformatic approaches, based on mapping of the reads onto databases and on the reconstruction of putative draft genomes, to investigate possible changes in the composition of the intestinal microbiota in samples from patients with Shiga Toxin-producing E. coli (STEC) infection compared to healthy and healed controls, collected during an outbreak caused by a STEC O26:H11 infection. Both bioinformatic procedures produced similar results, with good resolution of the taxonomic profiles of the specimens. The stool samples collected from the STEC-infected patients showed a lower abundance of members of the Bifidobacteriales and Clostridiales orders in comparison to controls, where those microorganisms predominated. These differences seemed to correlate with the STEC infection, although a decline in the relative abundance of the Bifidobacterium genus, part of the Bifidobacteriales order, was also observed in samples from Crohn's disease patients, who displayed a STEC-unrelated dysbiosis. Metagenomics also allowed the identification, in the STEC-positive samples, of all the virulence traits present in the genomes of the STEC O26 that caused the outbreak, as assessed through isolation of the epidemic strain and whole genome sequencing. The results shown represent initial evidence of the changes occurring in the intestinal microbiota of children in the course of STEC infection and indicate that metagenomics may be a promising tool for the culture-independent clinical diagnosis of the infection. PMID:29468143

  8. Diel Metagenomics and Metatranscriptomics of Elkhorn Slough Hypersaline Microbial Mat

    NASA Astrophysics Data System (ADS)

    Lee, J.; Detweiler, A. M.; Everroad, R. C.; Bebout, L. E.; Weber, P. K.; Pett-Ridge, J.; Bebout, B.

    2014-12-01

    To understand the variation in gene expression associated with the daytime oxygenic phototrophic and nighttime fermentation regimes seen in hypersaline microbial mats, a contiguous mat piece was subjected to sampling at regular intervals over a 24-hour diel period. Additionally, to understand the impact of sulfate reduction on biohydrogen consumption, molybdate was added to a parallel experiment in the same run. Four metagenome and 12 metatranscriptome Illumina HiSeq lanes were completed over day/night and control/molybdate experiments. Preliminary comparative examination of noon and midnight metatranscriptomic samples mapped using bowtie2 to reference genomes has revealed several notable results about the dominant mat-building cyanobacterium Microcoleus chthonoplastes PCC 7420. M. chthonoplastes PCC 7420 shows expression in several pathways for nitrogen scavenging, including nitrogen fixation. Reads mapped to M. chthonoplastes PCC 7420 show expression of two starch storage and utilization pathways, one a starch-trehalose-maltose-glucose pathway, the other a UDP-glucose-cellulose-β-1,4 glucan-glucose pathway. The overall trend of gene expression was primarily light-driven up-regulation followed by down-regulation in the dark, while much of the remaining expression profile appears to be constitutive. Co-assembly of quality-controlled reads from the 4 metagenomes was performed using Ray Meta with progressively smaller k-mer sizes, with bins identified and filtered using principal component analysis of coverages from all libraries and a %GC filter, followed by reassembly of the remaining co-assembly reads and binned reads. Despite relatively similar abundance profiles in each metagenome, this binning approach was able to distinctly resolve bins not only from dominant taxa but also from sulfate-reducing bacteria, which are desired for understanding molybdate inhibition. Bins generated from this iterative assembly process will be used for downstream mapping of transcriptomic reads as well as isolation efforts for Cyanobacteria-associated bacteria.
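    The %GC filter mentioned in this abstract can be sketched as follows. This is only a hedged illustration of filtering contigs by GC content; the function names, the contig dictionary and the GC window are hypothetical, and the principal component analysis of coverages used alongside it in the study is not shown.

```python
def gc_fraction(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else 0.0


def filter_contigs_by_gc(contigs, gc_min, gc_max):
    """Keep contigs whose GC fraction lies within [gc_min, gc_max].

    contigs : dict mapping contig name -> nucleotide sequence
    """
    return {name: seq for name, seq in contigs.items()
            if gc_min <= gc_fraction(seq) <= gc_max}


if __name__ == "__main__":
    # Hypothetical toy contigs; real contigs would come from the co-assembly.
    toy = {"contig_1": "ATGCGCGC", "contig_2": "ATATATAT", "contig_3": "GGGCCCAT"}
    print(filter_contigs_by_gc(toy, gc_min=0.6, gc_max=0.8))
```

    In practice a GC window like this would typically be combined with per-library coverage so that contigs with similar composition but different abundance patterns are not merged into one bin.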

  9. Optimization and validation of sample preparation for metagenomic sequencing of viruses in clinical samples.

    PubMed

    Lewandowska, Dagmara W; Zagordi, Osvaldo; Geissberger, Fabienne-Desirée; Kufner, Verena; Schmutz, Stefan; Böni, Jürg; Metzner, Karin J; Trkola, Alexandra; Huber, Michael

    2017-08-08

    Sequence-specific PCR is the most common approach for virus identification in diagnostic laboratories. However, as specific PCR only detects pre-defined targets, novel virus strains or viruses not included in routine test panels will be missed. Recent advances in high-throughput sequencing allow for virus-sequence-independent identification of entire virus populations in clinical samples, yet standardized protocols are needed to allow broad application in clinical diagnostics. Here, we describe a comprehensive sample preparation protocol for high-throughput metagenomic virus sequencing using random amplification of total nucleic acids from clinical samples. In order to optimize metagenomic sequencing for application in virus diagnostics, we tested different enrichment and amplification procedures on plasma samples spiked with RNA and DNA viruses. A protocol including filtration, nuclease digestion, and random amplification of RNA and DNA in separate reactions provided the best results, allowing reliable recovery of viral genomes and a good correlation of the relative number of sequencing reads with the virus input. We further validated our method by sequencing a multiplexed viral pathogen reagent containing a range of human viruses from different virus families. Our method proved successful in detecting the majority of the included viruses with high read numbers and compared well to other protocols in the field validated against the same reference reagent. Our sequencing protocol works not only with plasma but also with other clinical samples such as urine and throat swabs. The workflow for virus metagenomic sequencing that we established proved successful in detecting a variety of viruses in different clinical samples. Our protocol supplements existing virus-specific detection strategies, providing opportunities to identify atypical and novel viruses commonly not accounted for in routine diagnostic panels.

  10. Detailed analysis of metagenome datasets obtained from biogas-producing microbial communities residing in biogas reactors does not indicate the presence of putative pathogenic microorganisms

    PubMed Central

    2013-01-01

    Background: In recent years biogas plants in Germany have been suspected of being involved in the amplification and dissemination of pathogenic bacteria causing severe infections in humans and animals. In particular, biogas plants are thought to contribute to the spread of Escherichia coli infections in humans or chronic botulism in cattle caused by Clostridium botulinum. Metagenome datasets of microbial communities from an agricultural biogas plant as well as from anaerobic lab-scale digesters operating at different temperatures and conditions were analyzed for the presence of putative pathogenic bacteria and virulence determinants by various bioinformatic approaches. Results: All datasets featured a low abundance of reads that were taxonomically assigned to the genus Escherichia or further selected genera comprising pathogenic species. Higher numbers of reads were taxonomically assigned to the genus Clostridium. However, only very few sequences were predicted to originate from pathogenic clostridial species. Moreover, mapping of metagenome reads to complete genome sequences of selected pathogenic bacteria revealed that not the pathogenic species themselves, but only species that are more or less related to pathogenic ones, are present in the fermentation samples analyzed. Likewise, known virulence determinants could hardly be detected. Only a marginal number of reads showed similarity to sequences described in the Microbial Virulence Database MvirDB such as those encoding protein toxins, virulence proteins or antibiotic resistance determinants. Conclusions: Findings of this first study of metagenomic sequence reads of biogas-producing microbial communities suggest that the risk of dissemination of pathogenic bacteria by application of digestates from biogas fermentations as fertilizers is low, because the results obtained do not indicate the presence of putative pathogenic microorganisms in the samples analyzed. PMID:23557021

  11. Metagenomic Analysis of Antibiotic Resistance Genes in Dairy Cow Feces following Therapeutic Administration of Third Generation Cephalosporin

    PubMed Central

    Ray, Partha; Zhang, Tong; Pruden, Amy; Strickland, Michael; Knowlton, Katharine

    2015-01-01

    Although dairy manure is widely applied to land, it is relatively understudied compared to other livestock as a potential source of antibiotic resistance genes (ARGs) to the environment and ultimately to human pathogens. Ceftiofur, the most widely used antibiotic in U.S. dairy cows, is a 3rd generation cephalosporin, a class of antibiotics critically important to human health. The objective of this study was to evaluate the effect of typical ceftiofur antibiotic treatment on the prevalence of ARGs in the fecal microbiome of dairy cows using a metagenomics approach. β-lactam ARGs were found to be elevated in feces from Holstein cows administered ceftiofur (n = 3) relative to control cows (n = 3). However, total numbers of ARGs across all classes were not measurably affected by ceftiofur treatment, likely because of the dominance of unaffected tetracycline ARGs in the metagenomics libraries. Functional analysis via MG-RAST further revealed that ceftiofur treatment resulted in increases in gene sequences associated with “phages, prophages, transposable elements, and plasmids”, suggesting that this treatment also enriched the ability to horizontally transfer ARGs. Additional functional shifts were noted with ceftiofur treatment (e.g., increase in genes associated with stress, chemotaxis, and resistance to toxic compounds; decrease in genes associated with metabolism of aromatic compounds and cell division and cell cycle), along with measurable taxonomic shifts (increase in Bacteroidia and decrease in Actinobacteria). This study demonstrates that ceftiofur has a broad, measurable and immediate effect on the cow fecal metagenome. Given the importance of 3rd generation cephalosporins to human medicine, their continued use in dairy cattle should be carefully considered and waste treatment strategies to slow ARG dissemination from dairy cattle manure should be explored. PMID:26258869

  12. Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights.

    PubMed

    Pasolli, Edoardo; Truong, Duy Tin; Malik, Faizan; Waldron, Levi; Segata, Nicola

    2016-07-01

    Shotgun metagenomic analysis of the human associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis. 
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
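
    The distinction the abstract draws between within-study cross-validation and cross-study transfer can be illustrated with a leave-one-study-out loop. The following is a minimal sketch only: the nearest-centroid classifier is a stand-in chosen to keep the example dependency-free, the function names and the toy two-feature profiles are hypothetical, and none of it is taken from the published software framework.

```python
from collections import defaultdict


def fit_centroids(features, labels):
    """Return the mean feature vector per class label (nearest-centroid model)."""
    sums = {}
    counts = defaultdict(int)
    for vector, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vector):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((c - v) ** 2 for c, v in zip(centroid, vector))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def leave_one_study_out(samples):
    """samples: list of (study_id, feature_vector, label) tuples.

    Each study is held out in turn; the model is trained on all other
    studies and scored on the held-out one (cross-study accuracy).
    """
    accuracy = {}
    for held_out in sorted({study for study, _, _ in samples}):
        train = [(x, y) for study, x, y in samples if study != held_out]
        test = [(x, y) for study, x, y in samples if study == held_out]
        centroids = fit_centroids([x for x, _ in train], [y for _, y in train])
        correct = sum(predict(centroids, x) == y for x, y in test)
        accuracy[held_out] = correct / len(test)
    return accuracy


# Hypothetical toy data: two relative-abundance features per sample.
toy_samples = [
    ("study_A", [0.9, 0.1], "healthy"), ("study_A", [0.2, 0.8], "disease"),
    ("study_B", [0.8, 0.2], "healthy"), ("study_B", [0.1, 0.9], "disease"),
    ("study_C", [0.7, 0.3], "healthy"), ("study_C", [0.3, 0.7], "disease"),
]
cross_study_accuracy = leave_one_study_out(toy_samples)
```

    Holding out a whole study, rather than random samples, is what exposes cohort-specific effects; a model that only looks good under within-study splits would score lower here.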

  13. Managing microbial communities for sequentially reconstruct genomes from complex metagenomes

    NASA Astrophysics Data System (ADS)

    Delmont, Tom O.; Vogel, Timothy M.; Simonet, Pascal

    2013-04-01

    Global understanding of environmental microbial communities is currently limited by the bottleneck of genome reconstruction. Soil is a typical example where individual cells are currently mostly uncultured and metagenomic datasets unassembled. In this study, the microbial community composition of a natural grassland soil was managed under several controlled selective pressures to test a "multi-evenness" stratagem for sequentially reconstructing genomes from a complex metagenome. While lowly represented in the natural community, several newly dominant genomes (an enrichment attaining 10^5 in some cases) were successfully reconstructed under various "harsh" tested conditions. These genomes belong to several genera including (but not restricted to) Leifsonia, Rhodanobacter, Bacillus, Ktedonobacter, Xanthomonas, Streptomyces and Burkholderia. So far, from 10 to 78% of generated metagenomic datasets were reconstructed, thus providing access to more than 88 000 genes of known or unknown functions and to their genetic environment. Adaptive genes directly related to selective pressures were found, mostly in large plasmids. Functions of potential industrial interest (e.g., novel polyketide synthase modules in Streptomyces) were also discovered. Furthermore, an important phage infection snapshot (>1500X of coverage for the most represented phage) was observed among the Streptomyces population (three distinct genomes reconstructed) of a particular enrichment (mercury, 0.02g/kg) during the fourth month of incubation. This "divide and conquer" strategy could be applied to other environments and combined with auxiliary sequencing approaches like single cell to detect, connect and mine taxa and functions of interest while creating an extensive set of reference genomes from across the planet. The next limit may turn out to be our imagination in defining novel selective pressures that sequentially make dominant the 10^30 cells of the biosphere.

  14. Cost-benefit analysis of introducing next-generation sequencing (metagenomic) pathogen testing in the setting of pyrexia of unknown origin.

    PubMed

    Chai, Jia Hui; Lee, Chun Kiat; Lee, Hong Kai; Wong, Nicholas; Teo, Kahwee; Tan, Chuen Seng; Thokala, Praveen; Tang, Julian Wei-Tze; Tambyah, Paul Anantharajah; Oh, Vernon Min Sen; Loh, Tze Ping; Yoong, Joanne

    2018-01-01

    Pyrexia of unknown origin (PUO) is defined as a temperature of >38.3°C that lasts for >3 weeks, where no cause can be found despite appropriate investigation. Existing protocols for the work-up of PUO can be extensive and costly, motivating the application of recent advances in molecular diagnostics to pathogen testing. There have been many reports describing various analytical methods and performance of metagenomic pathogen testing in clinical samples but the economics of it has been less well studied. This study pragmatically evaluates the feasibility of introducing metagenomic testing in this setting by assessing the relative cost of clinically-relevant strategies employing this investigative tool under various cost and performance scenarios using Singapore as a demonstration case, and assessing the price and performance benchmarks, which would need to be achieved for metagenomic testing to be potentially considered financially viable relative to the current diagnostic standard. This study has some important limitations: we examined only impact of introducing the metagenomic test to the overall diagnostic cost and excluded costs associated with hospitalization and makes assumptions about the performance of the routine diagnostic tests, limiting the cost of metagenomic test, and the lack of further work-up after positive pathogen detection by the metagenomic test. However, these assumptions were necessary to keep the model within reasonable limits. In spite of these, the simplified presentation lends itself to the illustration of the key insights of our paper. In general, we find the use of metagenomic testing as second-line investigation is effectively dominated, and that use of metagenomic testing at first-line would typically require higher rates of detection or lower cost than currently available in order to be justifiable purely as a cost-saving measure. 
We conclude that current conditions do not warrant a widespread rush to deploy metagenomic testing to resolve any and all uncertainty, but rather as a front-line technology that should be used in specific contexts, as a supplement to rather than a replacement for careful clinical judgement.

  15. Discovery of new cellulases from the metagenome by a metagenomics-guided strategy.

    PubMed

    Yang, Chao; Xia, Yu; Qu, Hong; Li, An-Dong; Liu, Ruihua; Wang, Yubo; Zhang, Tong

    2016-01-01

    Energy shortage has become a global problem. Production of biofuels from renewable biomass resources is an inevitable trend of sustainable development. Cellulose is the most abundant and renewable resource in nature. Lack of new cellulases with unique properties has become the bottleneck of the efficient utilization of cellulose. Environmental metagenomes are regarded as huge reservoirs for a variety of cellulases. However, new cellulases cannot be obtained easily by functional screening of metagenomic libraries. In this work, a metagenomics-guided strategy for obtaining new cellulases from the metagenome was proposed. Metagenomic sequences of DNA extracted from the anaerobic beer lees converting consortium enriched at thermophilic conditions were assembled, and 23 glycoside hydrolase (GH) sequences affiliated with the GH family 5 were identified. Among the 23 GH sequences, three target sequences (designated as cel7482, cel3623 and cel36) showing low identity with those known GHs were chosen as the putative cellulase genes to be functionally expressed in Escherichia coli after PCR cloning. The three cellulases were classified into endo-β-1,4-glucanases by product pattern analysis. The recombinant cellulases were more active at pH 5.5 and within a temperature range of 60-70 °C. Computer-assisted 3D structure modeling indicated that the active residues in the active site of the recombinant cellulases were more similar to each other compared with non-active site residues. The recombinant cel7482 was extremely tolerant to 2 M NaCl, suggesting that cel7482 may be a halotolerant cellulase. Moreover, the recombinant cel7482 was shown to have an ability to resist three ionic liquids (ILs), which are widely used for cellulose pretreatment. 
Furthermore, active cel7482 was secreted by the twin-arginine translocation (Tat) pathway of Bacillus subtilis 168 into the culture medium, which facilitates the subsequent purification and reduces the formation of inclusion body in the context of overexpression. This study demonstrated a simple and efficient method for direct cloning of new cellulase genes from environmental metagenomes. In the future, the metagenomics-guided strategy may be applied to the high-throughput screening of new cellulases from environmental metagenomes.

  16. Identification of syntrophic acetate-oxidizing bacteria in anaerobic digesters by combined protein-based stable isotope probing and metagenomics

    PubMed Central

    Mosbæk, Freya; Kjeldal, Henrik; Mulat, Daniel G; Albertsen, Mads; Ward, Alastair J; Feilberg, Anders; Nielsen, Jeppe L

    2016-01-01

    Inhibition of anaerobic digestion through accumulation of volatile fatty acids occasionally occurs as the result of unbalanced growth between acidogenic bacteria and methanogens. A fast recovery is a prerequisite for establishing an economical production of biogas. However, very little is known about the microorganisms facilitating this recovery. In this study, we investigated the organisms involved by a novel approach of mapping protein-stable isotope probing (protein-SIP) onto a binned metagenome. Under simulation of acetate accumulation conditions, formations of 13C-labeled CO2 and CH4 were detected immediately following incubation with [U-13C]acetate, indicating high turnover rate of acetate. The identified 13C-labeled peptides were mapped onto a binned metagenome for improved identification of the organisms involved. The results revealed that Methanosarcina and Methanoculleus were actively involved in acetate turnover, as were five subspecies of Clostridia. The acetate-consuming organisms affiliating with Clostridia all contained the FTFHS gene for formyltetrahydrofolate synthetase, a key enzyme for reductive acetogenesis, indicating that these organisms are possible syntrophic acetate-oxidizing (SAO) bacteria that can facilitate acetate consumption via SAO, coupled with hydrogenotrophic methanogenesis (SAO-HM). This study represents the first study applying protein-SIP for analysis of complex biogas samples, a promising method for identifying key microorganisms utilizing specific pathways. PMID:27128991

  17. Metagenomic analysis reveals a functional signature for biomass degradation by cecal microbiota in the leaf-eating flying squirrel (Petaurista alborufus lena).

    PubMed

    Lu, Hsiao-Pei; Wang, Yu-bin; Huang, Shiao-Wei; Lin, Chung-Yen; Wu, Martin; Hsieh, Chih-hao; Yu, Hon-Tsen

    2012-09-10

    Animals co-evolve with their gut microbiota; the latter can perform complex metabolic reactions that cannot be done independently by the host. Although the importance of gut microbiota has been well demonstrated, there is a paucity of research regarding its role in foliage-foraging mammals with a specialized digestive system. In this study, a 16S rRNA gene survey and metagenomic sequencing were used to characterize genetic diversity and functional capability of cecal microbiota of the folivorous flying squirrel (Petaurista alborufus lena). Phylogenetic compositions of the cecal microbiota derived from 3 flying squirrels were dominated by Firmicutes. Based on end-sequences of fosmid clones from 1 flying squirrel, we inferred that microbial metabolism greatly contributed to intestinal functions, including degradation of carbohydrates, metabolism of proteins, and synthesis of vitamins. Moreover, 33 polysaccharide-degrading enzymes and 2 large genomic fragments containing a series of carbohydrate-associated genes were identified. Cecal microbiota of the leaf-eating flying squirrel have great metabolic potential for converting diverse plant materials into absorbable nutrients. The present study should serve as the basis for future investigations, using metagenomic approaches to elucidate the intricate mechanisms and interactions between host and gut microbiota of the flying squirrel digestive system, as well as other mammals with similar adaptations.

  18. Taxonomic and functional profiles of soil samples from Atlantic forest and Caatinga biomes in northeastern Brazil.

    PubMed

    Pacchioni, Ralfo G; Carvalho, Fabíola M; Thompson, Claudia E; Faustino, André L F; Nicolini, Fernanda; Pereira, Tatiana S; Silva, Rita C B; Cantão, Mauricio E; Gerber, Alexandra; Vasconcelos, Ana T R; Agnez-Lima, Lucymara F

    2014-06-01

    Although microorganisms play crucial roles in ecosystems, metagenomic analyses of soil samples are quite scarce, especially in the Southern Hemisphere. In this work, the microbial diversity of soil samples from an Atlantic Forest and Caatinga was analyzed using a metagenomic approach. Proteobacteria and Actinobacteria were the dominant phyla in both samples. Among these, a significant proportion of stress-resistant bacteria associated with organic matter degradation was found. Sequences related to metabolism of amino acids, nitrogen, and DNA and stress resistance were more frequent in Caatinga soil, while the forest sample showed the highest occurrence of hits annotated in phosphorus metabolism, defense mechanisms, and aromatic compound degradation subsystems. The principal component analysis (PCA) showed that our samples are close to the desert metagenomes in relation to taxonomy, but are more similar to rhizosphere microbiota in relation to the functional profiles. The data indicate that soil characteristics affect the taxonomic and functional distribution; these characteristics include low nutrient content, high drainage (both are sandy soils), vegetation, and exposure to stress. In both samples, a rapid turnover of organic matter with low greenhouse gas emission was suggested by the functional profiles obtained, reinforcing the importance of preserving natural areas. © 2014 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  19. Identification of syntrophic acetate-oxidizing bacteria in anaerobic digesters by combined protein-based stable isotope probing and metagenomics.

    PubMed

    Mosbæk, Freya; Kjeldal, Henrik; Mulat, Daniel G; Albertsen, Mads; Ward, Alastair J; Feilberg, Anders; Nielsen, Jeppe L

    2016-10-01

    Inhibition of anaerobic digestion through accumulation of volatile fatty acids occasionally occurs as the result of unbalanced growth between acidogenic bacteria and methanogens. A fast recovery is a prerequisite for establishing an economical production of biogas. However, very little is known about the microorganisms facilitating this recovery. In this study, we investigated the organisms involved by a novel approach of mapping protein-stable isotope probing (protein-SIP) onto a binned metagenome. Under simulation of acetate accumulation conditions, formations of 13C-labeled CO2 and CH4 were detected immediately following incubation with [U-13C]acetate, indicating high turnover rate of acetate. The identified 13C-labeled peptides were mapped onto a binned metagenome for improved identification of the organisms involved. The results revealed that Methanosarcina and Methanoculleus were actively involved in acetate turnover, as were five subspecies of Clostridia. The acetate-consuming organisms affiliating with Clostridia all contained the FTFHS gene for formyltetrahydrofolate synthetase, a key enzyme for reductive acetogenesis, indicating that these organisms are possible syntrophic acetate-oxidizing (SAO) bacteria that can facilitate acetate consumption via SAO, coupled with hydrogenotrophic methanogenesis (SAO-HM). This study represents the first study applying protein-SIP for analysis of complex biogas samples, a promising method for identifying key microorganisms utilizing specific pathways.

  20. Detailed investigation of the microbial community in foaming activated sludge reveals novel foam formers

    PubMed Central

    Guo, Feng; Wang, Zhi-Ping; Yu, Ke; Zhang, T.

    2015-01-01

    Foaming of activated sludge (AS) causes adverse impacts on wastewater treatment operation and hygiene. In this study, we investigated the microbial communities of foam, foaming AS and non-foaming AS in a sewage treatment plant via deep-sequencing of the taxonomic marker genes 16S rRNA and mycobacterial rpoB and a metagenomic approach. In addition to Actinobacteria, many genera (e.g., Clostridium XI, Arcobacter, Flavobacterium) were more abundant in the foam than in the AS. On the other hand, deep-sequencing of rpoB did not detect any obligate pathogenic mycobacteria in the foam. We found that unknown factors other than the abundance of Gordonia sp. could determine the foaming process, because abundance of the same species was stable before and after a foaming event over six months. More interestingly, although the dominant Gordonia foam former was closest to G. amarae, it was identified as an undescribed Gordonia species by referring to the 16S rRNA gene, gyrB and, most convincingly, the reconstructed draft genome from metagenomic reads. Our results, based on metagenomics and deep sequencing, reveal that foams are derived from diverse taxa, which expands previous understanding and provides new insight into the underlying complications of the foaming phenomenon in AS. PMID:25560234

  1. Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie

    2014-06-18

    Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.
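
    The purging of SNP diversity described above can be made concrete by tracking a per-site diversity statistic across time points. The sketch below is an illustration under stated assumptions, not the authors' analysis: it uses expected heterozygosity (1 minus the sum of squared allele frequencies) as the statistic, and the function names, time-point labels and allele counts are all hypothetical.

```python
def site_diversity(allele_counts):
    """Expected heterozygosity at one site: 1 - sum of squared allele frequencies.

    allele_counts maps each observed base to its read count at that position.
    A site fixed for a single allele returns 0.0.
    """
    total = sum(allele_counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in allele_counts.values())


def mean_diversity(sites):
    """Average the per-site diversity over all positions of a population."""
    return sum(site_diversity(counts) for counts in sites) / len(sites)


# Hypothetical allele counts at two genome positions, sampled at three times.
time_series = {
    "t1": [{"A": 5, "G": 5}, {"C": 8, "T": 2}],
    "t2": [{"A": 9, "G": 1}, {"C": 10}],
    "t3": [{"A": 10}, {"C": 10}],
}
diversity_by_time = {label: mean_diversity(sites) for label, sites in time_series.items()}
```

    In this toy series the mean diversity falls to zero by the last time point, which is the kind of genome-wide loss of polymorphism that a selective sweep would produce; a gene-specific sweep would instead drive only a subset of sites to zero.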

  2. Genome-wide Selective Sweeps in Natural Bacterial Populations Revealed by Time-series Metagenomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chan, Leong-Keat; Bendall, Matthew L.; Malfatti, Stephanie

    2014-05-12

    Multiple evolutionary models have been proposed to explain the formation of genetically and ecologically distinct bacterial groups. Time-series metagenomics enables direct observation of evolutionary processes in natural populations, and if applied over a sufficiently long time frame, this approach could capture events such as gene-specific or genome-wide selective sweeps. Direct observations of either process could help resolve how distinct groups form in natural microbial assemblages. Here, from a three-year metagenomic study of a freshwater lake, we explore changes in single nucleotide polymorphism (SNP) frequencies and patterns of gene gain and loss in populations of Chlorobiaceae and Methylophilaceae. SNP analyses revealed substantial genetic heterogeneity within these populations, although the degree of heterogeneity varied considerably among closely related, co-occurring Methylophilaceae populations. SNP allele frequencies, as well as the relative abundance of certain genes, changed dramatically over time in each population. Interestingly, SNP diversity was purged at nearly every genome position in one of the Chlorobiaceae populations over the course of three years, while at the same time multiple genes either swept through or were swept from this population. These patterns were consistent with a genome-wide selective sweep, a process predicted by the ‘ecotype model’ of diversification, but not previously observed in natural populations.

  3. Unveiling the metabolic potential of two soil-derived microbial consortia selected on wheat straw

    PubMed Central

    Jiménez, Diego Javier; Chaves-Moreno, Diego; van Elsas, Jan Dirk

    2015-01-01

    Based on the premise that plant biomass can be efficiently degraded by mixed microbial cultures and/or enzymes, we here applied a targeted metagenomics-based approach to explore the metabolic potential of two forest soil-derived lignocellulolytic microbial consortia, denoted RWS and TWS (bred on wheat straw). Using the metagenomes of three selected batches of two experimental systems, about 1.2 Gb of sequence was generated. Comparative analyses revealed an overrepresentation of predicted carbohydrate transporters (ABC, TonB and phosphotransferases), two-component sensing systems and β-glucosidases/galactosidases in the two consortia as compared to the forest soil inoculum. Additionally, “profiling” of carbohydrate-active enzymes showed significant enrichments of several genes encoding glycosyl hydrolases of families GH2, GH43, GH92 and GH95. Sequence analyses revealed these to be most strongly affiliated to genes present on the genomes of Sphingobacterium, Bacteroides, Flavobacterium and Pedobacter spp. Assembly of the RWS and TWS metagenomes generated 16,536 and 15,902 contigs of ≥10 Kb, respectively. Thirteen contigs, containing 39 glycosyl hydrolase genes, constitute novel (hemi)cellulose utilization loci with affiliation to sequences primarily found in the Bacteroidetes. Overall, this study provides deep insight in the plant polysaccharide degrading capabilities of microbial consortia bred from forest soil, highlighting their biotechnological potential. PMID:26343383

  4. Intracellular screen to identify metagenomic clones that induce or inhibit a quorum-sensing biosensor.

    PubMed

    Williamson, Lynn L; Borlee, Bradley R; Schloss, Patrick D; Guan, Changhui; Allen, Heather K; Handelsman, Jo

    2005-10-01

    The goal of this study was to design and evaluate a rapid screen to identify metagenomic clones that produce biologically active small molecules. We built metagenomic libraries with DNA from soil on the floodplain of the Tanana River in Alaska. We extracted DNA directly from the soil and cloned it into fosmid and bacterial artificial chromosome vectors, constructing eight metagenomic libraries that contain 53,000 clones with inserts ranging from 1 to 190 kb. To identify clones of interest, we designed a high throughput "intracellular" screen, designated METREX, in which metagenomic DNA is in a host cell containing a biosensor for compounds that induce bacterial quorum sensing. If the metagenomic clone produces a quorum-sensing inducer, the cell produces green fluorescent protein (GFP) and can be identified by fluorescence microscopy or captured by fluorescence-activated cell sorting. Our initial screen identified 11 clones that induce and two that inhibit expression of GFP. The intracellular screen detected quorum-sensing inducers among metagenomic clones that a traditional overlay screen would not. One inducing clone carries a LuxI homologue that directs the synthesis of an N-acyl homoserine lactone quorum-sensing signal molecule. The LuxI homologue has 62% amino acid sequence identity to its closest match in GenBank, AmfI from Pseudomonas fluorescens, and is on a 78-kb insert that contains 67 open reading frames. Another inducing clone carries a gene with homology to homocitrate synthase. Our results demonstrate the power of an intracellular screen to identify functionally active clones and biologically active small molecules in metagenomic libraries.

  5. Interactive metagenomic visualization in a Web browser

    PubMed Central

    2011-01-01

    Background: A critical output of metagenomic studies is the estimation of abundances of taxonomical or functional groups. The inherent uncertainty in assignments to these groups makes it important to consider both their hierarchical contexts and their prediction confidence. The current tools for visualizing metagenomic data, however, omit or distort quantitative hierarchical relationships and lack the facility for displaying secondary variables. Results: Here we present Krona, a new visualization tool that allows intuitive exploration of relative abundances and confidences within the complex hierarchies of metagenomic classifications. Krona combines a variant of radial, space-filling displays with parametric coloring and interactive polar-coordinate zooming. The HTML5 and JavaScript implementation enables fully interactive charts that can be explored with any modern Web browser, without the need for installed software or plug-ins. This Web-based architecture also allows each chart to be an independent document, making them easy to share via e-mail or post to a standard Web server. To illustrate Krona's utility, we describe its application to various metagenomic data sets and its compatibility with popular metagenomic analysis tools. Conclusions: Krona is both a powerful metagenomic visualization tool and a demonstration of the potential of HTML5 for highly accessible bioinformatic visualizations. Its rich and interactive displays facilitate more informed interpretations of metagenomic analyses, while its implementation as a browser-based application makes it extremely portable and easily adopted into existing analysis packages. Both the Krona rendering code and conversion tools are freely available under a BSD open-source license, and available from: http://krona.sourceforge.net. PMID:21961884
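
    A hierarchical abundance display of this kind needs, for every node in the classification tree, the total abundance of everything beneath it. The following sketch shows one way such a roll-up could be computed; it is not Krona's code, and the function name, the tuple-based lineage representation and the example counts are assumptions made for illustration.

```python
from collections import defaultdict


def roll_up(leaf_counts):
    """Sum leaf abundances into every ancestor node of a classification hierarchy.

    leaf_counts maps a lineage tuple, e.g. ("Bacteria", "Firmicutes"), to the
    number of sequences assigned exactly there. The result maps every lineage
    prefix to the total assigned at or below that node.
    """
    totals = defaultdict(float)
    for lineage, count in leaf_counts.items():
        for depth in range(1, len(lineage) + 1):
            totals[lineage[:depth]] += count
    return dict(totals)


# Hypothetical assignment counts for a small sample.
example_counts = {
    ("Bacteria", "Firmicutes"): 30,
    ("Bacteria", "Proteobacteria"): 50,
    ("Archaea",): 20,
}
node_totals = roll_up(example_counts)
grand_total = sum(example_counts.values())
relative_abundance = {node: total / grand_total for node, total in node_totals.items()}
```

    Each node's share of the grand total would then set the angular width of its wedge in a radial, space-filling chart, with child wedges nested inside the parent's span.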

  6. Taxonomic and functional assignment of cloned sequences from high Andean forest soil metagenome.

    PubMed

    Montaña, José Salvador; Jiménez, Diego Javier; Hernández, Mónica; Angel, Tatiana; Baena, Sandra

    2012-02-01

    Total metagenomic DNA was isolated from high Andean forest soil and subjected to taxonomical and functional composition analyses by means of clone library generation and sequencing. The obtained yield of 1.7 μg of DNA/g of soil was used to construct a metagenomic library of approximately 20,000 clones (in the plasmid p-Bluescript II SK+) with an average insert size of 4 Kb, covering 80 Mb of the total metagenomic DNA. Metagenomic sequences near the plasmid cloning site were sequenced and then trimmed and assembled, obtaining 299 reads and 31 contigs (0.3 Mb). Taxonomic assignment of total sequences was performed by BLASTX, resulting in 68.8, 44.8 and 24.5% classification into taxonomic groups using the metagenomic RAST server v2.0, WebCARMA v1.0 online system and MetaGenome Analyzer v3.8 software, respectively. Most clone sequences were classified as Bacteria belonging to phyla Actinobacteria, Proteobacteria and Acidobacteria. Among the most represented orders were Actinomycetales (34% average), Rhizobiales, Burkholderiales and Myxococcales and with a greater number of sequences in the genus Mycobacterium (7% average), Frankia, Streptomyces and Bradyrhizobium. The vast majority of sequences were associated with the metabolism of carbohydrates, proteins, lipids and catalytic functions, such as phosphatases, glycosyltransferases, dehydrogenases, methyltransferases, dehydratases and epoxide hydrolases. In this study we compared different methods of taxonomic and functional assignment of metagenomic clone sequences to evaluate microbial diversity in an unexplored soil ecosystem, searching for putative enzymes of biotechnological interest and generating important information for further functional screening of clone libraries.

  7. Environmental Metagenomics: The Data Assembly and Data Analysis Perspectives

    NASA Astrophysics Data System (ADS)

    Kumar, Vinay; Maitra, S. S.; Shukla, Rohit Nandan

    2015-03-01

    Novel gene finding is one of the emerging fields in environmental research. In the past decades, research focused mainly on the discovery of microorganisms capable of degrading a particular compound. Many methods are available in the literature for the cultivation and screening of these novel microorganisms. All of these methods are efficient for screening microbes that can be cultivated in the laboratory. Microorganisms that live in extreme conditions such as hot springs, frozen glaciers, acid mine drainage, etc. cannot be cultivated in the laboratory because of incomplete knowledge about their growth requirements, such as temperature, nutrients and their mutual dependence on each other. The microbes that can be cultivated correspond to less than 1% of the total microbes present on Earth. The remaining 99%, the uncultivated majority, remains inaccessible. Metagenomics transcends the culture requirements of microbes. In metagenomics, DNA is extracted directly from environmental samples such as soil, seawater, acid mine drainage etc., followed by construction and screening of a metagenomic library. With the ongoing research, a huge amount of metagenomic data is accumulating. Understanding these data is an essential step in extracting novel genes of industrial importance. Various bioinformatics tools have been designed to analyze and annotate the data produced from the metagenome. The bioinformatic requirements of metagenomics data analysis are different in theory and practice. This paper reviews the tools that are available for metagenomic data analysis and the capabilities of such tools: what they can do and their web availability.

  8. Identification of fungi in shotgun metagenomics datasets

    PubMed Central

    Donovan, Paul D.; Gonzalez, Gabriel; Higgins, Desmond G.

    2018-01-01

    Metagenomics uses nucleic acid sequencing to characterize species diversity in different niches such as environmental biomes or the human microbiome. Most studies have used 16S rRNA amplicon sequencing to identify bacteria. However, the decreasing cost of sequencing has resulted in a gradual shift away from amplicon analyses and towards shotgun metagenomic sequencing. Shotgun metagenomic data can be used to identify a wide range of species, but have rarely been applied to fungal identification. Here, we develop a sequence classification pipeline, FindFungi, and use it to identify fungal sequences in public metagenome datasets. We focus primarily on animal metagenomes, especially those from pig and mouse microbiomes. We identified fungi in 39 of 70 datasets comprising 71 fungal species. At least 11 pathogenic species with zoonotic potential were identified, including Candida tropicalis. We identified Pseudogymnoascus species from 13 Antarctic soil samples initially analyzed for the presence of bacteria capable of degrading diesel oil. We also show that Candida tropicalis and Candida loboi are likely the same species. In addition, we identify several examples where contaminating DNA was erroneously included in fungal genome assemblies. PMID:29444186

  9. Scalable metagenomic taxonomy classification using a reference genome database

    PubMed Central

    Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.

    2013-01-01

    Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150 giga-base Tyrolean Iceman dataset was found to take <20 h on a single node 40 core large memory machine and provide new insights on the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23828782
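    The idea of shifting cost to an off-line index can be illustrated with a toy example. The sketch below is not the LMAT implementation; it assumes a simple k-mer-to-taxon lookup table and a majority vote per read, and all names (build_index, classify, the taxon labels) are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Off-line step: map each k-mer to the set of taxa whose reference contains it."""
    index = defaultdict(set)
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index[km].add(taxon)
    return index


def classify(read, index, k):
    """On-line step: look up each k-mer of the read and return the most-voted taxon."""
    votes = Counter()
    for km in kmers(read, k):
        for taxon in index.get(km, ()):
            votes[taxon] += 1
    if not votes:
        return None  # no k-mer of the read is present in the index
    return votes.most_common(1)[0][0]


# Hypothetical toy references, for illustration only.
references = {"taxon_A": "ACGTACGTGGCCTTAA", "taxon_B": "TTTTGGGGCCCCAAAA"}
index = build_index(references, k=4)
```

    Building the index once means that each read costs only a series of dictionary lookups, which is the general reason an off-line index helps classification scale.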

  10. Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

    PubMed Central

    Sczyrba, Alexander; Hofmann, Peter; Belmann, Peter; Koslicki, David; Janssen, Stefan; Dröge, Johannes; Gregor, Ivan; Majda, Stephan; Fiedler, Jessika; Dahms, Eik; Bremges, Andreas; Fritz, Adrian; Garrido-Oter, Ruben; Jørgensen, Tue Sparholt; Shapiro, Nicole; Blood, Philip D.; Gurevich, Alexey; Bai, Yang; Turaev, Dmitrij; DeMaere, Matthew Z.; Chikhi, Rayan; Nagarajan, Niranjan; Quince, Christopher; Meyer, Fernando; Balvočiūtė, Monika; Hansen, Lars Hestbjerg; Sørensen, Søren J.; Chia, Burton K. H.; Denis, Bertrand; Froula, Jeff L.; Wang, Zhong; Egan, Robert; Kang, Dongwan Don; Cook, Jeffrey J.; Deltel, Charles; Beckstette, Michael; Lemaitre, Claire; Peterlongo, Pierre; Rizk, Guillaume; Lavenier, Dominique; Wu, Yu-Wei; Singer, Steven W.; Jain, Chirag; Strous, Marc; Klingenberg, Heiner; Meinicke, Peter; Barton, Michael; Lingner, Thomas; Lin, Hsin-Hung; Liao, Yu-Chieh; Silva, Genivaldo Gueiros Z.; Cuevas, Daniel A.; Edwards, Robert A.; Saha, Surya; Piro, Vitor C.; Renard, Bernhard Y.; Pop, Mihai; Klenk, Hans-Peter; Göker, Markus; Kyrpides, Nikos C.; Woyke, Tanja; Vorholt, Julia A.; Schulze-Lefert, Paul; Rubin, Edward M.; Darling, Aaron E.; Rattei, Thomas; McHardy, Alice C.

    2018-01-01

    In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions. PMID:28967888

  11. Identification and characterization of a novel fumarase gene by metagenome expression cloning from marine microorganisms

    PubMed Central

    2010-01-01

    Background Fumarase catalyzes the reversible hydration of fumarate to L-malate and is a key enzyme in the tricarboxylic acid (TCA) cycle and in amino acid metabolism. Fumarase is also used for the industrial production of L-malate from the substrate fumarate. Thermostable and high-activity fumarases from organisms that inhabit extreme environments may have great potential in industry, biotechnology, and basic research. The marine environment is highly complex and considered one of the main reservoirs of microbial diversity on the planet. However, most of the microorganisms are inaccessible in nature and are not easily cultivated in the laboratory. Metagenomic approaches provide a powerful tool to isolate and identify enzymes with novel biocatalytic activities for various biotechnological applications. Results A plasmid metagenomic library was constructed from uncultivated marine microorganisms within marine water samples. Through sequence-based screening of the DNA library, a gene encoding a novel fumarase (named FumF) was isolated. Amino acid sequence analysis revealed that the FumF protein shared the greatest homology with Class II fumarate hydratases from Bacteroides sp. 2_1_33B and Parabacteroides distasonis ATCC 8503 (26% identical and 43% similar). The putative fumarase gene was subcloned into pETBlue-2 vector and expressed in E. coli BL21(DE3)pLysS. The recombinant protein was purified to homogeneity. Functional characterization by high performance liquid chromatography confirmed that the recombinant FumF protein catalyzed the hydration of fumarate to form L-malate. The maximum activity for FumF protein occurred at pH 8.5 and 55°C in 5 mM Mg2+. The enzyme showed higher affinity and catalytic efficiency under optimal reaction conditions: Km= 0.48 mM, Vmax = 827 μM/min/mg, and kcat/Km = 1900 mM/s. Conclusions We isolated a novel fumarase gene, fumF, from a sequence-based screen of a plasmid metagenomic library from uncultivated marine microorganisms. 
The properties of FumF protein may be ideal for the industrial production of L-malate under higher temperature conditions. The identification of FumF underscores the potential of marine metagenome screening for novel biomolecules. PMID:21092234
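    As a worked illustration of the reported kinetic constants, the sketch below evaluates the standard Michaelis-Menten rate law using the Km and Vmax values from the abstract. It assumes that FumF follows simple Michaelis-Menten kinetics under the stated optimal conditions; the function name and the chosen substrate concentrations are hypothetical.

```python
def michaelis_menten_rate(substrate_mM, vmax=827.0, km_mM=0.48):
    """Return the reaction rate v = Vmax * [S] / (Km + [S]).

    vmax is in the units reported in the abstract (uM/min/mg);
    substrate_mM and km_mM are in mM.
    """
    if substrate_mM < 0:
        raise ValueError("substrate concentration must be non-negative")
    return vmax * substrate_mM / (km_mM + substrate_mM)


# At [S] = Km the rate is half of Vmax; at high [S] it approaches Vmax.
half_max = michaelis_menten_rate(0.48)
near_saturation = michaelis_menten_rate(48.0)
```

    At a fumarate concentration equal to Km (0.48 mM), the predicted rate is half of Vmax, about 413.5 μM/min/mg.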

  12. Machine Learning Leveraging Genomes from Metagenomes Identifies Influential Antibiotic Resistance Genes in the Infant Gut Microbiome

    PubMed Central

    Olm, Matthew R.; Morowitz, Michael J.

    2018-01-01

    ABSTRACT Antibiotic resistance in pathogens is extensively studied, and yet little is known about how antibiotic resistance genes of typical gut bacteria influence microbiome dynamics. Here, we leveraged genomes from metagenomes to investigate how genes of the premature infant gut resistome correspond to the ability of bacteria to survive under certain environmental and clinical conditions. We found that formula feeding impacts the resistome. Random forest models corroborated by statistical tests revealed that the gut resistome of formula-fed infants is enriched in class D beta-lactamase genes. Interestingly, Clostridium difficile strains harboring this gene are at higher abundance in formula-fed infants than C. difficile strains lacking this gene. Organisms with genes for major facilitator superfamily drug efflux pumps have higher replication rates under all conditions, even in the absence of antibiotic therapy. Using a machine learning approach, we identified genes that are predictive of an organism’s direction of change in relative abundance after administration of vancomycin and cephalosporin antibiotics. The most accurate results were obtained by reducing annotated genomic data to five principal components classified by boosted decision trees. Among the genes involved in predicting whether an organism increased in relative abundance after treatment are those that encode subclass B2 beta-lactamases and transcriptional regulators of vancomycin resistance. This demonstrates that machine learning applied to genome-resolved metagenomics data can identify key genes for survival after antibiotics treatment and predict how organisms in the gut microbiome will respond to antibiotic administration. IMPORTANCE The process of reconstructing genomes from environmental sequence data (genome-resolved metagenomics) allows unique insight into microbial systems. 
We apply this technique to investigate how the antibiotic resistance genes of bacteria affect their ability to flourish in the gut under various conditions. Our analysis reveals that strain-level selection in formula-fed infants drives enrichment of beta-lactamase genes in the gut resistome. Using genomes from metagenomes, we built a machine learning model to predict how organisms in the gut microbial community respond to perturbation by antibiotics. This may eventually have clinical applications. PMID:29359195

  13. Low Maternal Microbiota Sharing across Gut, Breast Milk and Vagina, as Revealed by 16S rRNA Gene and Reduced Metagenomic Sequencing.

    PubMed

    Avershina, Ekaterina; Angell, Inga Leena; Simpson, Melanie; Storrø, Ola; Øien, Torbjørn; Johnsen, Roar; Rudi, Knut

    2018-05-01

    The maternal microbiota plays an important role in infant gut colonization. In this work we have investigated which bacterial species are shared across the breast milk, vaginal and stool microbiotas of 109 women shortly before and after giving birth using 16S rRNA gene sequencing and a novel reduced metagenomic sequencing (RMS) approach in a subgroup of 16 women. All the species predicted by the 16S rRNA gene sequencing were also detected by RMS analysis and there was good correspondence between their relative abundances estimated by both approaches. Both approaches also demonstrate a low level of maternal microbiota sharing across the population and RMS analysis identified only two species common to most women and in all sample types (Bifidobacterium longum and Enterococcus faecalis). Breast milk was the only sample type that had significantly higher intra- than inter-individual similarity towards both vaginal and stool samples. We also searched our RMS dataset against an in silico generated reference database derived from bacterial isolates in the Human Microbiome Project. The use of this reference-based search enabled further separation of Bifidobacterium longum into Bifidobacterium longum ssp. longum and Bifidobacterium longum ssp. infantis. We also detected the Lactobacillus rhamnosus GG strain, which was used as a probiotic supplement by some women, demonstrating the potential of RMS approach for deeper taxonomic delineation and estimation.

  14. Low Maternal Microbiota Sharing across Gut, Breast Milk and Vagina, as Revealed by 16S rRNA Gene and Reduced Metagenomic Sequencing

    PubMed Central

    Angell, Inga Leena; Storrø, Ola; Øien, Torbjørn; Johnsen, Roar; Rudi, Knut

    2018-01-01

    The maternal microbiota plays an important role in infant gut colonization. In this work we have investigated which bacterial species are shared across the breast milk, vaginal and stool microbiotas of 109 women shortly before and after giving birth using 16S rRNA gene sequencing and a novel reduced metagenomic sequencing (RMS) approach in a subgroup of 16 women. All the species predicted by the 16S rRNA gene sequencing were also detected by RMS analysis and there was good correspondence between their relative abundances estimated by both approaches. Both approaches also demonstrate a low level of maternal microbiota sharing across the population and RMS analysis identified only two species common to most women and in all sample types (Bifidobacterium longum and Enterococcus faecalis). Breast milk was the only sample type that had significantly higher intra- than inter- individual similarity towards both vaginal and stool samples. We also searched our RMS dataset against an in silico generated reference database derived from bacterial isolates in the Human Microbiome Project. The use of this reference-based search enabled further separation of Bifidobacterium longum into Bifidobacterium longum ssp. longum and Bifidobacterium longum ssp. infantis. We also detected the Lactobacillus rhamnosus GG strain, which was used as a probiotic supplement by some women, demonstrating the potential of RMS approach for deeper taxonomic delineation and estimation. PMID:29724017

  15. Open resource metagenomics: a model for sharing metagenomic libraries.

    PubMed

    Neufeld, J D; Engel, K; Cheng, J; Moreno-Hagelsieb, G; Rose, D R; Charles, T C

    2011-11-30

    Both sequence-based and activity-based exploitation of environmental DNA have provided unprecedented access to the genomic content of cultivated and uncultivated microorganisms. Although researchers deposit microbial strains in culture collections and DNA sequences in databases, activity-based metagenomic studies typically only publish sequences from the hits retrieved from specific screens. Physical metagenomic libraries, conceptually similar to entire sequence datasets, are usually not straightforward to obtain by interested parties subsequent to publication. In order to facilitate unrestricted distribution of metagenomic libraries, we propose the adoption of open resource metagenomics, in line with the trend towards open access publishing, and similar to culture- and mutant-strain collections that have been the backbone of traditional microbiology and microbial genetics. The concept of open resource metagenomics includes preparation of physical DNA libraries, preferably in versatile vectors that facilitate screening in a diversity of host organisms, and pooling of clones so that single aliquots containing complete libraries can be easily distributed upon request. Database deposition of associated metadata and sequence data for each library provides researchers with information to select the most appropriate libraries for further research projects. As a starting point, we have established the Canadian MetaMicroBiome Library (CM(2)BL [1]). The CM(2)BL is a publicly accessible collection of cosmid libraries containing environmental DNA from soils collected from across Canada, spanning multiple biomes. The libraries were constructed such that the cloned DNA can be easily transferred to Gateway® compliant vectors, facilitating functional screening in virtually any surrogate microbial host for which there are available plasmid vectors. 
The libraries, which we are placing in the public domain, will be distributed upon request without restriction to members of both the academic research community and industry. This article invites the scientific community to adopt this philosophy of open resource metagenomics to extend the utility of functional metagenomics beyond initial publication, circumventing the need to start from scratch with each new research project.

  16. Open resource metagenomics: a model for sharing metagenomic libraries

    PubMed Central

    Neufeld, J.D.; Engel, K.; Cheng, J.; Moreno-Hagelsieb, G.; Rose, D.R.; Charles, T.C.

    2011-01-01

    Both sequence-based and activity-based exploitation of environmental DNA have provided unprecedented access to the genomic content of cultivated and uncultivated microorganisms. Although researchers deposit microbial strains in culture collections and DNA sequences in databases, activity-based metagenomic studies typically only publish sequences from the hits retrieved from specific screens. Physical metagenomic libraries, conceptually similar to entire sequence datasets, are usually not straightforward to obtain by interested parties subsequent to publication. In order to facilitate unrestricted distribution of metagenomic libraries, we propose the adoption of open resource metagenomics, in line with the trend towards open access publishing, and similar to culture- and mutant-strain collections that have been the backbone of traditional microbiology and microbial genetics. The concept of open resource metagenomics includes preparation of physical DNA libraries, preferably in versatile vectors that facilitate screening in a diversity of host organisms, and pooling of clones so that single aliquots containing complete libraries can be easily distributed upon request. Database deposition of associated metadata and sequence data for each library provides researchers with information to select the most appropriate libraries for further research projects. As a starting point, we have established the Canadian MetaMicroBiome Library (CM2BL [1]). The CM2BL is a publicly accessible collection of cosmid libraries containing environmental DNA from soils collected from across Canada, spanning multiple biomes. The libraries were constructed such that the cloned DNA can be easily transferred to Gateway® compliant vectors, facilitating functional screening in virtually any surrogate microbial host for which there are available plasmid vectors. 
The libraries, which we are placing in the public domain, will be distributed upon request without restriction to members of both the academic research community and industry. This article invites the scientific community to adopt this philosophy of open resource metagenomics to extend the utility of functional metagenomics beyond initial publication, circumventing the need to start from scratch with each new research project. PMID:22180823

  17. Profile and Fate of Bacterial Pathogens in Sewage Treatment Plants Revealed by High-Throughput Metagenomic Approach.

    PubMed

    Li, Bing; Ju, Feng; Cai, Lin; Zhang, Tong

    2015-09-01

    The broad-spectrum profile of bacterial pathogens and their fate in sewage treatment plants (STPs) were investigated using a metagenomic approach based on high-throughput sequencing. This novel approach could provide a unified platform to standardize bacterial pathogen detection and enable direct comparison among different samples. In total, 113 bacterial pathogen species were detected in eight samples including influent, effluent, activated sludge (AS), biofilm, and anaerobic digestion sludge, with abundances ranging from 0.000095% to 4.89%. Among these 113 bacterial pathogens, 79 species were reported in STPs for the first time. Notably, compared to AS in bulk mixed liquor, more pathogen species and higher total abundance were detected in the upper foaming layer of AS. This suggests that the foaming layer of AS might pose a greater threat to onsite workers and residents in the areas surrounding STPs, because pathogens in the foaming layer are easily transferred into the air and may cause infections. The high removal efficiency (98.0%) of total bacterial pathogens suggests that the AS treatment process is effective at removing most bacterial pathogens. Remarkable similarities in bacterial pathogen composition between influent and human gut indicated that bacterial pathogen profiles in influents could well reflect the average bacterial pathogen communities of urban residents' guts within the STP catchment area.

  18. The changing landscape of microbial biodiversity exploration and its implications for systematics.

    PubMed

    Hedlund, Brian P; Dodsworth, Jeremy A; Staley, James T

    2015-06-01

    A vast diversity of Bacteria and Archaea exists in nature that has evaded axenic culture. Advancements in single-cell genomics, metagenomics, and molecular microbial ecology approaches provide ever-improving insight into the biology of this so-called "microbial dark matter"; however, due to the International Code of Nomenclature of Prokaryotes, yet-uncultivated microorganisms are not accommodated in formal taxonomy regardless of the quantity or quality of data. Meanwhile, efforts to calibrate the existing taxonomy with phylogenetic anchors and genomic data are increasingly robust. The current climate provides an exciting opportunity to leverage rapidly expanding single-cell genomics and metagenomics datasets to improve the taxonomy of Bacteria and Archaea. However, this opportunity must be weighed carefully in light of the strengths and limitations of these approaches. We propose to expand the definition of the Candidatus taxonomy to include taxa, from the phylum level to the species level, that are described genomically, particularly when genomic work is coupled with advanced molecular ecology approaches to probe metabolic functions in situ. This system would preserve the rigor and value of traditional microbial systematics while enabling growth of a provisional taxonomic structure to facilitate communication about "dark" lineages on the tree of life. Copyright © 2015 Elsevier GmbH. All rights reserved.

  19. Bacterial community composition and predicted functional ecology of sponges, sediment and seawater from the thousand islands reef complex, West Java, Indonesia.

    PubMed

    de Voogd, Nicole J; Cleary, Daniel F R; Polónia, Ana R M; Gomes, Newton C M

    2015-04-01

    In the present study, we assessed the composition of Bacteria in four biotopes namely sediment, seawater and two sponge species (Stylissa massa and Xestospongia testudinaria) at four different reef sites in a coral reef ecosystem in West Java, Indonesia. In addition to this, we used a predictive metagenomic approach to estimate to what extent nitrogen metabolic pathways differed among bacterial communities from different biotopes. We observed marked differences in bacterial composition of the most abundant bacterial phyla, classes and orders among sponge species, water and sediment. Proteobacteria were by far the most abundant phylum in terms of both sequences and Operational Taxonomic Units (OTUs). Predicted counts for genes associated with the nitrogen metabolism suggested that several genes involved in the nitrogen cycle were enriched in sponge samples, including nosZ, nifD, nirK, norB and nrfA genes. Our data show that a combined barcoded pyrosequencing and predictive metagenomic approach can provide novel insights into the potential ecological functions of the microbial communities. Not only is this approach useful for our understanding of the vast microbial diversity found in sponges but also to understand the potential response of microbial communities to environmental change. © FEMS 2015. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  20. Intrinsic challenges in ancient microbiome reconstruction using 16S rRNA gene amplification.

    PubMed

    Ziesemer, Kirsten A; Mann, Allison E; Sankaranarayanan, Krithivasan; Schroeder, Hannes; Ozga, Andrew T; Brandt, Bernd W; Zaura, Egija; Waters-Rist, Andrea; Hoogland, Menno; Salazar-García, Domingo C; Aldenderfer, Mark; Speller, Camilla; Hendy, Jessica; Weston, Darlene A; MacDonald, Sandy J; Thomas, Gavin H; Collins, Matthew J; Lewis, Cecil M; Hofman, Corinne; Warinner, Christina

    2015-11-13

    To date, characterization of ancient oral (dental calculus) and gut (coprolite) microbiota has been primarily accomplished through a metataxonomic approach involving targeted amplification of one or more variable regions in the 16S rRNA gene. Specifically, the V3 region (E. coli 341-534) of this gene has been suggested as an excellent candidate for ancient DNA amplification and microbial community reconstruction. However, in practice this metataxonomic approach often produces highly skewed taxonomic frequency data. In this study, we use non-targeted (shotgun metagenomics) sequencing methods to better understand skewed microbial profiles observed in four ancient dental calculus specimens previously analyzed by amplicon sequencing. Through comparisons of microbial taxonomic counts from paired amplicon (V3 U341F/534R) and shotgun sequencing datasets, we demonstrate that extensive length polymorphisms in the V3 region are a consistent and major cause of differential amplification leading to taxonomic bias in ancient microbiome reconstructions based on amplicon sequencing. We conclude that systematic amplification bias confounds attempts to accurately reconstruct microbiome taxonomic profiles from 16S rRNA V3 amplicon data generated using universal primers. Because in silico analysis indicates that alternative 16S rRNA hypervariable regions will present similar challenges, we advocate for the use of a shotgun metagenomics approach in ancient microbiome reconstructions.

  1. Intrinsic challenges in ancient microbiome reconstruction using 16S rRNA gene amplification

    PubMed Central

    Ziesemer, Kirsten A.; Mann, Allison E.; Sankaranarayanan, Krithivasan; Schroeder, Hannes; Ozga, Andrew T.; Brandt, Bernd W.; Zaura, Egija; Waters-Rist, Andrea; Hoogland, Menno; Salazar-García, Domingo C.; Aldenderfer, Mark; Speller, Camilla; Hendy, Jessica; Weston, Darlene A.; MacDonald, Sandy J.; Thomas, Gavin H.; Collins, Matthew J.; Lewis, Cecil M.; Hofman, Corinne; Warinner, Christina

    2015-01-01

    To date, characterization of ancient oral (dental calculus) and gut (coprolite) microbiota has been primarily accomplished through a metataxonomic approach involving targeted amplification of one or more variable regions in the 16S rRNA gene. Specifically, the V3 region (E. coli 341–534) of this gene has been suggested as an excellent candidate for ancient DNA amplification and microbial community reconstruction. However, in practice this metataxonomic approach often produces highly skewed taxonomic frequency data. In this study, we use non-targeted (shotgun metagenomics) sequencing methods to better understand skewed microbial profiles observed in four ancient dental calculus specimens previously analyzed by amplicon sequencing. Through comparisons of microbial taxonomic counts from paired amplicon (V3 U341F/534R) and shotgun sequencing datasets, we demonstrate that extensive length polymorphisms in the V3 region are a consistent and major cause of differential amplification leading to taxonomic bias in ancient microbiome reconstructions based on amplicon sequencing. We conclude that systematic amplification bias confounds attempts to accurately reconstruct microbiome taxonomic profiles from 16S rRNA V3 amplicon data generated using universal primers. Because in silico analysis indicates that alternative 16S rRNA hypervariable regions will present similar challenges, we advocate for the use of a shotgun metagenomics approach in ancient microbiome reconstructions. PMID:26563586

  2. The Amordad database engine for metagenomics.

    PubMed

    Behnam, Ehsan; Smith, Andrew D

    2014-10-15

    Several technical challenges in metagenomic data analysis, including assembling metagenomic sequence data or identifying operational taxonomic units, are both significant and well known. These forms of analysis are increasingly cited as conceptually flawed, given the extreme variation within traditionally defined species and rampant horizontal gene transfer. Furthermore, computational requirements of such analysis have hindered content-based organization of metagenomic data at large scale. In this article, we introduce the Amordad database engine for alignment-free, content-based indexing of metagenomic datasets. Amordad places the metagenome comparison problem in a geometric context, and uses an indexing strategy that combines random hashing with a regular nearest neighbor graph. This framework allows refinement of the database over time by continual application of random hash functions, with the effect of each hash function encoded in the nearest neighbor graph. This eliminates the need to explicitly maintain the hash functions in order for query efficiency to benefit from the accumulated randomness. Results on real and simulated data show that Amordad can support logarithmic query time for identifying similar metagenomes even as the database size reaches into the millions. Source code, licensed under the GNU general public license (version 3), is freely available for download from http://smithlabresearch.org/amordad. Contact: andrewds@usc.edu. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
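    The random hashing component of such an alignment-free index can be sketched in a few lines. The example below is not Amordad's code and omits the nearest neighbor graph entirely. It assumes each metagenome is summarized as a normalized k-mer frequency vector and hashed with random hyperplanes, a common locality-sensitive hashing scheme. All function names and parameters are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2):
    """Summarize a sequence as a normalized frequency vector over all DNA k-mers."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    keys = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[key] for key in keys) or 1  # avoid division by zero
    return [counts[key] / total for key in keys]


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes; each one acts as a hash function."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """Hash a vector to a bit tuple: one bit per hyperplane, set by the side it falls on."""
    return tuple(
        1 if sum(v * w for v, w in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Count differing bits; a small distance suggests similar profiles."""
    return sum(x != y for x, y in zip(a, b))


profile = kmer_profile("ACGTACGTAC")
planes = random_hyperplanes(dim=len(profile), n_bits=8)
sig = signature(profile, planes)
```

    Profiles separated by a small angle tend to agree on most bits, so comparing short signatures can stand in for comparing full vectors. Adding hyperplanes over time refines the comparison, which is the general intuition behind accumulating random hash functions.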

  3. A high throughput screen for biomining cellulase activity from metagenomic libraries.

    PubMed

    Mewis, Keith; Taupp, Marcus; Hallam, Steven J

    2011-02-01

    Cellulose, the most abundant source of organic carbon on the planet, has wide-ranging industrial applications with increasing emphasis on biofuel production (1). Chemical methods to modify or degrade cellulose typically require strong acids and high temperatures. As such, enzymatic methods have become prominent in the bioconversion process. While the identification of active cellulases from bacterial and fungal isolates has been somewhat effective, the vast majority of microbes in nature resist laboratory cultivation. Environmental genomic, also known as metagenomic, screening approaches have great promise in bridging the cultivation gap in the search for novel bioconversion enzymes. Metagenomic screening approaches have successfully recovered novel cellulases from environments as varied as soils (2), buffalo rumen (3) and the termite hind-gut (4) using carboxymethylcellulose (CMC) agar plates stained with congo red dye (based on the method of Teather and Wood (5)). However, the CMC method is limited in throughput, is not quantitative and manifests a low signal to noise ratio (6). Other methods have been reported (7,8) but each use an agar plate-based assay, which is undesirable for high-throughput screening of large insert genomic libraries. Here we present a solution-based screen for cellulase activity using a chromogenic dinitrophenol (DNP)-cellobioside substrate (9). Our library was cloned into the pCC1 copy control fosmid to increase assay sensitivity through copy number induction (10). The method uses one-pot chemistry in 384-well microplates with the final readout provided as an absorbance measurement. This readout is quantitative, sensitive and automated with a throughput of up to 100X 384-well plates per day using a liquid handler and plate reader with attached stacking system.

  4. Trichoderma harzianum MTCC 5179 impacts the population and functional dynamics of microbial community in the rhizosphere of black pepper (Piper nigrum L.).

    PubMed

    Umadevi, Palaniyandi; Anandaraj, Muthuswamy; Srivastav, Vivek; Benjamin, Sailas

    2017-11-29

    Employing an Illumina HiSeq whole-genome metagenome sequencing approach, we studied the impact of Trichoderma harzianum on altering the microbial community and its functional dynamics in the rhizosphere soil of black pepper (Piper nigrum L.). The metagenomic datasets from the rhizosphere with (treatment) and without (control) T. harzianum inoculation were annotated using a dual approach, i.e., stand-alone and MG-RAST. The probiotic application of T. harzianum in the rhizosphere soil of black pepper impacted the population dynamics of rhizosphere bacteria, archaea and eukaryotes, as reflected through the selective recruitment of bacteria [Acidobacteriaceae bacterium (p=1.24e-12), Candidatus Koribacter versatilis (p=2.66e-10)] and fungi [Fusarium oxysporum (p=0.013), Talaromyces stipitatus (p=0.219) and Pestalotiopsis fici (p=0.443)] in terms of population abundance, and of bacterial chemotaxis (p=0.012) and iron metabolism (p=2.97e-5), with a reduction in abundance for pathogenicity islands (p=7.30e-3) and phages and prophages (p=7.30e-3), in terms of functional abundance. Interestingly, the enriched functional metagenomic signatures related to phytoremediation in the treatment, such as benzoate transport and degradation (p=2.34e-4) and degradation of heterocyclic aromatic compounds (p=3.59e-13), influenced the rhizosphere micro-ecosystem, favoring growth and health of the pepper plant. The population dynamics and functional richness of the rhizosphere ecosystem in black pepper influenced by treatment with T. harzianum demonstrate the ecological importance of T. harzianum in the cultivation of black pepper. Copyright © 2017 Sociedade Brasileira de Microbiologia. Published by Elsevier Editora Ltda. All rights reserved.

  5. Recovery of microbial communities and carbon cycling processes following drought manipulation in southern California

    NASA Astrophysics Data System (ADS)

    Allison, S. D.; Martiny, J. B. H.; Martiny, A.; Berlemont, R.; Treseder, K. K.; Goulden, M.; Brodie, E.

    2016-12-01

    Predicting the functioning of microbial communities under changing environmental conditions remains a key challenge in Earth system science. Metagenomics and other high-throughput molecular approaches can help address this challenge by revealing the functional potential of microbial communities. We coupled metagenomics with models and experimental manipulations to address microbial responses to drought in a California grassland ecosystem along with the consequences for carbon cycling. We developed an approach for extracting trait information from metagenomic data and asked: 1) What is the phylogenetic structure of drought response traits? 2) What is the relationship between these traits and those involved in carbohydrate degradation? 3) How do both classes of traits vary seasonally and with precipitation manipulation? 4) How resilient are these traits in the face of perturbation? We found that drought response traits are phylogenetically conserved at an equivalent of 5-8% ribosomal RNA gene sequence dissimilarity. Experimental drought treatment selected for the genetic potential to degrade starch, xylan, and mixed polysaccharides, suggesting a link between drought response and carbon cycling traits. In addition, microbial communities exposed to experimental drought showed a reduced potential to degrade plant biomass. Particularly among bacteria, seasonal drought had a larger impact on microbial composition, abundance, and carbohydrate-degrading genes compared to experimental drought. Bacterial communities were also more resilient to drought perturbation than fungal communities, which showed legacies of drought perturbation for up to three years. Altogether, these findings imply that microbial communities exhibit trait diversity that facilitates resilience but with substantial time lags and consequences for carbon turnover. This information is being used to inform new trait-based models that address the challenge of predicting microbial functioning under precipitation change.

  6. Metagenomic analysis of soil and freshwater from zoo agricultural area with organic fertilization

    PubMed Central

    Meneghine, Aylan K.; Nielsen, Shaun; Thomas, Torsten; Carareto Alves, Lucia Maria

    2017-01-01

    Microbial communities drive biogeochemical cycles in agricultural areas by decomposing organic materials and converting essential nutrients. Organic amendments improve soil quality by increasing the load of essential nutrients and enhancing the productivity. Additionally, fresh water used for irrigation can affect soil quality of agricultural soils, mainly due to the presence of microbial contaminants and pathogens. In this study, we investigated how microbial communities in irrigation water might contribute to the microbial diversity and function of soil. Whole-metagenomic sequencing approaches were used to investigate the taxonomic and the functional profiles of microbial communities present in fresh water used for irrigation, and in soil from a vegetable crop, which received fertilization with organic compost made from animal carcasses. The taxonomic analysis revealed that the most abundant genera were Polynucleobacter (~8% relative abundance) and Bacillus (~10%) in fresh water and soil from the vegetable crop, respectively. Low abundance (0.38%) of cyanobacterial groups were identified. Based on functional gene prediction, denitrification appears to be an important process in the soil community analysed here. Conversely, genes for nitrogen fixation were abundant in freshwater, indicating that the N-fixation plays a crucial role in this particular ecosystem. Moreover, pathogenicity islands, antibiotic resistance and potential virulence related genes were identified in both samples, but no toxigenic genes were detected. This study provides a better understanding of the community structure of an area under strong agricultural activity with regular irrigation and fertilization with an organic compost made from animal carcasses. Additionally, the use of a metagenomic approach to investigate fresh water quality proved to be a relevant method to evaluate its use in an agricultural ecosystem. PMID:29267397

  7. Genome Informed Trait-Based Models

    NASA Astrophysics Data System (ADS)

    Karaoz, U.; Cheng, Y.; Bouskill, N.; Tang, J.; Beller, H. R.; Brodie, E.; Riley, W. J.

    2013-12-01

    Trait-based approaches are powerful tools for representing microbial communities across both spatial and temporal scales within ecosystem models. Trait-based models (TBMs) represent the diversity of microbial taxa as stochastic assemblages with a distribution of traits constrained by trade-offs between these traits. Such representation with its built-in stochasticity allows the elucidation of the interactions between the microbes and their environment by reducing the complexity of microbial community diversity into a limited number of functional 'guilds' and letting them emerge across spatio-temporal scales. From the biogeochemical/ecosystem modeling perspective, the emergent properties of the microbial community could be directly translated into predictions of biogeochemical reaction rates and microbial biomass. The accuracy of TBMs depends on the identification of key traits of the microbial community members and on the parameterization of these traits. Current approaches to inform TBM parameterization are empirical (i.e., based on literature surveys). Advances in omic technologies (such as genomics, metagenomics, metatranscriptomics, and metaproteomics) pave the way to better-initialize models that can be constrained in a generic or site-specific fashion. Here we describe the coupling of metagenomic data to the development of a TBM representing the dynamics of metabolic guilds from an organic carbon stimulated groundwater microbial community. Illumina paired-end metagenomic data were collected from the community as it transitioned successively through electron-accepting conditions (nitrate-, sulfate-, and Fe(III)-reducing), and used to inform estimates of growth rates and the distribution of metabolic pathways (i.e., aerobic and anaerobic oxidation, fermentation) across a spatially resolved TBM. We use this model to evaluate the emergence of different metabolisms and predict rates of biogeochemical processes over time. We compare our results to observational outputs.

  8. Evaluating the metagenome of two sampling locations in the nasal cavity of cattle with bovine respiratory disease complex

    USDA-ARS's Scientific Manuscript database

    Bovine respiratory disease complex (BRDC) is a multi-factor disease, and disease incidence may be associated with an animal’s commensal microbiota (metagenome). Evaluation of the animal’s resident microbiota in the nasal cavity may help us to understand the impact of the metagenome on incidence of ...

  9. Benchmarking methods and data sets for ligand enrichment assessment in virtual screening.

    PubMed

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2015-01-01

    Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in the prospective (i.e. real-world) efforts. However, the intrinsic differences of benchmarking sets to the real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. "analogue bias", "artificial enrichment" and "false negative". In addition, we introduce our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementations to three important human histone deacetylases (HDACs) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The leave-one-out cross-validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased as measured by property matching, ROC curves and AUCs. Copyright © 2014 Elsevier Inc. All rights reserved.
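
    The abstract above reports ROC curves and AUCs as one measure of ligand enrichment. As a rough illustration of what an AUC measures (and not the authors' benchmarking algorithm), the sketch below computes it as the fraction of active/decoy pairs in which the active ligand is scored higher, counting ties as one half. The function name roc_auc and the example scores are hypothetical.

```python
def roc_auc(active_scores, decoy_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen active is scored
    higher than a randomly chosen decoy (ties count as 0.5).
    """
    if not active_scores or not decoy_scores:
        raise ValueError("need at least one active and one decoy score")
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


if __name__ == "__main__":
    # Hypothetical docking scores, higher meaning "more likely active".
    print(roc_auc([0.9, 0.3], [0.5, 0.1]))
```

    An AUC near 0.5 indicates ranking no better than chance, while values near 1.0 indicate that actives are consistently ranked above decoys.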

  10. Benchmarking Methods and Data Sets for Ligand Enrichment Assessment in Virtual Screening

    PubMed Central

    Xia, Jie; Tilahun, Ermias Lemma; Reid, Terry-Elinor; Zhang, Liangren; Wang, Xiang Simon

    2014-01-01

    Retrospective small-scale virtual screening (VS) based on benchmarking data sets has been widely used to estimate ligand enrichments of VS approaches in the prospective (i.e. real-world) efforts. However, the intrinsic differences of benchmarking sets to the real screening chemical libraries can cause biased assessment. Herein, we summarize the history of benchmarking methods as well as data sets and highlight three main types of biases found in benchmarking sets, i.e. “analogue bias”, “artificial enrichment” and “false negative”. In addition, we introduced our recent algorithm to build maximum-unbiased benchmarking sets applicable to both ligand-based and structure-based VS approaches, and its implementations to three important human histone deacetylase (HDAC) isoforms, i.e. HDAC1, HDAC6 and HDAC8. The Leave-One-Out Cross-Validation (LOO CV) demonstrates that the benchmarking sets built by our algorithm are maximum-unbiased in terms of property matching, ROC curves and AUCs. PMID:25481478

  11. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials

    PubMed Central

    Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2016-01-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group. PMID:27177885
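
    For readers unfamiliar with cluster mean imputation, the following minimal sketch shows the basic operation: each missing outcome is replaced by the mean of the observed outcomes in the same cluster. The data layout (a dictionary of clusters, with None marking a missing value) and the function name are assumptions made for illustration. As the abstract notes, this approach gives unbiased estimates only under restrictive conditions.

```python
def cluster_mean_impute(outcomes):
    """Fill missing outcomes (None) with the observed mean of their cluster.

    outcomes: dict mapping a cluster id to a list of floats or None.
    Clusters with no observed values are returned unchanged.
    """
    imputed = {}
    for cluster, values in outcomes.items():
        observed = [v for v in values if v is not None]
        if not observed:
            # Nothing to impute from in this cluster.
            imputed[cluster] = list(values)
            continue
        mean = sum(observed) / len(observed)
        imputed[cluster] = [mean if v is None else v for v in values]
    return imputed


if __name__ == "__main__":
    # Hypothetical trial data: two clusters with some missing outcomes.
    print(cluster_mean_impute({"c1": [1.0, None, 3.0], "c2": [None, None]}))
```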

  12. Missing continuous outcomes under covariate dependent missingness in cluster randomised trials.

    PubMed

    Hossain, Anower; Diaz-Ordaz, Karla; Bartlett, Jonathan W

    2017-06-01

    Attrition is a common occurrence in cluster randomised trials which leads to missing outcome data. Two approaches for analysing such trials are cluster-level analysis and individual-level analysis. This paper compares the performance of unadjusted cluster-level analysis, baseline covariate adjusted cluster-level analysis and linear mixed model analysis, under baseline covariate dependent missingness in continuous outcomes, in terms of bias, average estimated standard error and coverage probability. The methods of complete records analysis and multiple imputation are used to handle the missing outcome data. We considered four scenarios, with the missingness mechanism and baseline covariate effect on outcome either the same or different between intervention groups. We show that both unadjusted cluster-level analysis and baseline covariate adjusted cluster-level analysis give unbiased estimates of the intervention effect only if both intervention groups have the same missingness mechanisms and there is no interaction between baseline covariate and intervention group. Linear mixed model and multiple imputation give unbiased estimates under all four considered scenarios, provided that an interaction of intervention and baseline covariate is included in the model when appropriate. Cluster mean imputation has been proposed as a valid approach for handling missing outcomes in cluster randomised trials. We show that cluster mean imputation only gives unbiased estimates when missingness mechanism is the same between the intervention groups and there is no interaction between baseline covariate and intervention group. Multiple imputation shows overcoverage for small number of clusters in each intervention group.

  13. Forest soil microbial communities: Using metagenomic approaches to survey permanent plots

    Treesearch

    Amy L. Ross-Davis; Jane E. Stewart; John W. Hanna; John D. Shaw; Andrew T. Hudak; Theresa B. Jain; Robert J. Denner; Russell T. Graham; Deborah S. Page-Dumroese; Joanne M. Tirocke; Mee-Sook Kim; Ned B. Klopfenstein

    2014-01-01

    Forest soil ecosystems include some of the most complex microbial communities on Earth (Fierer et al. 2012). These assemblages of archaea, bacteria, fungi, and protists play essential roles in biogeochemical cycles (van der Heijden et al. 2008) and account for considerable terrestrial biomass (Nielsen et al. 2011). Yet, determining the microbial composition of forest...

  14. Bases for qudits from a nonstandard approach to SU(2)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kibler, M. R., E-mail: kibler@ipnl.in2p3.fr

    2011-06-15

    Bases of finite-dimensional Hilbert spaces (in dimension d) of relevance for quantum information and quantum computation are constructed from angular momentum theory and su(2) Lie algebraic methods. We report on a formula for deriving in one step the (1 + p)p qupits (i.e., qudits with d = p a prime integer) of a complete set of 1 + p mutually unbiased bases in C^p. Repeated application of the formula can be used for generating mutually unbiased bases in C^d with d = p^e (e ≥ 2) a power of a prime integer. A connection between mutually unbiased bases and the unitary group SU(d) is briefly discussed in the case d = p^e.
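
    The abstract does not reproduce the author's one-step formula, so the sketch below swaps in the standard textbook quadratic-phase construction (usually attributed to Ivanović and to Wootters and Fields) for an odd prime p, and then checks numerically that the resulting 1 + p bases of C^p are mutually unbiased. The function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def mutually_unbiased_bases(p):
    """Return 1 + p bases of C^p for an odd prime p.

    Each basis is a p x p array whose columns are the basis vectors.
    Basis 0 is the computational basis; basis a + 1 (a = 0..p-1) has
    vectors with components omega**(a*j*j + k*j) / sqrt(p).
    """
    omega = np.exp(2j * np.pi / p)
    j = np.arange(p)
    bases = [np.eye(p, dtype=complex)]
    for a in range(p):
        # Row index j, column index k; exponents reduced mod p.
        exponent = (a * j[:, None] ** 2 + j[:, None] * j[None, :]) % p
        bases.append(omega ** exponent / np.sqrt(p))
    return bases


def is_mutually_unbiased(bases, tol=1e-9):
    """Check orthonormality within each basis and |<u|v>|^2 = 1/p across bases."""
    p = bases[0].shape[0]
    for m, first in enumerate(bases):
        for n, second in enumerate(bases):
            overlaps = np.abs(first.conj().T @ second) ** 2
            target = np.eye(p) if m == n else np.full((p, p), 1.0 / p)
            if not np.allclose(overlaps, target, atol=tol):
                return False
    return True


if __name__ == "__main__":
    for prime in (3, 5, 7):
        print(prime, is_mutually_unbiased(mutually_unbiased_bases(prime)))
```

    The construction relies on quadratic Gauss sums having modulus sqrt(p) for odd primes, which is why p = 2 and composite dimensions are outside the scope of this sketch.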

  15. Comparative fecal metagenomics unveils unique functional capacity of the swine gut

    PubMed Central

    2011-01-01

    Background Uncovering the taxonomic composition and functional capacity within the swine gut microbial consortia is of great importance to animal physiology and health as well as to food and water safety due to the presence of human pathogens in pig feces. Nonetheless, limited information on the functional diversity of the swine gut microbiome is available. Results Analysis of 637,722 pyrosequencing reads (130 megabases) generated from Yorkshire pig fecal DNA extracts was performed to help better understand the microbial diversity and largely unknown functional capacity of the swine gut microbiome. Swine fecal metagenomic sequences were annotated using both MG-RAST and JGI IMG/M-ER pipelines. Taxonomic analysis of metagenomic reads indicated that swine fecal microbiomes were dominated by Firmicutes and Bacteroidetes phyla. At a finer phylogenetic resolution, Prevotella spp. dominated the swine fecal metagenome, while some genes associated with Treponema and Anaerovibrio species were found to be exclusively within the pig fecal metagenomic sequences analyzed. Functional analysis revealed that carbohydrate metabolism was the most abundant SEED subsystem, representing 13% of the swine metagenome. Genes associated with stress, virulence, cell wall and cell capsule were also abundant. Virulence factors associated with antibiotic resistance genes with highest sequence homology to genes in Bacteroidetes, Clostridia, and Methanosarcina were numerous within the gene families unique to the swine fecal metagenomes. Other abundant proteins unique to the distal swine gut shared high sequence homology to putative carbohydrate membrane transporters. Conclusions The results from this metagenomic survey demonstrated the presence of genes associated with resistance to antibiotics and carbohydrate metabolism suggesting that the swine gut microbiome may be shaped by husbandry practices. PMID:21575148

  16. Consensus statement: Virus taxonomy in the age of metagenomics.

    PubMed

    Simmonds, Peter; Adams, Mike J; Benkő, Mária; Breitbart, Mya; Brister, J Rodney; Carstens, Eric B; Davison, Andrew J; Delwart, Eric; Gorbalenya, Alexander E; Harrach, Balázs; Hull, Roger; King, Andrew M Q; Koonin, Eugene V; Krupovic, Mart; Kuhn, Jens H; Lefkowitz, Elliot J; Nibert, Max L; Orton, Richard; Roossinck, Marilyn J; Sabanadzovic, Sead; Sullivan, Matthew B; Suttle, Curtis A; Tesh, Robert B; van der Vlugt, René A; Varsani, Arvind; Zerbini, F Murilo

    2017-03-01

    The number and diversity of viral sequences that are identified in metagenomic data far exceeds that of experimentally characterized virus isolates. In a recent workshop, a panel of experts discussed the proposal that, with appropriate quality control, viruses that are known only from metagenomic data can, and should be, incorporated into the official classification scheme of the International Committee on Taxonomy of Viruses (ICTV). Although a taxonomy that is based on metagenomic sequence data alone represents a substantial departure from the traditional reliance on phenotypic properties, the development of a robust framework for sequence-based virus taxonomy is indispensable for the comprehensive characterization of the global virome. In this Consensus Statement article, we consider the rationale for why metagenomic sequence data should, and how it can, be incorporated into the ICTV taxonomy, and present proposals that have been endorsed by the Executive Committee of the ICTV.

  17. Signal Processing for Metagenomics: Extracting Information from the Soup

    PubMed Central

    Rosen, Gail L.; Sokhansanj, Bahrad A.; Polikar, Robi; Bruns, Mary Ann; Russell, Jacob; Garbarine, Elaine; Essinger, Steve; Yok, Non

    2009-01-01

    Traditionally, studies in microbial genomics have focused on single-genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have shed a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. Current tools and techniques are reviewed in this paper which address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology. PMID:20436876

  18. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities

    DOE PAGES

    Kang, Dongwan D.; Froula, Jeff; Egan, Rob; ...

    2015-01-01

    Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. Lastly, it automatically forms hundreds of high quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
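
    To make the tetranucleotide frequency (TNF) signal mentioned above concrete, here is a minimal sketch that turns a contig into a 256-element vector of 4-mer frequencies and compares two contigs. It is not MetaBAT's implementation: MetaBAT uses empirically derived probabilistic distances, whereas this sketch substitutes a plain Euclidean distance, ignores reverse complements, and skips 4-mers containing ambiguous bases. All function names are hypothetical.

```python
import math
from itertools import product

# All 256 tetranucleotides in a fixed order, with a lookup to their index.
KMERS = ["".join(t) for t in product("ACGT", repeat=4)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tetranucleotide_frequency(seq):
    """Return the 256-element vector of 4-mer frequencies for a sequence."""
    counts = [0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:  # skip 4-mers containing N or other symbols
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(KMERS)
    return [c / total for c in counts]


def tnf_distance(seq_a, seq_b):
    """Euclidean distance between TNF vectors (a stand-in for MetaBAT's distance)."""
    return math.dist(tetranucleotide_frequency(seq_a),
                     tetranucleotide_frequency(seq_b))


if __name__ == "__main__":
    # Hypothetical toy contigs; real contigs are thousands of bases long.
    print(tnf_distance("ACGTACGTACGTACGT", "ACGTACGTTTTTACGT"))
```

    In a binning workflow, a composition distance like this would typically be combined with a per-sample abundance (coverage) distance before clustering contigs into genome bins.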

  19. Experimental Design and Bioinformatics Analysis for the Application of Metagenomics in Environmental Sciences and Biotechnology.

    PubMed

    Ju, Feng; Zhang, Tong

    2015-11-03

    Recent advances in DNA sequencing technologies have prompted the widespread application of metagenomics for the investigation of novel bioresources (e.g., industrial enzymes and bioactive molecules) and unknown biohazards (e.g., pathogens and antibiotic resistance genes) in natural and engineered microbial systems across multiple disciplines. This review discusses the rigorous experimental design and sample preparation in the context of applying metagenomics in environmental sciences and biotechnology. Moreover, this review summarizes the principles, methodologies, and state-of-the-art bioinformatics procedures, tools and database resources for metagenomics applications and discusses two popular strategies (analysis of unassembled reads versus assembled contigs/draft genomes) for quantitative or qualitative insights of microbial community structure and functions. Overall, this review aims to facilitate more extensive application of metagenomics in the investigation of uncultured microorganisms, novel enzymes, microbe-environment interactions, and biohazards in biotechnological applications where microbial communities are engineered for bioenergy production, wastewater treatment, and bioremediation.

  20. Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    PubMed Central

    Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A

    2009-01-01

    Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884

  1. Metagenome and metatranscriptome data for Rifle CMT-03 laboratory microcosm experiment completed in April 2014

    DOE Data Explorer

    Jewell, Talia [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Karaoz, Ulas [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Bill, Markus [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Chakraborty, Romy [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Brodie, Eoin L [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Williams, Kenneth Hurst [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States); Beller, Harry R [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States)

    2014-04-01

    Sediment samples were collected during installation of monitoring borehole CMT-03. Microcosms were constructed and inoculated under anaerobic conditions with these sediments and anaerobic Rifle artificial groundwater. Microcosm metagenomes and metatranscriptomes were sampled every 5 days for a period of 20 days. The dataset gives gene-level annotations, binning, metagenomic and metatranscriptomic coverages for these microcosms.

  2. Ray Meta: scalable de novo metagenome assembly and profiling

    PubMed Central

    2012-01-01

    Voluminous parallel sequencing datasets, especially metagenomic experiments, require distributed computing for de novo assembly and taxonomic profiling. Ray Meta is a massively distributed metagenome assembler that is coupled with Ray Communities, which profiles microbiomes based on uniquely-colored k-mers. It can accurately assemble and profile a three billion read metagenomic experiment representing 1,000 bacterial genomes of uneven proportions in 15 hours with 1,024 processor cores, using only 1.5 GB per core. The software will facilitate the processing of large and complex datasets, and will help in generating biological insights for specific environments. Ray Meta is open source and available at http://denovoassembler.sf.net. PMID:23259615
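
    The idea of profiling with uniquely-colored k-mers can be illustrated with a toy example: each k-mer is "colored" by the reference genomes that contain it, and only k-mers carrying exactly one color are used to attribute reads to genomes. The sketch below is a simplified, single-machine illustration under that assumption, not Ray Communities' distributed implementation. The function names and the tiny sequences are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every k-mer of a sequence, left to right."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def uniquely_colored_kmers(genomes, k):
    """Map each k-mer found in exactly one reference genome to that genome's name."""
    colors = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            colors[kmer].add(name)
    return {kmer: next(iter(names))
            for kmer, names in colors.items() if len(names) == 1}


def profile(reads, unique, k):
    """Estimate genome proportions from counts of uniquely-colored k-mers in reads."""
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in unique:
                hits[unique[kmer]] += 1
    total = sum(hits.values())
    return {name: n / total for name, n in hits.items()} if total else {}


if __name__ == "__main__":
    genomes = {"g1": "AAAACCCC", "g2": "GGGGCCCC"}  # toy references
    unique = uniquely_colored_kmers(genomes, 4)
    print(profile(["AAAAC", "GGGGC", "CCCC", "AAACC"], unique, 4))
```

    Shared k-mers (here CCCC) are ignored because they cannot distinguish between genomes, which is the reason for restricting the profile to uniquely-colored k-mers.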

  3. Metazen – metadata capture for metagenomes

    PubMed Central

    2014-01-01

    Background As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusions Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility. PMID:25780508

  4. Metazen - metadata capture for metagenomes.

    PubMed

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker

    2014-01-01

    As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  5. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    PubMed Central

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  6. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    PubMed

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.

  7. Symbiosis insights through metagenomic analysis of a microbial consortium.

    PubMed

    Woyke, Tanja; Teeling, Hanno; Ivanova, Natalia N; Huntemann, Marcel; Richter, Michael; Gloeckner, Frank Oliver; Boffelli, Dario; Anderson, Iain J; Barry, Kerrie W; Shapiro, Harris J; Szeto, Ernest; Kyrpides, Nikos C; Mussmann, Marc; Amann, Rudolf; Bergin, Claudia; Ruehland, Caroline; Rubin, Edward M; Dubilier, Nicole

    2006-10-26

    Symbioses between bacteria and eukaryotes are ubiquitous, yet our understanding of the interactions driving these associations is hampered by our inability to cultivate most host-associated microbes. Here we use a metagenomic approach to describe four co-occurring symbionts from the marine oligochaete Olavius algarvensis, a worm lacking a mouth, gut and nephridia. Shotgun sequencing and metabolic pathway reconstruction revealed that the symbionts are sulphur-oxidizing and sulphate-reducing bacteria, all of which are capable of carbon fixation, thus providing the host with multiple sources of nutrition. Molecular evidence for the uptake and recycling of worm waste products by the symbionts suggests how the worm could eliminate its excretory system, an adaptation unique among annelid worms. We propose a model that describes how the versatile metabolism within this symbiotic consortium provides the host with an optimal energy supply as it shuttles between the upper oxic and lower anoxic coastal sediments that it inhabits.

  8. Metagenomic analyses of drinking water receiving different disinfection treatments.

    PubMed

    Gomez-Alvarez, Vicente; Revetta, Randy P; Santo Domingo, Jorge W

    2012-09-01

    A metagenome-based approach was used to assess the taxonomic affiliation and function potential of microbial populations in free-chlorine-treated (CHL) and monochloramine-treated (CHM) drinking water (DW). In all, 362,640 (averaging 544 bp) and 155,593 (averaging 554 bp) pyrosequencing reads were analyzed for the CHL and CHM samples, respectively. Most annotated proteins were found to be of bacterial origin, although eukaryotic, archaeal, and viral proteins were also identified. Differences in community structure and function were noted. Most notably, Legionella-like genes were more abundant in the CHL samples while mycobacterial genes were more abundant in CHM samples. Genes associated with multiple disinfectant mechanisms were identified in both communities. Moreover, sequences linked to virulence factors, such as antibiotic resistance mechanisms, were observed in both microbial communities. This study provides new insights into the genetic network and potential biological processes associated with the molecular microbial ecology of DW microbial communities.

  9. Metagenomic Analyses of Drinking Water Receiving Different Disinfection Treatments

    PubMed Central

    Gomez-Alvarez, Vicente; Revetta, Randy P.

    2012-01-01

    A metagenome-based approach was used to assess the taxonomic affiliation and function potential of microbial populations in free-chlorine-treated (CHL) and monochloramine-treated (CHM) drinking water (DW). In all, 362,640 (averaging 544 bp) and 155,593 (averaging 554 bp) pyrosequencing reads were analyzed for the CHL and CHM samples, respectively. Most annotated proteins were found to be of bacterial origin, although eukaryotic, archaeal, and viral proteins were also identified. Differences in community structure and function were noted. Most notably, Legionella-like genes were more abundant in the CHL samples while mycobacterial genes were more abundant in CHM samples. Genes associated with multiple disinfectant mechanisms were identified in both communities. Moreover, sequences linked to virulence factors, such as antibiotic resistance mechanisms, were observed in both microbial communities. This study provides new insights into the genetic network and potential biological processes associated with the molecular microbial ecology of DW microbial communities. PMID:22729545

  10. Symbiosis insights through metagenomic analysis of a microbial consortium

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woyke, Tanja; Teeling, Hanno; Ivanova, Natalia N.

    Symbioses between bacteria and eukaryotes are ubiquitous, yet our understanding of the interactions driving these associations is hampered by our inability to cultivate most host-associated microbes. Here, we used a metagenomic approach to describe four co-occurring symbionts from the marine oligochaete Olavius algarvensis, a worm lacking a mouth, gut, and nephridia. Shotgun sequencing and metabolic pathway reconstruction revealed that the symbionts are sulfur-oxidizing and sulfate-reducing bacteria, all of which are capable of carbon fixation, providing the host with multiple sources of nutrition. Molecular evidence for the uptake and recycling of worm waste products by the symbionts suggests how the worm could eliminate its excretory system, an adaptation unique among annelid worms. We propose a model which describes how the versatile metabolism within this symbiotic consortium provides the host with an optimal energy supply as it shuttles between the upper oxic and lower anoxic coastal sediments which it inhabits.

  11. Mining the human gut microbiome for novel stress resistance genes

    PubMed Central

    Culligan, Eamonn P.; Marchesi, Julian R.; Hill, Colin; Sleator, Roy D.

    2012-01-01

    With the rapid advances in sequencing technologies in recent years, the human genome is now considered incomplete without the complementing microbiome, which outnumbers human genes by a factor of one hundred. The human microbiome, and more specifically the gut microbiome, has received considerable attention and research efforts over the past decade. Many studies have identified and quantified “who is there?,” while others have determined some of their functional capacity, or “what are they doing?” In a recent study, we identified novel salt-tolerance loci from the human gut microbiome using combined functional metagenomic and bioinformatics based approaches. Herein, we discuss the identified loci, their role in salt-tolerance and their importance in the context of the gut environment. We also consider the utility and power of functional metagenomics for mining such environments for novel genes and proteins, as well as the implications and possible applications for future research. PMID:22688726

  12. Challenges and Opportunities of Airborne Metagenomics

    PubMed Central

    Behzad, Hayedeh; Gojobori, Takashi; Mineta, Katsuhiko

    2015-01-01

    Recent metagenomic studies of environments, such as marine and soil, have significantly enhanced our understanding of the diverse microbial communities living in these habitats and their essential roles in sustaining vast ecosystems. The increase in the number of publications related to soil and marine metagenomics is in sharp contrast to those of air, yet airborne microbes are thought to have significant impacts on many aspects of our lives from their potential roles in atmospheric events such as cloud formation, precipitation, and atmospheric chemistry to their major impact on human health. In this review, we will discuss the current progress in airborne metagenomics, with a special focus on exploring the challenges and opportunities of undertaking such studies. The main challenges of conducting metagenomic studies of airborne microbes are as follows: 1) Low density of microorganisms in the air, 2) efficient retrieval of microorganisms from the air, 3) variability in airborne microbial community composition, 4) the lack of standardized protocols and methodologies, and 5) DNA sequencing and bioinformatics-related challenges. Overcoming these challenges could provide the groundwork for comprehensive analysis of airborne microbes and their potential impact on the atmosphere, global climate, and our health. Metagenomic studies offer a unique opportunity to examine viral and bacterial diversity in the air and monitor their spread locally or across the globe, including threats from pathogenic microorganisms. Airborne metagenomic studies could also lead to discoveries of novel genes and metabolic pathways relevant to meteorological and industrial applications, environmental bioremediation, and biogeochemical cycles. PMID:25953766

  13. Comparison of methods for library construction and short read annotation of shellfish viral metagenomes.

    PubMed

    Wei, Hong-Ying; Huang, Sheng; Wang, Jiang-Yong; Gao, Fang; Jiang, Jing-Zhe

    2018-03-01

    The emergence and widespread use of high-throughput sequencing technologies have promoted metagenomic studies on environmental or animal samples. Library construction for metagenome sequencing and annotation of the produced sequence reads are important steps in such studies and influence the quality of metagenomic data. In this study, we collected some marine mollusk samples, such as Crassostrea hongkongensis, Chlamys farreri, and Ruditapes philippinarum, from coastal areas in South China. These samples were divided into two batches to compare two library construction methods for shellfish viral metagenomes. Our analysis showed that reverse-transcribing RNA into cDNA and then amplifying it simultaneously with DNA by whole genome amplification (WGA) yielded a larger amount of DNA compared to using only WGA or WTA (whole transcriptome amplification). Moreover, higher quality libraries were obtained by agarose gel extraction rather than with AMPure bead size selection. However, the latter can also provide good results if combined with the adjustment of the filter parameters. This, together with its simplicity, makes it a viable alternative. Finally, we compared three annotation tools (BLAST, DIAMOND, and Taxonomer) and two reference databases (NCBI's NR and UniProt's UniRef). Considering the limitations of computing resources and data transfer speed, we propose the use of DIAMOND with UniRef for annotating metagenomic short reads as its running speed can guarantee a good annotation rate. This study may serve as a useful reference for selecting methods for shellfish viral metagenome library construction and read annotation.

  14. Resolving prokaryotic taxonomy without rRNA: longer oligonucleotide word lengths improve genome and metagenome taxonomic classification.

    PubMed

    Alsop, Eric B; Raymond, Jason

    2013-01-01

    Oligonucleotide signatures, especially tetranucleotide signatures, have been used as a method for homology binning by exploiting an organism's inherent biases towards the use of specific oligonucleotide words. Tetranucleotide signatures have been especially useful in environmental metagenomics samples as many of these samples contain organisms from poorly classified phyla which cannot be easily identified using traditional homology methods, including NCBI BLAST. This study examines oligonucleotide signatures across 1,424 completed genomes from across the tree of life, substantially expanding upon previous work. A comprehensive analysis of mononucleotide through nonanucleotide word lengths suggests that longer word lengths substantially improve the classification of DNA fragments across a range of sizes of relevance to high throughput sequencing. We find that, at present, heptanucleotide signatures represent an optimal balance between prediction accuracy and computational time for resolving taxonomy using both genomic and metagenomic fragments. We directly compare the ability of tetranucleotide and heptanucleotide word lengths (tetranucleotide signatures are the current standard for oligonucleotide word usage analyses) for taxonomic binning of metagenome reads. We present evidence that heptanucleotide word lengths consistently provide more taxonomic resolving power, particularly in distinguishing between closely related organisms that are often present in metagenomic samples. This implies that longer oligonucleotide word lengths should replace tetranucleotide signatures for most analyses. Finally, we show that the application of longer word lengths to metagenomic datasets leads to more accurate taxonomic binning of DNA scaffolds and has the potential to substantially improve taxonomic assignment and assembly of metagenomic data.
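
    As a rough illustration of what an oligonucleotide signature is, the sketch below counts overlapping words of length k in a DNA fragment and normalizes them to relative frequencies (k = 4 for tetranucleotides, k = 7 for heptanucleotides). It is not the authors' implementation: the function names kmer_signature and signature_distance are hypothetical, only the forward strand is counted, and Euclidean distance is an assumed stand-in for whatever comparison the study actually used.

```python
from collections import Counter
from itertools import product


def kmer_signature(seq: str, k: int = 4) -> dict:
    """Relative frequency of every DNA word of length k (4**k entries)."""
    seq = seq.upper()
    alphabet = set("ACGT")
    # Count overlapping words; skip any word with an ambiguous base (e.g. N).
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= alphabet
    )
    total = sum(counts.values())
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return {w: (counts[w] / total if total else 0.0) for w in words}


def signature_distance(a: dict, b: dict) -> float:
    """Euclidean distance between two signatures built with the same k."""
    return sum((a[w] - b[w]) ** 2 for w in a) ** 0.5
```

    The vector grows from 256 entries at k = 4 to 16,384 at k = 7, which is one way to see the trade-off between resolving power and computational time that the abstract describes.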

  15. Gut metagenomes of type 2 diabetic patients have characteristic single-nucleotide polymorphism distribution in Bacteroides coprocola.

    PubMed

    Chen, Yaowen; Li, Zongcheng; Hu, Shuofeng; Zhang, Jian; Wu, Jiaqi; Shao, Ningsheng; Bo, Xiaochen; Ni, Ming; Ying, Xiaomin

    2017-02-01

    Gut microbes play a critical role in human health and disease, and researchers have begun to characterize their genomes, the so-called gut metagenome. Thus far, metagenomics studies have focused on genus- or species-level composition and microbial gene sets, while strain-level composition and single-nucleotide polymorphism (SNP) have been overlooked. The gut metagenomes of type 2 diabetes (T2D) patients have been found to be enriched with butyrate-producing bacteria and sulfate reduction functions. However, it is not known whether the gut metagenomes of T2D patients have characteristic strain patterns or SNP distributions. We downloaded public gut metagenome datasets from 170 T2D patients and 174 healthy controls and performed a systematic comparative analysis of their metagenome SNPs. We found that Bacteroides coprocola, whose relative abundance did not differ between the groups, had a characteristic distribution of SNPs in the T2D patient group. We identified 65 genes, all in B. coprocola, that had remarkably different enrichment of SNPs. The first and sixth ranked genes encode glycosyl hydrolases (GenBank accession EDU99824.1 and EDV02301.1). Interestingly, alpha-glucosidase, which is also a glycosyl hydrolase located in the intestine, is an important drug target of T2D. These results suggest that different strains of B. coprocola may have different roles in human gut and a specific set of B. coprocola strains are correlated with T2D.

  16. A Primer on Metagenomics

    PubMed Central

    Wooley, John C.; Godzik, Adam; Friedberg, Iddo

    2010-01-01

    Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, with the sequence data being noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic-based research. It is therefore not within the scope of this article to provide an exhaustive review. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether there is software that implements any of the methods presented here, and briefly review its utility. Nevertheless, it would be useful if readers of this article would avail themselves of the comment section provided by this journal, and relate their own experiences. Finally, the last section of this article provides a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics. PMID:20195499

  17. Control system estimation and design for aerospace vehicles

    NASA Technical Reports Server (NTRS)

    Stefani, R. T.; Williams, T. L.; Yakowitz, S. J.

    1972-01-01

    The selection of an estimator which is unbiased when applied to structural parameter estimation is discussed. The mathematical relationships for structural parameter estimation are defined. It is shown that a conventional weighted least squares (CWLS) estimate is biased when applied to structural parameter estimation. Two approaches to bias removal are suggested: (1) change the CWLS estimator or (2) change the objective function. The advantages of each approach are analyzed.

  18. Mutually unbiased projectors and duality between lines and bases in finite quantum systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shalaby, M.; Vourdas, A., E-mail: a.vourdas@bradford.ac.uk

    2013-10-15

    Quantum systems with variables in the ring Z(d) are considered, and the concepts of weak mutually unbiased bases and mutually unbiased projectors are discussed. The lines through the origin in the Z(d)×Z(d) phase space are classified into maximal lines (sets of d points), and sublines (sets of d_i points where d_i|d). The sublines are intersections of maximal lines. It is shown that there exists a duality between the properties of lines (resp., sublines), and the properties of weak mutually unbiased bases (resp., mutually unbiased projectors). -- Highlights: •Lines in discrete phase space. •Bases in finite quantum systems. •Duality between bases and lines. •Weak mutually unbiased bases.

  19. Development of high-throughput phenotyping of metagenomic clones from the human gut microbiome for modulation of eukaryotic cell growth.

    PubMed

    Gloux, Karine; Leclerc, Marion; Iliozer, Harout; L'Haridon, René; Manichanh, Chaysavanh; Corthier, Gérard; Nalin, Renaud; Blottière, Hervé M; Doré, Joël

    2007-06-01

    Metagenomic libraries derived from human intestinal microbiota (20,725 clones) were screened for epithelial cell growth modulation. Modulatory clones belonging to the four phyla represented among the metagenomic libraries were identified (hit rate, 0.04 to 8.7% depending on the screening cutoff). Several candidate loci were identified by transposon mutagenesis and subcloning.

  20. Evaluating the Quantitative Capabilities of Metagenomic Analysis Software.

    PubMed

    Kerepesi, Csaba; Grolmusz, Vince

    2016-05-01

    DNA sequencing technologies are applied widely and frequently today to describe metagenomes, i.e., microbial communities in environmental or clinical samples, without the need for culturing them. These technologies usually return short (100-300 base-pairs long) DNA reads, and these reads are processed by metagenomic analysis software that assigns phylogenetic composition information to the dataset. Here we evaluate three metagenomic analysis software packages (AmphoraNet, a webserver implementation of AMPHORA2; MG-RAST; and MEGAN5) for their capabilities of assigning quantitative phylogenetic information to the data, describing the frequency of appearance of the microorganisms of the same taxa in the sample. The difficulties of the task arise from the fact that longer genomes produce more reads from the same organism than shorter genomes, and some software assigns higher frequencies to species with longer genomes than to those with shorter ones. This phenomenon is called the "genome length bias." Dozens of complex artificial metagenome benchmarks can be found in the literature. Because of the complexity of those benchmarks, it is usually difficult to judge the resistance of a metagenomic software package to this "genome length bias." Therefore, we have made a simple benchmark for the evaluation of the "taxon-counting" in a metagenomic sample: we have taken the same number of copies of three full bacterial genomes of different lengths, broken them up randomly into short reads with an average length of 150 bp, and mixed the reads, creating our simple benchmark. Because of its simplicity, the benchmark is not supposed to serve as a mock metagenome, but if a software package fails on that simple task, it will surely fail on most real metagenomes. We applied the three software packages to the benchmark. The ideal quantitative solution would assign the same proportion to the three bacterial taxa. We have found that AMPHORA2/AmphoraNet gave the most accurate results and the other two software packages were under-performers: they quite reliably assigned each short read to its respective taxon, producing the typical genome length bias. The benchmark dataset is available at http://pitgroup.org/static/3RandomGenome-100kavg150bps.fna.
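
    The genome length bias can be reproduced with a small simulation in the spirit of the benchmark described above. This is only a sketch under stated assumptions: the three genome lengths are hypothetical, read lengths are drawn uniformly from 100-200 bp (mean 150 bp), and dividing read shares by genome length is one simple correction shown for illustration, not the method used by any of the evaluated tools.

```python
import random


def fragment(length, rng, lo=100, hi=200):
    """Cut one genome copy of `length` bp into consecutive reads; return read lengths."""
    reads, pos = [], 0
    while pos < length:
        n = min(rng.randint(lo, hi), length - pos)  # last read may be shorter
        reads.append(n)
        pos += n
    return reads


def read_count_shares(genome_lengths, copies=3, seed=0):
    """Share of reads per taxon when every taxon contributes `copies` genome copies."""
    rng = random.Random(seed)
    counts = {
        name: sum(len(fragment(length, rng)) for _ in range(copies))
        for name, length in genome_lengths.items()
    }
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


def length_corrected_shares(genome_lengths, read_shares):
    """Divide each read share by genome length, then renormalize to sum to 1."""
    weights = {n: read_shares[n] / genome_lengths[n] for n in genome_lengths}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Hypothetical genome lengths in bp (not the genomes used in the study).
genomes = {"short": 100_000, "medium": 300_000, "long": 600_000}
naive = read_count_shares(genomes)
corrected = length_corrected_shares(genomes, naive)
```

    Here the naive read shares track genome length (roughly 10%, 30% and 60%) even though each taxon is present in equal copy number, while the length-corrected shares come out close to one third each, the ideal answer named in the abstract.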
