Sample records for throughput sequencing technologies

  1. [Current applications of high-throughput DNA sequencing technology in antibody drug research].

    PubMed

    Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong

    2012-03-01

    Since the publication of a high-throughput DNA sequencing technology based on PCR reaction was carried out in oil emulsions in 2005, high-throughput DNA sequencing platforms have been evolved to a robust technology in sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation of discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach of discovery and development of antibody drugs.

  2. The application of the high throughput sequencing technology in the transposable elements.

    PubMed

    Liu, Zhen; Xu, Jian-hong

    2015-09-01

    High throughput sequencing technology has dramatically improved the efficiency of DNA sequencing, and decreased the costs to a great extent. Meanwhile, this technology usually has advantages of better specificity, higher sensitivity and accuracy. Therefore, it has been applied to the research on genetic variations, transcriptomics and epigenomics. Recently, this technology has been widely employed in the studies of transposable elements and has achieved fruitful results. In this review, we summarize the application of high throughput sequencing technology in the fields of transposable elements, including the estimation of transposon content, preference of target sites and distribution, insertion polymorphism and population frequency, identification of rare copies, transposon horizontal transfers as well as transposon tagging. We also briefly introduce the major common sequencing strategies and algorithms, their advantages and disadvantages, and the corresponding solutions. Finally, we envision the developing trends of high throughput sequencing technology, especially the third generation sequencing technology, and its application in transposon studies in the future, hopefully providing a comprehensive understanding and reference for related scientific researchers.

  3. Advanced Virus Detection Technologies Interest Group (AVDTIG): Efforts on High Throughput Sequencing (HTS) for Virus Detection.

    PubMed

    Khan, Arifa S; Vacante, Dominick A; Cassart, Jean-Pol; Ng, Siemon H S; Lambert, Christophe; Charlebois, Robert L; King, Kathryn E

    Several nucleic-acid based technologies have recently emerged with capabilities for broad virus detection. One of these, high throughput sequencing, has the potential for novel virus detection because this method does not depend upon prior viral sequence knowledge. However, the use of high throughput sequencing for testing biologicals poses greater challenges as compared to other newly introduced tests due to its technical complexities and big data bioinformatics. Thus, the Advanced Virus Detection Technologies Users Group was formed as a joint effort by regulatory and industry scientists to facilitate discussions and provide a forum for sharing data and experiences using advanced new virus detection technologies, with a focus on high throughput sequencing technologies. The group was initiated as a task force that was coordinated by the Parenteral Drug Association and subsequently became the Advanced Virus Detection Technologies Interest Group to continue efforts for using new technologies for detection of adventitious viruses with broader participation, including international government agencies, academia, and technology service providers. © PDA, Inc. 2016.

  4. "First generation" automated DNA sequencing technology.

    PubMed

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  5. High-throughput sequence alignment using Graphics Processing Units

    PubMed Central

    Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

    2007-01-01

    Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356

  6. Overcoming bias and systematic errors in next generation sequencing data.

    PubMed

    Taub, Margaret A; Corrada Bravo, Hector; Irizarry, Rafael A

    2010-12-10

    Considerable time and effort has been spent in developing analysis and quality assessment methods to allow the use of microarrays in a clinical setting. As is the case for microarrays and other high-throughput technologies, data from new high-throughput sequencing technologies are subject to technological and biological biases and systematic errors that can impact downstream analyses. Only when these issues can be readily identified and reliably adjusted for will clinical applications of these new technologies be feasible. Although much work remains to be done in this area, we describe consistently observed biases that should be taken into account when analyzing high-throughput sequencing data. In this article, we review current knowledge about these biases, discuss their impact on analysis results, and propose solutions.

  7. Evaluation of Methods for de novo Genome assembly from High-throughput Sequencing Reads Reveals Dependencies that Affect the Quality of the Results

    USDA-ARS?s Scientific Manuscript database

    Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole...

  8. Short-read, high-throughput sequencing technology for STR genotyping

    PubMed Central

    Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.

    2013-01-01

    DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315

  9. Applications and Case Studies of the Next-Generation Sequencing Technologies in Food, Nutrition and Agriculture.

    USDA-ARS?s Scientific Manuscript database

    Next-generation sequencing technologies are able to produce high-throughput short sequence reads in a cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. Here I survey their major applications ranging...

  10. Recent Applications of DNA Sequencing Technologies in Food, Nutrition and Agriculture

    USDA-ARS?s Scientific Manuscript database

    Next-generation DNA sequencing technologies are able to produce millions of short sequence reads in a high-throughput, cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. This review surveys their rec...

  11. Advances in high throughput DNA sequence data compression.

    PubMed

    Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz

    2016-06-01

    Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.

  12. Application of whole genome shotgun sequencing for detection and characterization of genetically modified organisms and derived products.

    PubMed

    Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana

    2016-07-01

    The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.

  13. Next-generation sequencing coupled with a cell-free display technology for high-throughput production of reliable interactome data

    PubMed Central

    Fujimori, Shigeo; Hirai, Naoya; Ohashi, Hiroyuki; Masuoka, Kazuyo; Nishikimi, Akihiko; Fukui, Yoshinori; Washio, Takanori; Oshikubo, Tomohiro; Yamashita, Tatsuhiro; Miyamoto-Sato, Etsuko

    2012-01-01

    Next-generation sequencing (NGS) has been applied to various kinds of omics studies, resulting in many biological and medical discoveries. However, high-throughput protein-protein interactome datasets derived from detection by sequencing are scarce, because protein-protein interaction analysis requires many cell manipulations to examine the interactions. The low reliability of the high-throughput data is also a problem. Here, we describe a cell-free display technology combined with NGS that can improve both the coverage and reliability of interactome datasets. The completely cell-free method gives a high-throughput and a large detection space, testing the interactions without using clones. The quantitative information provided by NGS reduces the number of false positives. The method is suitable for the in vitro detection of proteins that interact not only with the bait protein, but also with DNA, RNA and chemical compounds. Thus, it could become a universal approach for exploring the large space of protein sequences and interactome networks. PMID:23056904

  14. Novel method for high-throughput colony PCR screening in nanoliter-reactors

    PubMed Central

    Walser, Marcel; Pellaux, Rene; Meyer, Andreas; Bechtold, Matthias; Vanderschuren, Herve; Reinhardt, Richard; Magyar, Joseph; Panke, Sven; Held, Martin

    2009-01-01

    We introduce a technology for the rapid identification and sequencing of conserved DNA elements employing a novel suspension array based on nanoliter (nl)-reactors made from alginate. The reactors have a volume of 35 nl and serve as reaction compartments during monoseptic growth of microbial library clones, colony lysis, thermocycling and screening for sequence motifs via semi-quantitative fluorescence analyses. nl-Reactors were kept in suspension during all high-throughput steps which allowed performing the protocol in a highly space-effective fashion and at negligible expenses of consumables and reagents. As a first application, 11 high-quality microsatellites for polymorphism studies in cassava were isolated and sequenced out of a library of 20 000 clones in 2 days. The technology is widely scalable and we envision that throughputs for nl-reactor based screenings can be increased up to 100 000 and more samples per day thereby efficiently complementing protocols based on established deep-sequencing technologies. PMID:19282448

  15. A technological update of molecular diagnostics for infectious diseases

    PubMed Central

    Liu, Yu-Tsueng

    2008-01-01

    Identification of a causative pathogen is essential for the choice of treatment for most infectious diseases. Many FDA approved molecular assays; usually more sensitive and specific compared to traditional tests, have been developed in the last decade. A new trend of high throughput and multiplexing assays are emerging thanks to technological developments for the human genome sequencing project. The applications of microarray and ultra high throughput sequencing technologies for diagnostic microbiology are reviewed. The race for the $1000 genome technology by 2014 will have a profound impact in diagnosis and treatment of infectious diseases in the near future. PMID:18782035

  16. History, applications, and challenges of immune repertoire research.

    PubMed

    Liu, Xiao; Wu, Jinghua

    2018-02-27

    The diversity of T and B cells in terms of their receptor sequences is huge in the vertebrate's immune system and provides broad protection against the vast diversity of pathogens. Immune repertoire is defined as the sum of T cell receptors and B cell receptors (also named immunoglobulin) that makes the organism's adaptive immune system. Before the emergence of high-throughput sequencing, the studies on immune repertoire were limited by the underdeveloped methodologies, since it was impossible to capture the whole picture by the low-throughput tools. The massive paralleled sequencing technology suits perfectly the researches on immune repertoire. In this article, we review the history of immune repertoire studies, in terms of technologies and research applications. Particularly, we discuss several aspects of challenges in this field and highlight the efforts to develop potential solutions, in the era of high-throughput sequencing of the immune repertoire.

  17. The challenges of sequencing by synthesis.

    PubMed

    Fuller, Carl W; Middendorf, Lyle R; Benner, Steven A; Church, George M; Harris, Timothy; Huang, Xiaohua; Jovanovich, Stevan B; Nelson, John R; Schloss, Jeffery A; Schwartz, David C; Vezenov, Dmitri V

    2009-11-01

    DNA sequencing-by-synthesis (SBS) technology, using a polymerase or ligase enzyme as its core biochemistry, has already been incorporated in several second-generation DNA sequencing systems with significant performance. Notwithstanding the substantial success of these SBS platforms, challenges continue to limit the ability to reduce the cost of sequencing a human genome to $100,000 or less. Achieving dramatically reduced cost with enhanced throughput and quality will require the seamless integration of scientific and technological effort across disciplines within biochemistry, chemistry, physics and engineering. The challenges include sample preparation, surface chemistry, fluorescent labels, optimizing the enzyme-substrate system, optics, instrumentation, understanding tradeoffs of throughput versus accuracy, and read-length/phasing limitations. By framing these challenges in a manner accessible to a broad community of scientists and engineers, we hope to solicit input from the broader research community on means of accelerating the advancement of genome sequencing technology.

  18. Analysis of Illumina Microbial Assemblies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clum, Alicia; Foster, Brian; Froula, Jeff

    2010-05-28

    Since the emerging of second generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaken. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as wellmore » as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.« less

  19. Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

    PubMed Central

    Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

    2007-01-01

    While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434

  20. A family-based probabilistic method for capturing de novo mutations from high-throughput short-read sequencing data.

    PubMed

    Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip

    2012-01-06

    Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.

  1. Multiplexed fragaria chloroplast genome sequencing

    Treesearch

    W. Njuguna; A. Liston; R. Cronn; N.V. Bassil

    2010-01-01

    A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...

  2. FMLRC: Hybrid long read error correction using an FM-index.

    PubMed

    Wang, Jeremy R; Holt, James; McMillan, Leonard; Jones, Corbin D

    2018-02-09

    Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limits their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging "hybrid" assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency than existing methods will help better economically utilize emerging long read sequencing technologies.

  3. Genome sequencing in microfabricated high-density picolitre reactors.

    PubMed

    Margulies, Marcel; Egholm, Michael; Altman, William E; Attiya, Said; Bader, Joel S; Bemben, Lisa A; Berka, Jan; Braverman, Michael S; Chen, Yi-Ju; Chen, Zhoutao; Dewell, Scott B; Du, Lei; Fierro, Joseph M; Gomes, Xavier V; Godwin, Brian C; He, Wen; Helgesen, Scott; Ho, Chun Heen; Ho, Chun He; Irzyk, Gerard P; Jando, Szilveszter C; Alenquer, Maria L I; Jarvie, Thomas P; Jirage, Kshama B; Kim, Jong-Bum; Knight, James R; Lanza, Janna R; Leamon, John H; Lefkowitz, Steven M; Lei, Ming; Li, Jing; Lohman, Kenton L; Lu, Hong; Makhijani, Vinod B; McDade, Keith E; McKenna, Michael P; Myers, Eugene W; Nickerson, Elizabeth; Nobile, John R; Plant, Ramona; Puc, Bernard P; Ronan, Michael T; Roth, George T; Sarkis, Gary J; Simons, Jan Fredrik; Simpson, John W; Srinivasan, Maithreyan; Tartaro, Karrie R; Tomasz, Alexander; Vogt, Kari A; Volkmer, Greg A; Wang, Shally H; Wang, Yong; Weiner, Michael P; Yu, Pengguang; Begley, Richard F; Rothberg, Jonathan M

    2005-09-15

    The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.

  4. A high-throughput and quantitative method to assess the mutagenic potential of translesion DNA synthesis

    PubMed Central

    Taggart, David J.; Camerlengo, Terry L.; Harrison, Jason K.; Sherrer, Shanen M.; Kshetry, Ajay K.; Taylor, John-Stephen; Huang, Kun; Suo, Zucai

    2013-01-01

    Cellular genomes are constantly damaged by endogenous and exogenous agents that covalently and structurally modify DNA to produce DNA lesions. Although most lesions are mended by various DNA repair pathways in vivo, a significant number of damage sites persist during genomic replication. Our understanding of the mutagenic outcomes derived from these unrepaired DNA lesions has been hindered by the low throughput of existing sequencing methods. Therefore, we have developed a cost-effective high-throughput short oligonucleotide sequencing assay that uses next-generation DNA sequencing technology for the assessment of the mutagenic profiles of translesion DNA synthesis catalyzed by any error-prone DNA polymerase. The vast amount of sequencing data produced were aligned and quantified by using our novel software. As an example, the high-throughput short oligonucleotide sequencing assay was used to analyze the types and frequencies of mutations upstream, downstream and at a site-specifically placed cis–syn thymidine–thymidine dimer generated individually by three lesion-bypass human Y-family DNA polymerases. PMID:23470999

  5. High-throughput sequencing in veterinary infection biology and diagnostics.

    PubMed

    Belák, S; Karlsson, O E; Leijon, M; Granberg, F

    2013-12-01

    Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine.

  6. Biofilm-Growing Bacteria Involved in the Corrosion of Concrete Wastewater Pipes: Protocols for Comparative Metagenomic Analyses

    EPA Science Inventory

    Advances in high-throughput next-generation sequencing (NGS) technology for direct sequencing of environmental DNA (i.e. shotgun metagenomics) is transforming the field of microbiology. NGS technologies are now regularly being applied in comparative metagenomic studies, which pr...

  7. Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing

    PubMed Central

    Shafer, Aaron B. A.; Northrup, Joseph M.; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B. W.

    2016-01-01

    Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations. PMID:26745372

  8. Human genetics and genomics a decade after the release of the draft sequence of the human genome.

    PubMed

    Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng

    2011-10-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.

  9. Human genetics and genomics a decade after the release of the draft sequence of the human genome

    PubMed Central

    2011-01-01

    Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade. PMID:22155605

  10. A Protocol for Functional Assessment of Whole-Protein Saturation Mutagenesis Libraries Utilizing High-Throughput Sequencing.

    PubMed

    Stiffler, Michael A; Subramanian, Subu K; Salinas, Victor H; Ranganathan, Rama

    2016-07-03

    Site-directed mutagenesis has long been used as a method to interrogate protein structure, function and evolution. Recent advances in massively-parallel sequencing technology have opened up the possibility of assessing the functional or fitness effects of large numbers of mutations simultaneously. Here, we present a protocol for experimentally determining the effects of all possible single amino acid mutations in a protein of interest utilizing high-throughput sequencing technology, using the 263 amino acid antibiotic resistance enzyme TEM-1 β-lactamase as an example. In this approach, a whole-protein saturation mutagenesis library is constructed by site-directed mutagenic PCR, randomizing each position individually to all possible amino acids. The library is then transformed into bacteria, and selected for the ability to confer resistance to β-lactam antibiotics. The fitness effect of each mutation is then determined by deep sequencing of the library before and after selection. Importantly, this protocol introduces methods which maximize sequencing read depth and permit the simultaneous selection of the entire mutation library, by mixing adjacent positions into groups of length accommodated by high-throughput sequencing read length and utilizing orthogonal primers to barcode each group. Representative results using this protocol are provided by assessing the fitness effects of all single amino acid mutations in TEM-1 at a clinically relevant dosage of ampicillin. The method should be easily extendable to other proteins for which a high-throughput selection assay is in place.

  11. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation.

    PubMed

    Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C

    2012-01-01

    The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases assembly results although on average are of high quality, need to be viewed critically and consider sources of errors in them prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).

  12. Deep sequencing in library selection projects: what insight does it bring?

    PubMed

    Glanville, J; D'Angelo, S; Khan, T A; Reddy, S T; Naranjo, L; Ferrara, F; Bradbury, A R M

    2015-08-01

    High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. Copyright © 2015 Elsevier Ltd. All rights reserved.

  13. Deep sequencing in library selection projects: what insight does it bring?

    PubMed Central

    Glanville, J; D’Angelo, S; Khan, T.A.; Reddy, S. T.; Naranjo, L.; Ferrara, F.; Bradbury, A.R.M.

    2015-01-01

    High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. PMID:26451649

  14. The promise and challenge of high-throughput sequencing of the antibody repertoire

    PubMed Central

    Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R

    2014-01-01

    Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474

  15. Next generation sequencers: methods and applications in food-borne pathogens

    USDA-ARS?s Scientific Manuscript database

    Next generation sequencers are able to produce millions of short sequence reads in a high-throughput, low-cost way. The emergence of these technologies has not only facilitated genome sequencing but also started to change the landscape of life sciences. This chapter will survey their methods and app...

  16. GENETIC-BASED ANALYTICAL METHODS FOR BACTERIA AND FUNGI

    EPA Science Inventory

    In the past two decades, advances in high-throughput sequencing technologies have lead to a veritable explosion in the generation of nucleic acid sequence information (1). While these advances are illustrated most prominently by the successful sequencing of the human genome, they...

  17. Target enrichment and high-throughput sequencing of 80 ribosomal protein genes to identify mutations associated with Diamond-Blackfan anaemia.

    PubMed

    Gerrard, Gareth; Valgañón, Mikel; Foong, Hui En; Kasperaviciute, Dalia; Iskander, Deena; Game, Laurence; Müller, Michael; Aitman, Timothy J; Roberts, Irene; de la Fuente, Josu; Foroni, Letizia; Karadimitris, Anastasios

    2013-08-01

    Diamond-Blackfan anaemia (DBA) is caused by inactivating mutations in ribosomal protein (RP) genes, with mutations in 13 of the 80 RP genes accounting for 50-60% of cases. The remaining 40-50% cases may harbour mutations in one of the remaining RP genes, but the very low frequencies render conventional genetic screening as challenging. We, therefore, applied custom enrichment technology combined with high-throughput sequencing to screen all 80 RP genes. Using this approach, we identified and validated inactivating mutations in 15/17 (88%) DBA patients. Target enrichment combined with high-throughput sequencing is a robust and improved methodology for the genetic diagnosis of DBA. © 2013 John Wiley & Sons Ltd.

  18. Networking Omic Data to Envisage Systems Biological Regulation.

    PubMed

    Kalapanulak, Saowalak; Saithong, Treenut; Thammarongtham, Chinae

    To understand how biological processes work, it is necessary to explore the systematic regulation governing the behaviour of the processes. Not only driving the normal behavior of organisms, the systematic regulation evidently underlies the temporal responses to surrounding environments (dynamics) and long-term phenotypic adaptation (evolution). The systematic regulation is, in effect, formulated from the regulatory components which collaboratively work together as a network. In the drive to decipher such a code of lives, a spectrum of technologies has continuously been developed in the post-genomic era. With current advances, high-throughput sequencing technologies are tremendously powerful for facilitating genomics and systems biology studies in the attempt to understand system regulation inside the cells. The ability to explore relevant regulatory components which infer transcriptional and signaling regulation, driving core cellular processes, is thus enhanced. This chapter reviews high-throughput sequencing technologies, including second and third generation sequencing technologies, which support the investigation of genomics and transcriptomics data. Utilization of this high-throughput data to form the virtual network of systems regulation is explained, particularly transcriptional regulatory networks. Analysis of the resulting regulatory networks could lead to an understanding of cellular systems regulation at the mechanistic and dynamics levels. The great contribution of the biological networking approach to envisage systems regulation is finally demonstrated by a broad range of examples.

  19. A high-throughput next-generation sequencing-based method for detecting the mutational fingerprint of carcinogens

    PubMed Central

    Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella

    2012-01-01

    Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue® mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents. PMID:22735701

  20. A high-throughput next-generation sequencing-based method for detecting the mutational fingerprint of carcinogens.

    PubMed

    Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella

    2012-08-01

    Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents.

  1. Application of Genomic Technologies to the Breeding of Trees

    PubMed Central

    Badenes, Maria L.; Fernández i Martí, Angel; Ríos, Gabino; Rubio-Cabetas, María J.

    2016-01-01

    The recent introduction of next generation sequencing (NGS) technologies represents a major revolution in providing new tools for identifying the genes and/or genomic intervals controlling important traits for selection in breeding programs. In perennial fruit trees with long generation times and large sizes of adult plants, the impact of these techniques is even more important. High-throughput DNA sequencing technologies have provided complete annotated sequences in many important tree species. Most of the high-throughput genotyping platforms described are being used for studies of genetic diversity and population structure. Dissection of complex traits became possible through the availability of genome sequences along with phenotypic variation data, which allow to elucidate the causative genetic differences that give rise to observed phenotypic variation. Association mapping facilitates the association between genetic markers and phenotype in unstructured and complex populations, identifying molecular markers for assisted selection and breeding. Also, genomic data provide in silico identification and characterization of genes and gene families related to important traits, enabling new tools for molecular marker assisted selection in tree breeding. Deep sequencing of transcriptomes is also a powerful tool for the analysis of precise expression levels of each gene in a sample. It consists in quantifying short cDNA reads, obtained by NGS technologies, in order to compare the entire transcriptomes between genotypes and environmental conditions. The miRNAs are non-coding short RNAs involved in the regulation of different physiological processes, which can be identified by high-throughput sequencing of RNA libraries obtained by reverse transcription of purified short RNAs, and by in silico comparison with known miRNAs from other species. All together, NGS techniques and their applications have increased the resources for plant breeding in tree species, closing the former gap of genetic tools between trees and annual species. PMID:27895664

  2. Application of Genomic Technologies to the Breeding of Trees.

    PubMed

    Badenes, Maria L; Fernández I Martí, Angel; Ríos, Gabino; Rubio-Cabetas, María J

    2016-01-01

    The recent introduction of next generation sequencing (NGS) technologies represents a major revolution in providing new tools for identifying the genes and/or genomic intervals controlling important traits for selection in breeding programs. In perennial fruit trees with long generation times and large sizes of adult plants, the impact of these techniques is even more important. High-throughput DNA sequencing technologies have provided complete annotated sequences in many important tree species. Most of the high-throughput genotyping platforms described are being used for studies of genetic diversity and population structure. Dissection of complex traits became possible through the availability of genome sequences along with phenotypic variation data, which allow to elucidate the causative genetic differences that give rise to observed phenotypic variation. Association mapping facilitates the association between genetic markers and phenotype in unstructured and complex populations, identifying molecular markers for assisted selection and breeding. Also, genomic data provide in silico identification and characterization of genes and gene families related to important traits, enabling new tools for molecular marker assisted selection in tree breeding. Deep sequencing of transcriptomes is also a powerful tool for the analysis of precise expression levels of each gene in a sample. It consists in quantifying short cDNA reads, obtained by NGS technologies, in order to compare the entire transcriptomes between genotypes and environmental conditions. The miRNAs are non-coding short RNAs involved in the regulation of different physiological processes, which can be identified by high-throughput sequencing of RNA libraries obtained by reverse transcription of purified short RNAs, and by in silico comparison with known miRNAs from other species. All together, NGS techniques and their applications have increased the resources for plant breeding in tree species, closing the former gap of genetic tools between trees and annual species.

  3. Mobile element biology – new possibilities with high-throughput sequencing

    PubMed Central

    Xing, Jinchuan; Witherspoon, David J.; Jorde, Lynn B.

    2014-01-01

    Mobile elements compose more than half of the human genome, but until recently their large-scale detection was time-consuming and challenging. With the development of new high-throughput sequencing technologies, the complete spectrum of mobile element variation in humans can now be identified and analyzed. Thousands of new mobile element insertions have been discovered, yielding new insights into mobile element biology, evolution, and genomic variation. We review several high-throughput methods, with an emphasis on techniques that specifically target mobile element insertions in humans, and we highlight recent applications of these methods in evolutionary studies and in the analysis of somatic alterations in human cancers. PMID:23312846

  4. The autism sequencing consortium: large-scale, high-throughput sequencing in autism spectrum disorders.

    PubMed

    Buxbaum, Joseph D; Daly, Mark J; Devlin, Bernie; Lehner, Thomas; Roeder, Kathryn; State, Matthew W

    2012-12-20

    Research during the past decade has seen significant progress in the understanding of the genetic architecture of autism spectrum disorders (ASDs), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time, this research has highlighted ongoing challenges. Here we address the enormous impact of high-throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multisite collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

    Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497

  6. Genotyping-by-sequencing (GBS) revealed molecular genetic diversity of Iranian wheat landraces and cultivars

    USDA-ARS?s Scientific Manuscript database

    Genetic diversity is an essential resource for breeders to improve new cultivars with desirable characteristics. Recently genotyping-by-sequencing (GBS), a next generation sequencing (NGS) based technology that can simplify complex genomes, has been used as a high-throughput and cost-effective molec...

  7. Development of a high-throughput SNP resource to advance genomic, genetic and breeding research in carrot (Daucus carota L.)

    USDA-ARS?s Scientific Manuscript database

    The rapid advancement in high-throughput SNP genotyping technologies along with next generation sequencing (NGS) platforms has decreased the cost, improved the quality of large-scale genome surveys, and allowed specialty crops with limited genomic resources such as carrot (Daucus carota) to access t...

  8. Epigenetics and Epigenomics of Plants.

    PubMed

    Yadav, Chandra Bhan; Pandey, Garima; Muthamilarasan, Mehanathan; Prasad, Manoj

    2018-01-23

    The genetic material DNA in association with histone proteins forms the complex structure called chromatin, which is prone to undergo modification through certain epigenetic mechanisms including cytosine DNA methylation, histone modifications, and small RNA-mediated methylation. Alterations in chromatin structure lead to inaccessibility of genomic DNA to various regulatory proteins such as transcription factors, which eventually modulates gene expression. Advancements in high-throughput sequencing technologies have provided the opportunity to study the epigenetic mechanisms at genome-wide levels. Epigenomic studies using high-throughput technologies will widen the understanding of mechanisms as well as functions of regulatory pathways in plant genomes, which will further help in manipulating these pathways using genetic and biochemical approaches. This technology could be a potential research tool for displaying the systematic associations of genetic and epigenetic variations, especially in terms of cytosine methylation onto the genomic region in a specific cell or tissue. A comprehensive study of plant populations to correlate genotype to epigenotype and to phenotype, and also the study of methyl quantitative trait loci (QTL) or epiGWAS, is possible by using high-throughput sequencing methods, which will further accelerate molecular breeding programs for crop improvement. Graphical Abstract.

  9. The main challenges that remain in applying high-throughput sequencing to clinical diagnostics.

    PubMed

    Loeffelholz, Michael; Fofanov, Yuriy

    2015-01-01

    Over the last 10 years, the quality, price and availability of high-throughput sequencing instruments have improved to the point that this technology may be close to becoming a routine tool in the diagnostic microbiology laboratory. Two groups of challenges, however, have to be resolved in order to move this powerful research technology into routine use in the clinical microbiology laboratory. The computational/bioinformatics challenges include data storage cost and privacy concerns, requiring analysis to be performed without access to cloud storage or expensive computational infrastructure. The logistical challenges include interpretation of complex results and acceptance and understanding of the advantages and limitations of this technology by the medical community. This article focuses on the approaches to address these challenges, such as file formats, algorithms, data collection, reporting and good laboratory practices.

  10. Research progress of plant population genomics based on high-throughput sequencing.

    PubMed

    Wang, Yun-sheng

    2016-08-01

    Population genomics, a new paradigm for population genetics, combine the concepts and techniques of genomics with the theoretical system of population genetics and improve our understanding of microevolution through identification of site-specific effect and genome-wide effects using genome-wide polymorphic sites genotypeing. With the appearance and improvement of the next generation high-throughput sequencing technology, the numbers of plant species with complete genome sequences increased rapidly and large scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linking disequilibium, selection effect, demographical history and molecular mechanism of complex traits of relevant plant population at a genomic level. In this review, I briely introduced the concept and research methods of population genomics and summarized the research progress of plant population genomics based on high-throughput sequencing. I also discussed the prospect as well as existing problems of plant population genomics in order to provide references for related studies.

  11. Microbial forensics: fiber optic microarray subtyping of Bacillus anthracis

    NASA Astrophysics Data System (ADS)

    Shepard, Jason R. E.

    2009-05-01

    The past decade has seen increased development and subsequent adoption of rapid molecular techniques involving DNA analysis for detection of pathogenic microorganisms, also termed microbial forensics. The continued accumulation of microbial sequence information in genomic databases now better positions the field of high-throughput DNA analysis to proceed in a more manageable fashion. The potential to build off of these databases exists as technology continues to develop, which will enable more rapid, cost effective analyses. This wealth of genetic information, along with new technologies, has the potential to better address some of the current problems and solve the key issues involved in DNA analysis of pathogenic microorganisms. To this end, a high density fiber optic microarray has been employed, housing numerous DNA sequences simultaneously for detection of various pathogenic microorganisms, including Bacillus anthracis, among others. Each organism is analyzed with multiple sequences and can be sub-typed against other closely related organisms. For public health labs, real-time PCR methods have been developed as an initial preliminary screen, but culture and growth are still considered the gold standard. Technologies employing higher throughput than these standard methods are better suited to capitalize on the limitless potential garnered from the sequence information. Microarray analyses are one such format positioned to exploit this potential, and our array platform is reusable, allowing repetitive tests on a single array, providing an increase in throughput and decrease in cost, along with a certainty of detection, down to the individual strain level.

  12. High-Throughput Tabular Data Processor - Platform independent graphical tool for processing large data sets.

    PubMed

    Madanecki, Piotr; Bałut, Magdalena; Buckley, Patrick G; Ochocka, J Renata; Bartoszewski, Rafał; Crossman, David K; Messiaen, Ludwine M; Piotrowski, Arkadiusz

    2018-01-01

    High-throughput technologies generate considerable amount of data which often requires bioinformatic expertise to analyze. Here we present High-Throughput Tabular Data Processor (HTDP), a platform independent Java program. HTDP works on any character-delimited column data (e.g. BED, GFF, GTF, PSL, WIG, VCF) from multiple text files and supports merging, filtering and converting of data that is produced in the course of high-throughput experiments. HTDP can also utilize itemized sets of conditions from external files for complex or repetitive filtering/merging tasks. The program is intended to aid global, real-time processing of large data sets using a graphical user interface (GUI). Therefore, no prior expertise in programming, regular expression, or command line usage is required of the user. Additionally, no a priori assumptions are imposed on the internal file composition. We demonstrate the flexibility and potential of HTDP in real-life research tasks including microarray and massively parallel sequencing, i.e. identification of disease predisposing variants in the next generation sequencing data as well as comprehensive concurrent analysis of microarray and sequencing results. We also show the utility of HTDP in technical tasks including data merge, reduction and filtering with external criteria files. HTDP was developed to address functionality that is missing or rudimentary in other GUI software for processing character-delimited column data from high-throughput technologies. Flexibility, in terms of input file handling, provides long term potential functionality in high-throughput analysis pipelines, as the program is not limited by the currently existing applications and data formats. HTDP is available as the Open Source software (https://github.com/pmadanecki/htdp).

  13. High-Throughput Tabular Data Processor – Platform independent graphical tool for processing large data sets

    PubMed Central

    Bałut, Magdalena; Buckley, Patrick G.; Ochocka, J. Renata; Bartoszewski, Rafał; Crossman, David K.; Messiaen, Ludwine M.; Piotrowski, Arkadiusz

    2018-01-01

    High-throughput technologies generate considerable amount of data which often requires bioinformatic expertise to analyze. Here we present High-Throughput Tabular Data Processor (HTDP), a platform independent Java program. HTDP works on any character-delimited column data (e.g. BED, GFF, GTF, PSL, WIG, VCF) from multiple text files and supports merging, filtering and converting of data that is produced in the course of high-throughput experiments. HTDP can also utilize itemized sets of conditions from external files for complex or repetitive filtering/merging tasks. The program is intended to aid global, real-time processing of large data sets using a graphical user interface (GUI). Therefore, no prior expertise in programming, regular expression, or command line usage is required of the user. Additionally, no a priori assumptions are imposed on the internal file composition. We demonstrate the flexibility and potential of HTDP in real-life research tasks including microarray and massively parallel sequencing, i.e. identification of disease predisposing variants in the next generation sequencing data as well as comprehensive concurrent analysis of microarray and sequencing results. We also show the utility of HTDP in technical tasks including data merge, reduction and filtering with external criteria files. HTDP was developed to address functionality that is missing or rudimentary in other GUI software for processing character-delimited column data from high-throughput technologies. Flexibility, in terms of input file handling, provides long term potential functionality in high-throughput analysis pipelines, as the program is not limited by the currently existing applications and data formats. HTDP is available as the Open Source software (https://github.com/pmadanecki/htdp). PMID:29432475

  14. The history and advances of reversible terminators used in new generations of sequencing technology.

    PubMed

    Chen, Fei; Dong, Mengxing; Ge, Meng; Zhu, Lingxiang; Ren, Lufeng; Liu, Guocheng; Mu, Rong

    2013-02-01

    DNA sequencing using reversible terminators, as one sequencing by synthesis strategy, has garnered a great deal of interest due to its popular application in the second-generation high-throughput DNA sequencing technology. In this review, we provided its history of development, classification, and working mechanism of this technology. We also outlined the screening strategies for DNA polymerases to accommodate the reversible terminators as substrates during polymerization; particularly, we introduced the "REAP" method developed by us. At the end of this review, we discussed current limitations of this approach and provided potential solutions to extend its application. Copyright © 2013. Production and hosting by Elsevier Ltd.

  15. Emerging Technologies for Gut Microbiome Research

    PubMed Central

    Arnold, Jason W.; Roach, Jeffrey; Azcarate-Peril, M. Andrea

    2016-01-01

    Understanding the importance of the gut microbiome on modulation of host health has become a subject of great interest for researchers across disciplines. As an intrinsically multidisciplinary field, microbiome research has been able to reap the benefits of technological advancements in systems and synthetic biology, biomaterials engineering, and traditional microbiology. Gut microbiome research has been revolutionized by high-throughput sequencing technology, permitting compositional and functional analyses that were previously an unrealistic undertaking. Emerging technologies including engineered organoids derived from human stem cells, high-throughput culturing, and microfluidics assays allowing for the introduction of novel approaches will improve the efficiency and quality of microbiome research. Here, we will discuss emerging technologies and their potential impact on gut microbiome studies. PMID:27426971

  16. High-Throughput Sequencing, a Versatile Weapon to Support Genome-Based Diagnosis in Infectious Diseases: Applications to Clinical Bacteriology

    PubMed Central

    Caboche, Ségolène; Audebert, Christophe; Hot, David

    2014-01-01

    The recent progresses of high-throughput sequencing (HTS) technologies enable easy and cost-reduced access to whole genome sequencing (WGS) or re-sequencing. HTS associated with adapted, automatic and fast bioinformatics solutions for sequencing applications promises an accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis have allowed genome-based diagnosis, which has been consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose. PMID:25437800

  17. Differential Expression and Functional Analysis of High-Throughput -Omics Data Using Open Source Tools.

    PubMed

    Kebschull, Moritz; Fittler, Melanie Julia; Demmer, Ryan T; Papapanou, Panos N

    2017-01-01

    Today, -omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ, or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier "candidate" gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized -omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease.A major issue when inferring biological information from high-throughput -omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences.In this chapter, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of -omics data generated using microarrays or next-generation sequencing technology using open-source tools. Starting with quality control measures and necessary preprocessing steps for data originating from different -omics technologies, we next outline a differential expression analysis pipeline that can be used for data from both microarray and sequencing experiments, and offers the possibility to account for random or fixed effects. Finally, we present an overview of the possibilities for a functional analysis of the obtained data.

  18. Comparing Sanger sequencing and high-throughput metabarcoding for inferring photobiont diversity in lichens.

    PubMed

    Paul, Fiona; Otte, Jürgen; Schmitt, Imke; Dal Grande, Francesco

    2018-06-05

    The implementation of HTS (high-throughput sequencing) approaches is rapidly changing our understanding of the lichen symbiosis, by uncovering high bacterial and fungal diversity, which is often host-specific. Recently, HTS methods revealed the presence of multiple photobionts inside a single thallus in several lichen species. This differs from Sanger technology, which typically yields a single, unambiguous algal sequence per individual. Here we compared HTS and Sanger methods for estimating the diversity of green algal symbionts within lichen thalli using 240 lichen individuals belonging to two species of lichen-forming fungi. According to HTS data, Sanger technology consistently yielded the most abundant photobiont sequence in the sample. However, if the second most abundant photobiont exceeded 30% of the total HTS reads in a sample, Sanger sequencing generally failed. Our results suggest that most lichen individuals in the two analyzed species, Lasallia hispanica and L. pustulata, indeed contain a single, predominant green algal photobiont. We conclude that Sanger sequencing is a valid approach to detect the dominant photobionts in lichen individuals and populations. We discuss which research areas in lichen ecology and evolution will continue to benefit from Sanger sequencing, and which areas will profit from HTS approaches to assessing symbiont diversity.

  19. Ion channel drug discovery and research: the automated Nano-Patch-Clamp technology.

    PubMed

    Brueggemann, A; George, M; Klau, M; Beckler, M; Steindl, J; Behrends, J C; Fertig, N

    2004-01-01

    Unlike the genomics revolution, which was largely enabled by a single technological advance (high throughput sequencing), rapid advancement in proteomics will require a broader effort to increase the throughput of a number of key tools for functional analysis of different types of proteins. In the case of ion channels -a class of (membrane) proteins of great physiological importance and potential as drug targets- the lack of adequate assay technologies is felt particularly strongly. The available, indirect, high throughput screening methods for ion channels clearly generate insufficient information. The best technology to study ion channel function and screen for compound interaction is the patch clamp technique, but patch clamping suffers from low throughput, which is not acceptable for drug screening. A first step towards a solution is presented here. The nano patch clamp technology, which is based on a planar, microstructured glass chip, enables automatic whole cell patch clamp measurements. The Port-a-Patch is an automated electrophysiology workstation, which uses planar patch clamp chips. This approach enables high quality and high content ion channel and compound evaluation on a one-cell-at-a-time basis. The presented automation of the patch process and its scalability to an array format are the prerequisites for any higher throughput electrophysiology instruments.

  20. Transcription profile of boar spermatozoa as revealed by RNA-sequencing

    USDA-ARS?s Scientific Manuscript database

    High-throughput RNA sequencing (RNA-Seq) overcomes the limitations of the current hybridization-based techniques to detect the actual pool of RNA transcripts in spermatozoa. The application of this technology in livestock can speed the discovery of potential predictors of male fertility. As a first ...

  1. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    PubMed Central

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be revealed by seven RISA systems within one month. PMID:11076861

  2. A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing.

    PubMed

    Oono, Ryoko

    2017-01-01

    High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions 'how and why are communities different?' This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences.

  3. A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing

    PubMed Central

    2017-01-01

    High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions ‘how and why are communities different?’ This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences. PMID:29253889

  4. Assembly and diploid architecture of an individual human genome via single-molecule technologies

    PubMed Central

    Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali

    2015-01-01

    We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality. PMID:26121404

  5. Assembly and diploid architecture of an individual human genome via single-molecule technologies.

    PubMed

    Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali

    2015-08-01

    We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.

  6. Lights, camera, action: high-throughput plant phenotyping is ready for a close-up

    USDA-ARS?s Scientific Manuscript database

    Modern techniques for crop improvement rely on both DNA sequencing and accurate quantification of plant traits to identify genes and germplasm of interest. With rapid advances in DNA sequencing technologies, plant phenotyping is now a bottleneck in advancing crop yields [1,2]. Furthermore, the envir...

  7. Mining conifers’ mega-genome using rapid and efficient multiplexed high-throughput genotyping-by-sequencing (GBS) SNP discovery platform

    USDA-ARS?s Scientific Manuscript database

    Next-generation sequencing (NGS) technologies are revolutionizing both medical and biological research through generation of massive SNP data sets for identifying heritable genome variation underlying key traits, from rare human diseases to important agronomic phenotypes in crop species. We evaluate...

  8. Ultra high-throughput nucleic acid sequencing as a tool for virus discovery in the turkey gut.

    USDA-ARS?s Scientific Manuscript database

    Recently, the use of the next generation of nucleic acid sequencing technology (i.e., 454 pyrosequencing, as developed by Roche/454 Life Sciences) has allowed an in-depth look at the uncultivated microorganisms present in complex environmental samples, including samples with agricultural importance....

  9. Physico-chemical foundations underpinning microarray and next-generation sequencing experiments

    PubMed Central

    Harrison, Andrew; Binder, Hans; Buhot, Arnaud; Burden, Conrad J.; Carlon, Enrico; Gibas, Cynthia; Gamble, Lara J.; Halperin, Avraham; Hooyberghs, Jef; Kreil, David P.; Levicky, Rastislav; Noble, Peter A.; Ott, Albrecht; Pettitt, B. Montgomery; Tautz, Diethard; Pozhitkov, Alexander E.

    2013-01-01

    Hybridization of nucleic acids on solid surfaces is a key process involved in high-throughput technologies such as microarrays and, in some cases, next-generation sequencing (NGS). A physical understanding of the hybridization process helps to determine the accuracy of these technologies. The goal of a widespread research program is to develop reliable transformations between the raw signals reported by the technologies and individual molecular concentrations from an ensemble of nucleic acids. This research has inputs from many areas, from bioinformatics and biostatistics, to theoretical and experimental biochemistry and biophysics, to computer simulations. A group of leading researchers met in Ploen Germany in 2011 to discuss present knowledge and limitations of our physico-chemical understanding of high-throughput nucleic acid technologies. This meeting inspired us to write this summary, which provides an overview of the state-of-the-art approaches based on physico-chemical foundation to modeling of the nucleic acids hybridization process on solid surfaces. In addition, practical application of current knowledge is emphasized. PMID:23307556

  10. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays

    PubMed Central

    Aryee, Martin J.; Jaffe, Andrew E.; Corrada-Bravo, Hector; Ladd-Acosta, Christine; Feinberg, Andrew P.; Hansen, Kasper D.; Irizarry, Rafael A.

    2014-01-01

    Motivation: The recently released Infinium HumanMethylation450 array (the ‘450k’ array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. Results: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. Availability and implementation: http://bioconductor.org/packages/release/bioc/html/minfi.html. Contact: khansen@jhsph.edu; rafa@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24478339

  11. Enabling systematic interrogation of protein-protein interactions in live cells with a versatile ultra-high-throughput biosensor platform | Office of Cancer Genomics

    Cancer.gov

    The vast datasets generated by next generation gene sequencing and expression profiling have transformed biological and translational research. However, technologies to produce large-scale functional genomics datasets, such as high-throughput detection of protein-protein interactions (PPIs), are still in early development. While a number of powerful technologies have been employed to detect PPIs, a singular PPI biosensor platform featured with both high sensitivity and robustness in a mammalian cell environment remains to be established.

  12. Report for the NGFA-5 project.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jaing, C; Jackson, P; Thissen, J

    The objective of this project is to provide DHS a comprehensive evaluation of the current genomic technologies including genotyping, TaqMan PCR, multiple locus variable tandem repeat analysis (MLVA), microarray and high-throughput DNA sequencing in the analysis of biothreat agents from complex environmental samples. To effectively compare the sensitivity and specificity of the different genomic technologies, we used SNP TaqMan PCR, MLVA, microarray and high-throughput illumine and 454 sequencing to test various strains from B. anthracis, B. thuringiensis, BioWatch aerosol filter extracts or soil samples that were spiked with B. anthracis, and samples that were previously collected during DHS and EPAmore » environmental release exercises that were known to contain B. thuringiensis spores. The results of all the samples against the various assays are discussed in this report.« less

  13. Comparison of Burrows-Wheeler transform-based mapping algorithms used in high-throughput whole-genome sequencing: application to Illumina data for livestock genomes

    USDA-ARS?s Scientific Manuscript database

    Ongoing developments and cost decreases in next-generation sequencing (NGS) technologies have led to an increase in their application, which has greatly enhanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Eff...

  14. High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.

    PubMed

    Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M

    2016-09-07

    Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits. Copyright © 2016 Elsevier Inc. All rights reserved.

  15. Identification and correction of systematic error in high-throughput sequence data

    PubMed Central

    2011-01-01

    Background A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments. PMID:22099972

  16. SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.

    PubMed

    Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael

    2018-05-25

    Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.

  17. Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.

    PubMed

    Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil

    2015-07-17

    In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.

  18. Next-Generation Technologies for Multiomics Approaches Including Interactome Sequencing

    PubMed Central

    Ohashi, Hiroyuki; Miyamoto-Sato, Etsuko

    2015-01-01

    The development of high-speed analytical techniques such as next-generation sequencing and microarrays allows high-throughput analysis of biological information at a low cost. These techniques contribute to medical and bioscience advancements and provide new avenues for scientific research. Here, we outline a variety of new innovative techniques and discuss their use in omics research (e.g., genomics, transcriptomics, metabolomics, proteomics, and interactomics). We also discuss the possible applications of these methods, including an interactome sequencing technology that we developed, in future medical and life science research. PMID:25649523

  19. [Prospects for applications in human health of nanopore-based sequencing].

    PubMed

    Audebert, Christophe; Hot, David; Caboche, Ségolène

    2018-04-01

    High throughput sequencing has opened up new clinical opportunities moving towards a medicine of precision. Oncology, infectious diseases or human genomics, many applications have been developed in recent years. The introduction of a third generation of nanopore-based sequencing technology, addressing some of the weaknesses of the previous generation, heralds a new revolution. Portability, real time, long reads and marginal investment costs, these promising new technologies point to a new shift of paradigm. What are the perspectives opened up by nanopores for clinical applications? © 2018 médecine/sciences – Inserm.

  20. Historical Perspective, Development and Applications of Next-Generation Sequencing in Plant Virology

    PubMed Central

    Barba, Marina; Czosnek, Henryk; Hadidi, Ahmed

    2014-01-01

    Next-generation high throughput sequencing technologies became available at the onset of the 21st century. They provide a highly efficient, rapid, and low cost DNA sequencing platform beyond the reach of the standard and traditional DNA sequencing technologies developed in the late 1970s. They are continually improved to become faster, more efficient and cheaper. They have been used in many fields of biology since 2004. In 2009, next-generation sequencing (NGS) technologies began to be applied to several areas of plant virology including virus/viroid genome sequencing, discovery and detection, ecology and epidemiology, replication and transcription. Identification and characterization of known and unknown viruses and/or viroids in infected plants are currently among the most successful applications of these technologies. It is expected that NGS will play very significant roles in many research and non-research areas of plant virology. PMID:24399207

  1. Comparative Transcriptomes and EVO-DEVO Studies Depending on Next Generation Sequencing.

    PubMed

    Liu, Tiancheng; Yu, Lin; Liu, Lei; Li, Hong; Li, Yixue

    2015-01-01

    High throughput technology has prompted the progressive omics studies, including genomics and transcriptomics. We have reviewed the improvement of comparative omic studies, which are attributed to the high throughput measurement of next generation sequencing technology. Comparative genomics have been successfully applied to evolution analysis while comparative transcriptomics are adopted in comparison of expression profile from two subjects by differential expression or differential coexpression, which enables their application in evolutionary developmental biology (EVO-DEVO) studies. EVO-DEVO studies focus on the evolutionary pressure affecting the morphogenesis of development and previous works have been conducted to illustrate the most conserved stages during embryonic development. Old measurements of these studies are based on the morphological similarity from macro view and new technology enables the micro detection of similarity in molecular mechanism. Evolutionary model of embryo development, which includes the "funnel-like" model and the "hourglass" model, has been evaluated by combination of these new comparative transcriptomic methods with prior comparative genomic information. Although the technology has promoted the EVO-DEVO studies into a new era, technological and material limitation still exist and further investigations require more subtle study design and procedure.

  2. High throughput ion-channel pharmacology: planar-array-based voltage clamp.

    PubMed

    Kiss, Laszlo; Bennett, Paul B; Uebele, Victor N; Koblan, Kenneth S; Kane, Stefanie A; Neagle, Brad; Schroeder, Kirk

    2003-02-01

    Technological advances often drive major breakthroughs in biology. Examples include PCR, automated DNA sequencing, confocal/single photon microscopy, AFM, and voltage/patch-clamp methods. The patch-clamp method, first described nearly 30 years ago, was a major technical achievement that permitted voltage-clamp analysis (membrane potential control) of ion channels in most cells and revealed a role for channels in unimagined areas. Because of the high information content, voltage clamp is the best way to study ion-channel function; however, throughput is too low for drug screening. Here we describe a novel breakthrough planar-array-based HT patch-clamp technology developed by Essen Instruments capable of voltage-clamping thousands of cells per day. This technology provides greater than two orders of magnitude increase in throughput compared with the traditional voltage-clamp techniques. We have applied this method to study the hERG K(+) channel and to determine the pharmacological profile of QT prolonging drugs.

  3. Re-engineering adenovirus vector systems to enable high-throughput analyses of gene function.

    PubMed

    Stanton, Richard J; McSharry, Brian P; Armstrong, Melanie; Tomasec, Peter; Wilkinson, Gavin W G

    2008-12-01

    With the enhanced capacity of bioinformatics to interrogate extensive banks of sequence data, more efficient technologies are needed to test gene function predictions. Replication-deficient recombinant adenovirus (Ad) vectors are widely used in expression analysis since they provide for extremely efficient expression of transgenes in a wide range of cell types. To facilitate rapid, high-throughput generation of recombinant viruses, we have re-engineered an adenovirus vector (designated AdZ) to allow single-step, directional gene insertion using recombineering technology. Recombineering allows for direct insertion into the Ad vector of PCR products, synthesized sequences, or oligonucleotides encoding shRNAs without requirement for a transfer vector Vectors were optimized for high-throughput applications by making them "self-excising" through incorporating the I-SceI homing endonuclease into the vector removing the need to linearize vectors prior to transfection into packaging cells. AdZ vectors allow genes to be expressed in their native form or with strep, V5, or GFP tags. Insertion of tetracycline operators downstream of the human cytomegalovirus major immediate early (HCMV MIE) promoter permits silencing of transgenes in helper cells expressing the tet repressor thus making the vector compatible with the cloning of toxic gene products. The AdZ vector system is robust, straightforward, and suited to both sporadic and high-throughput applications.

  4. High-throughput metagenomic technologies for complex microbial community analysis: open and closed formats.

    PubMed

    Zhou, Jizhong; He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G; Alvarez-Cohen, Lisa

    2015-01-27

    Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied "open-format" and "closed-format" detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. Copyright © 2015 Zhou et al.

  5. High-Throughput Metagenomic Technologies for Complex Microbial Community Analysis: Open and Closed Formats

    PubMed Central

    He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G.; Alvarez-Cohen, Lisa

    2015-01-01

    ABSTRACT   Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. PMID:25626903

  6. High-throughput metagenomic technologies for complex microbial community analysis. Open and closed formats

    DOE PAGES

    Zhou, Jizhong; He, Zhili; Yang, Yunfeng; ...

    2015-01-27

    Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications andmore » focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions.« less

  7. Dissecting enzyme function with microfluidic-based deep mutational scanning.

    PubMed

    Romero, Philip A; Tran, Tuan M; Abate, Adam R

    2015-06-09

    Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.

  8. Novel genetic tools for studying food-borne Salmonella.

    PubMed

    Andrews-Polymenis, Helene L; Santiviago, Carlos A; McClelland, Michael

    2009-04-01

    Nontyphoidal Salmonellae are highly prevalent food-borne pathogens. High-throughput sequencing of Salmonella genomes is expanding our knowledge of the evolution of serovars and epidemic isolates. Genome sequences have also allowed the creation of complete microarrays. Microarrays have improved the throughput of in vivo expression technology (IVET) used to uncover promoters active during infection. In another method, signature tagged mutagenesis (STM), pools of mutants are subjected to selection. Changes in the population are monitored on a microarray, revealing genes under selection. Complete genome sequences permit the construction of pools of targeted in-frame deletions that have improved STM by minimizing the number of clones and the polarity of each mutant. Together, genome sequences and the continuing development of new tools for functional genomics will drive a revolution in the understanding of Salmonellae in many different niches that are critical for food safety.

  9. A Next-Generation Sequencing Primer—How Does It Work and What Can It Do?

    PubMed Central

    Alekseyev, Yuriy O.; Fazeli, Roghayeh; Yang, Shi; Basran, Raveen; Miller, Nancy S.

    2018-01-01

    Next-generation sequencing refers to a high-throughput technology that determines the nucleic acid sequences and identifies variants in a sample. The technology has been introduced into clinical laboratory testing and produces test results for precision medicine. Since next-generation sequencing is relatively new, graduate students, medical students, pathology residents, and other physicians may benefit from a primer to provide a foundation about basic next-generation sequencing methods and applications, as well as specific examples where it has had diagnostic and prognostic utility. Next-generation sequencing technology grew out of advances in multiple fields to produce a sophisticated laboratory test with tremendous potential. Next-generation sequencing may be used in the clinical setting to look for specific genetic alterations in patients with cancer, diagnose inherited conditions such as cystic fibrosis, and detect and profile microbial organisms. This primer will review DNA sequencing technology, the commercialization of next-generation sequencing, and clinical uses of next-generation sequencing. Specific applications where next-generation sequencing has demonstrated utility in oncology are provided. PMID:29761157

  10. High-throughput tetrad analysis.

    PubMed

    Ludlow, Catherine L; Scott, Adrian C; Cromie, Gareth A; Jeffery, Eric W; Sirr, Amy; May, Patrick; Lin, Jake; Gilbert, Teresa L; Hays, Michelle; Dudley, Aimée M

    2013-07-01

    Tetrad analysis has been a gold-standard genetic technique for several decades. Unfortunately, the need to manually isolate, disrupt and space tetrads has relegated its application to small-scale studies and limited its integration with high-throughput DNA sequencing technologies. We have developed a rapid, high-throughput method, called barcode-enabled sequencing of tetrads (BEST), that uses (i) a meiosis-specific GFP fusion protein to isolate tetrads by FACS and (ii) molecular barcodes that are read during genotyping to identify spores derived from the same tetrad. Maintaining tetrad information allows accurate inference of missing genetic markers and full genotypes of missing (and presumably nonviable) individuals. An individual researcher was able to isolate over 3,000 yeast tetrads in 3 h, an output equivalent to that of almost 1 month of manual dissection. BEST is transferable to other microorganisms for which meiotic mapping is significantly more laborious.

  11. Molecular Markers and Cotton Genetic Improvement: Current Status and Future Prospects

    PubMed Central

    Malik, Waqas; Iqbal, Muhammad Zaffar; Ali Khan, Asif; Qayyum, Abdul; Ali Abid, Muhammad; Noor, Etrat; Qadir Ahmad, Muhammad; Hasan Abbasi, Ghulam

    2014-01-01

    Narrow genetic base and complex allotetraploid genome of cotton (Gossypium hirsutum L.) is stimulating efforts to avail required polymorphism for marker based breeding. The availability of draft genome sequence of G. raimondii and G. arboreum and next generation sequencing (NGS) technologies facilitated the development of high-throughput marker technologies in cotton. The concepts of genetic diversity, QTL mapping, and marker assisted selection (MAS) are evolving into more efficient concepts of linkage disequilibrium, association mapping, and genomic selection, respectively. The objective of the current review is to analyze the pace of evolution in the molecular marker technologies in cotton during the last ten years into the following four areas: (i) comparative analysis of low- and high-throughput marker technologies available in cotton, (ii) genetic diversity in the available wild and improved gene pools of cotton, (iii) identification of the genomic regions within cotton genome underlying economic traits, and (iv) marker based selection methodologies. Moreover, the applications of marker technologies to enhance the breeding efficiency in cotton are also summarized. Aforementioned genomic technologies and the integration of several other omics resources are expected to enhance the cotton productivity and meet the global fiber quantity and quality demands. PMID:25401149

  12. The Genome Sequencer FLX System--longer reads, more applications, straight forward bioinformatics and more complete data sets.

    PubMed

    Droege, Marcus; Hill, Brendon

    2008-08-31

    The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven applications categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research, having recently re-sequenced the genome of an individual human, currently re-sequencing the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and detected low-frequency somatic mutations linked to cancer.

  13. [Community composition and diversity of endophytic fungi from roots of Sinopodophyllum hexandrum in forest of Upper-north mountain of Qinghai province].

    PubMed

    Ning, Yi; Li, Yan-Ling; Zhou, Guo-Ying; Yang, Lu-Cun; Xu, Wen-Hua

    2016-04-01

    High throughput sequencing technology is also called Next Generation Sequencing (NGS), which can sequence hundreds and thousands sequences in different samples at the same time. In the present study, the culture-independent high throughput sequencing technology was applied to sequence the fungi metagenomic DNA of the fungal internal transcribed spacer 1(ITS 1) in the root of Sinopodophyllum hexandrum. Sequencing data suggested that after the quality control, 22 565 reads were remained. Cluster similarity analysis was done based on 97% sequence similarity, which obtained 517 OTUs for the three samples (LD1, LD2 and LD3). All the fungi which identified from all the reads of OTUs based on 0.8 classification thresholds using the software of RDP classifier were classified as 13 classes, 35 orders, 44 family, 55 genera. Among these genera, the genus of Tetracladium was the dominant genera in all samples(35.49%, 68.55% and 12.96%).The Shannon's diversity indices and the Simpson indices of the endophytic fungi in the samples ranged from 1.75-2.92, 0.11-0.32, respectively.This is the first time for applying high through put sequencing technol-ogyto analyze the community composition and diversity of endophytic fungi in the medicinal plant, and the results showed that there were hyper diver sity and high community composition complexity of endophytic fungi in the root of S. hexandrum. It is also proved that the high through put sequencing technology has great advantage for analyzing ecommunity composition and diversity of endophtye in the plant. Copyright© by the Chinese Pharmaceutical Association.

  14. The first FDA marketing authorizations of next-generation sequencing technology and tests: challenges, solutions and impact for future assays.

    PubMed

    Bijwaard, Karen; Dickey, Jennifer S; Kelm, Kellie; Težak, Živana

    2015-01-01

    The rapid emergence and clinical translation of novel high-throughput sequencing technologies created a need to clarify the regulatory pathway for the evaluation and authorization of these unique technologies. Recently, the US FDA authorized for marketing four next generation sequencing (NGS)-based diagnostic devices which consisted of two heritable disease-specific assays, library preparation reagents and a NGS platform that are intended for human germline targeted sequencing from whole blood. These first authorizations can serve as a case study in how different types of NGS-based technology are reviewed by the FDA. In this manuscript we describe challenges associated with the evaluation of these novel technologies and provide an overview of what was reviewed. Besides making validated NGS-based devices available for in vitro diagnostic use, these first authorizations create a regulatory path for similar future instruments and assays.

  15. The future scalability of pH-based genome sequencers: A theoretical perspective

    NASA Astrophysics Data System (ADS)

    Go, Jonghyun; Alam, Muhammad A.

    2013-10-01

    Sequencing of human genome is an essential prerequisite for personalized medicine and early prognosis of various genetic diseases. The state-of-art, high-throughput genome sequencing technologies provide improved sequencing; however, their reliance on relatively expensive optical detection schemes has prevented wide-spread adoption of the technology in routine care. In contrast, the recently announced pH-based electronic genome sequencers achieve fast sequencing at low cost because of the compatibility with the current microelectronics technology. While the progress in technology development has been rapid, the physics of the sequencing chips and the potential for future scaling (and therefore, cost reduction) remain unexplored. In this article, we develop a theoretical framework and a scaling theory to explain the principle of operation of the pH-based sequencing chips and use the framework to explore various perceived scaling limits of the technology related to signal to noise ratio, well-to-well crosstalk, and sequencing accuracy. We also address several limitations inherent to the key steps of pH-based genome sequencers, which are widely shared by many other sequencing platforms in the market but remained unexplained properly so far.

  16. Precision medicine in the age of big data: The present and future role of large-scale unbiased sequencing in drug discovery and development.

    PubMed

    Vicini, P; Fields, O; Lai, E; Litwack, E D; Martin, A-M; Morgan, T M; Pacanowski, M A; Papaluca, M; Perez, O D; Ringel, M S; Robson, M; Sakul, H; Vockley, J; Zaks, T; Dolsten, M; Søgaard, M

    2016-02-01

    High throughput molecular and functional profiling of patients is a key driver of precision medicine. DNA and RNA characterization has been enabled at unprecedented cost and scale through rapid, disruptive progress in sequencing technology, but challenges persist in data management and interpretation. We analyze the state-of-the-art of large-scale unbiased sequencing in drug discovery and development, including technology, application, ethical, regulatory, policy and commercial considerations, and discuss issues of LUS implementation in clinical and regulatory practice. © 2015 American Society for Clinical Pharmacology and Therapeutics.

  17. Deep sequencing of evolving pathogen populations: applications, errors, and bioinformatic solutions

    PubMed Central

    2014-01-01

    Deep sequencing harnesses the high throughput nature of next generation sequencing technologies to generate population samples, treating information contained in individual reads as meaningful. Here, we review applications of deep sequencing to pathogen evolution. Pioneering deep sequencing studies from the virology literature are discussed, such as whole genome Roche-454 sequencing analyses of the dynamics of the rapidly mutating pathogens hepatitis C virus and HIV. Extension of the deep sequencing approach to bacterial populations is then discussed, including the impacts of emerging sequencing technologies. While it is clear that deep sequencing has unprecedented potential for assessing the genetic structure and evolutionary history of pathogen populations, bioinformatic challenges remain. We summarise current approaches to overcoming these challenges, in particular methods for detecting low frequency variants in the context of sequencing error and reconstructing individual haplotypes from short reads. PMID:24428920

  18. Characterizing ncRNAs in Human Pathogenic Protists Using High-Throughput Sequencing Technology

    PubMed Central

    Collins, Lesley Joan

    2011-01-01

    ncRNAs are key genes in many human diseases including cancer and viral infection, as well as providing critical functions in pathogenic organisms such as fungi, bacteria, viruses, and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, small nucleolar RNAs (snoRNAs), and long ncRNAs on a genomic scale, making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational, and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases. PMID:22303390

  19. Diff-seq: A high throughput sequencing-based mismatch detection assay for DNA variant enrichment and discovery

    PubMed Central

    Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin

    2018-01-01

    Abstract Much of the within species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-Seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139

  20. Multiplex amplification of large sets of human exons.

    PubMed

    Porreca, Gregory J; Zhang, Kun; Li, Jin Billy; Xie, Bin; Austin, Derek; Vassallo, Sara L; LeProust, Emily M; Peck, Bill J; Emig, Christopher J; Dahl, Fredrik; Gao, Yuan; Church, George M; Shendure, Jay

    2007-11-01

    A new generation of technologies is poised to reduce DNA sequencing costs by several orders of magnitude. But our ability to fully leverage the power of these technologies is crippled by the absence of suitable 'front-end' methods for isolating complex subsets of a mammalian genome at a scale that matches the throughput at which these platforms will routinely operate. We show that targeting oligonucleotides released from programmable microarrays can be used to capture and amplify approximately 10,000 human exons in a single multiplex reaction. Additionally, we show integration of this protocol with ultra-high-throughput sequencing for targeted variation discovery. Although the multiplex capture reaction is highly specific, we found that nonuniform capture is a key issue that will need to be resolved by additional optimization. We anticipate that highly multiplexed methods for targeted amplification will enable the comprehensive resequencing of human exons at a fraction of the cost of whole-genome resequencing.

  1. Translational bioinformatics in the cloud: an affordable alternative

    PubMed Central

    2010-01-01

    With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073

  2. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes.

    PubMed

    Fredlake, Christopher P; Hert, Daniel G; Kan, Cheuk-Wai; Chiesl, Thomas N; Root, Brian E; Forster, Ryan E; Barron, Annelise E

    2008-01-15

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require approximately 70 min to deliver approximately 650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered "hybrid" mechanism of DNA electromigration, in which DNA molecules alternate rapidly between repeating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs.

  3. Ultrafast DNA sequencing on a microchip by a hybrid separation mechanism that gives 600 bases in 6.5 minutes

    PubMed Central

    Fredlake, Christopher P.; Hert, Daniel G.; Kan, Cheuk-Wai; Chiesl, Thomas N.; Root, Brian E.; Forster, Ryan E.; Barron, Annelise E.

    2008-01-01

    To realize the immense potential of large-scale genomic sequencing after the completion of the second human genome (Venter's), the costs for the complete sequencing of additional genomes must be dramatically reduced. Among the technologies being developed to reduce sequencing costs, microchip electrophoresis is the only new technology ready to produce the long reads most suitable for the de novo sequencing and assembly of large and complex genomes. Compared with the current paradigm of capillary electrophoresis, microchip systems promise to reduce sequencing costs dramatically by increasing throughput, reducing reagent consumption, and integrating the many steps of the sequencing pipeline onto a single platform. Although capillary-based systems require ≈70 min to deliver ≈650 bases of contiguous sequence, we report sequencing up to 600 bases in just 6.5 min by microchip electrophoresis with a unique polymer matrix/adsorbed polymer wall coating combination. This represents a two-thirds reduction in sequencing time over any previously published chip sequencing result, with comparable read length and sequence quality. We hypothesize that these ultrafast long reads on chips can be achieved because the combined polymer system engenders a recently discovered “hybrid” mechanism of DNA electromigration, in which DNA molecules alternate rapidly between reptating through the intact polymer network and disrupting network entanglements to drag polymers through the solution, similar to dsDNA dynamics we observe in single-molecule DNA imaging studies. Most importantly, these results reveal the surprisingly powerful ability of microchip electrophoresis to provide ultrafast Sanger sequencing, which will translate to increased system throughput and reduced costs. PMID:18184818

  4. Genome-derived vaccines.

    PubMed

    De Groot, Anne S; Rappuoli, Rino

    2004-02-01

    Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed by novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed by genome-predicted, assembled and engineered T- and Bcell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are shifting genome mining for vaccines onto the forefront of immunology research.

  5. Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations

    PubMed Central

    Bilton, Timothy P.; Schofield, Matthew R.; Black, Michael A.; Chagné, David; Wilcox, Phillip L.; Dodds, Ken G.

    2018-01-01

    Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species’ genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander–Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. PMID:29487138

  6. Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations.

    PubMed

    Bilton, Timothy P; Schofield, Matthew R; Black, Michael A; Chagné, David; Wilcox, Phillip L; Dodds, Ken G

    2018-05-01

    Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species' genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology ( e.g. , genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander-Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. Copyright © 2018 Bilton et al.

  7. A computational genomics pipeline for prokaryotic sequencing projects.

    PubMed

    Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King

    2010-08-01

    New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.

  8. Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data.

    PubMed

    Waszak, Sebastian M; Kilpinen, Helena; Gschwind, Andreas R; Orioli, Andrea; Raghav, Sunil K; Witwicki, Robert M; Migliavacca, Eugenia; Yurovsky, Alisa; Lappalainen, Tuuli; Hernandez, Nouria; Reymond, Alexandre; Dermitzakis, Emmanouil T; Deplancke, Bart

    2014-01-15

    High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high-sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. The R package abs filter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter

  9. Long Reads: their Purpose and Place.

    PubMed

    Pollard, Martin O; Gurdasani, Deepti; Mentzer, Alexander J; Porter, Tarryn; Sandhu, Manjinder S

    2018-05-14

    In recent years long read technologies have moved from being a niche and specialist field to a point of relative maturity likely to feature frequently in the genomic landscape. Analogous to next generation sequencing (NGS), the cost of sequencing using long read technologies has materially dropped whilst the instrument throughput continues to increase. Together these changes present the prospect of sequencing large numbers of individuals with the aim of fully characterising genomes at high resolution. In this article, we will endeavour to present an introduction to long read technologies showing: what long reads are; how they are distinct from short reads; why long reads are useful; and how they are being used. We will highlight the recent developments in this field, and the applications and potential of these technologies in medical research, and clinical diagnostics and therapeutics.

  10. DnaSAM: Software to perform neutrality testing for large datasets with complex null models.

    PubMed

    Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B

    2010-05-01

    Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments along with the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model that are stored in easy to manipulate text file. © 2009 Blackwell Publishing Ltd.

  11. Assaying gene function by growth competition experiment.

    PubMed

    Merritt, Joshua; Edwards, Jeremy S

    2004-07-01

    High-throughput screening and analysis is one of the emerging paradigms in biotechnology. In particular, high-throughput methods are essential in the field of functional genomics because of the vast amount of data generated in recent and ongoing genome sequencing efforts. In this report we discuss integrated functional analysis methodologies which incorporate both a growth competition component and a highly parallel assay used to quantify results of the growth competition. Several applications of the two most widely used technologies in the field, i.e., transposon mutagenesis and deletion strain library growth competition, and individual applications of several developing or less widely reported technologies are presented.

  12. Single-Molecule Electrical Random Resequencing of DNA and RNA

    NASA Astrophysics Data System (ADS)

    Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji

    2012-07-01

    Two paradigm shifts in DNA sequencing technologies--from bulk to single molecules and from optical to electrical detection--are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. It will lead to development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, which was randomly determined using tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.

  13. Environmental microbiology through the lens of high-throughput DNA sequencing: synopsis of current platforms and bioinformatics approaches.

    PubMed

    Logares, Ramiro; Haverkamp, Thomas H A; Kumar, Surendra; Lanzén, Anders; Nederbragt, Alexander J; Quince, Christopher; Kauserud, Håvard

    2012-10-01

    The incursion of High-Throughput Sequencing (HTS) in environmental microbiology brings unique opportunities and challenges. HTS now allows a high-resolution exploration of the vast taxonomic and metabolic diversity present in the microbial world, which can provide an exceptional insight on global ecosystem functioning, ecological processes and evolution. This exploration has also economic potential, as we will have access to the evolutionary innovation present in microbial metabolisms, which could be used for biotechnological development. HTS is also challenging the research community, and the current bottleneck is present in the data analysis side. At the moment, researchers are in a sequence data deluge, with sequencing throughput advancing faster than the computer power needed for data analysis. However, new tools and approaches are being developed constantly and the whole process could be depicted as a fast co-evolution between sequencing technology, informatics and microbiologists. In this work, we examine the most popular and recently commercialized HTS platforms as well as bioinformatics methods for data handling and analysis used in microbial metagenomics. This non-exhaustive review is intended to serve as a broad state-of-the-art guide to researchers expanding into this rapidly evolving field. Copyright © 2012 Elsevier B.V. All rights reserved.

  14. Bacterial Pathogens and Community Composition in Advanced Sewage Treatment Systems Revealed by Metagenomics Analysis Based on High-Throughput Sequencing

    PubMed Central

    Lu, Xin; Zhang, Xu-Xiang; Wang, Zhu; Huang, Kailong; Wang, Yuan; Liang, Weigang; Tan, Yunfei; Liu, Bo; Tang, Junying

    2015-01-01

    This study used 454 pyrosequencing, Illumina high-throughput sequencing and metagenomic analysis to investigate bacterial pathogens and their potential virulence in a sewage treatment plant (STP) applying both conventional and advanced treatment processes. Pyrosequencing and Illumina sequencing consistently demonstrated that Arcobacter genus occupied over 43.42% of total abundance of potential pathogens in the STP. At species level, potential pathogens Arcobacter butzleri, Aeromonas hydrophila and Klebsiella pneumonia dominated in raw sewage, which was also confirmed by quantitative real time PCR. Illumina sequencing also revealed prevalence of various types of pathogenicity islands and virulence proteins in the STP. Most of the potential pathogens and virulence factors were eliminated in the STP, and the removal efficiency mainly depended on oxidation ditch. Compared with sand filtration, magnetic resin seemed to have higher removals in most of the potential pathogens and virulence factors. However, presence of the residual A. butzleri in the final effluent still deserves more concerns. The findings indicate that sewage acts as an important source of environmental pathogens, but STPs can effectively control their spread in the environment. Joint use of the high-throughput sequencing technologies is considered a reliable method for deep and comprehensive overview of environmental bacterial virulence. PMID:25938416

  15. Improved Selection of Internal Transcribed Spacer-Specific Primers Enables Quantitative, Ultra-High-Throughput Profiling of Fungal Communities

    PubMed Central

    Bokulich, Nicholas A.

    2013-01-01

    Ultra-high-throughput sequencing (HTS) of fungal communities has been restricted by short read lengths and primer amplification bias, slowing the adoption of newer sequencing technologies to fungal community profiling. To address these issues, we evaluated the performance of several common internal transcribed spacer (ITS) primers and designed a novel primer set and work flow for simultaneous quantification and species-level interrogation of fungal consortia. Primer comparison and validation were predicted in silico and by sequencing a “mock community” of mixed yeast species to explore the challenges of amplicon length and amplification bias for reconstructing defined yeast community structures. The amplicon size and distribution of this primer set are smaller than for all preexisting ITS primer sets, maximizing sequencing coverage of hypervariable ITS domains by very-short-amplicon, high-throughput sequencing platforms. This feature also enables the optional integration of quantitative PCR (qPCR) directly into the HTS preparatory work flow by substituting qPCR with these primers for standard PCR, yielding quantification of individual community members. The complete work flow described here, utilizing any of the qualified primer sets evaluated, can rapidly profile mixed fungal communities and capably reconstructed well-characterized beer and wine fermentation fungal communities. PMID:23377949

  16. Miniaturization Technologies for Efficient Single-Cell Library Preparation for Next-Generation Sequencing.

    PubMed

    Mora-Castilla, Sergio; To, Cuong; Vaezeslami, Soheila; Morey, Robert; Srinivasan, Srimeenakshi; Dumdie, Jennifer N; Cook-Andersen, Heidi; Jenkins, Joby; Laurent, Louise C

    2016-08-01

    As the cost of next-generation sequencing has decreased, library preparation costs have become a more significant proportion of the total cost, especially for high-throughput applications such as single-cell RNA profiling. Here, we have applied novel technologies to scale down reaction volumes for library preparation. Our system consisted of in vitro differentiated human embryonic stem cells representing two stages of pancreatic differentiation, for which we prepared multiple biological and technical replicates. We used the Fluidigm (San Francisco, CA) C1 single-cell Autoprep System for single-cell complementary DNA (cDNA) generation and an enzyme-based tagmentation system (Nextera XT; Illumina, San Diego, CA) with a nanoliter liquid handler (mosquito HTS; TTP Labtech, Royston, UK) for library preparation, reducing the reaction volume down to 2 µL and using as little as 20 pg of input cDNA. The resulting sequencing data were bioinformatically analyzed and correlated among the different library reaction volumes. Our results showed that decreasing the reaction volume did not interfere with the quality or the reproducibility of the sequencing data, and the transcriptional data from the scaled-down libraries allowed us to distinguish between single cells. Thus, we have developed a process to enable efficient and cost-effective high-throughput single-cell transcriptome sequencing. © 2016 Society for Laboratory Automation and Screening.

  17. Future technologies for monitoring HIV drug resistance and cure.

    PubMed

    Parikh, Urvi M; McCormick, Kevin; van Zyl, Gert; Mellors, John W

    2017-03-01

    Sensitive, scalable and affordable assays are critically needed for monitoring the success of interventions for preventing, treating and attempting to cure HIV infection. This review evaluates current and emerging technologies that are applicable for both surveillance of HIV drug resistance (HIVDR) and characterization of HIV reservoirs that persist despite antiretroviral therapy and are obstacles to curing HIV infection. Next-generation sequencing (NGS) has the potential to be adapted into high-throughput, cost-efficient approaches for HIVDR surveillance and monitoring during continued scale-up of antiretroviral therapy and rollout of preexposure prophylaxis. Similarly, improvements in PCR and NGS are resulting in higher throughput single genome sequencing to detect intact proviruses and to characterize HIV integration sites and clonal expansions of infected cells. Current population genotyping methods for resistance monitoring are high cost and low throughput. NGS, combined with simpler sample collection and storage matrices (e.g. dried blood spots), has considerable potential to broaden global surveillance and patient monitoring for HIVDR. Recent adaptions of NGS to identify integration sites of HIV in the human genome and to characterize the integrated HIV proviruses are likely to facilitate investigations of the impact of experimental 'curative' interventions on HIV reservoirs.

  18. Brain Connectivity as a DNA Sequencing Problem

    NASA Astrophysics Data System (ADS)

    Zador, Anthony

    The mammalian cortex consists of millions or billions of neurons, each connected to thousands of other neurons. Traditional methods for determining the brain connectivity rely on microscopy to visualize neuronal connections, but such methods are slow, labor-intensive and often lack single neuron resolution. We have recently developed a new method, MAPseq, to recast the determination of brain wiring into a form that can exploit the tremendous recent advances in high-throughput DNA sequencing. DNA sequencing technology has outpaced even Moore's law, so that the cost of sequencing the human genome has dropped from a billion dollars in 2001 to below a thousand dollars today. MAPseq works by introducing random sequences of DNA-``barcodes''-to tag neurons uniquely. With MAPseq, we can determine the connectivity of over 50K single neurons in a single mouse cortex in about a week, an unprecedented throughput, ushering in the era of ``big data'' for brain wiring. We are now developing analytical tools and algorithms to make sense of these novel data sets.

  19. Genome-wide identification of conserved microRNA and their response to drought stress in Dongxiang wild rice (Oryza rufipogon Griff.).

    PubMed

    Zhang, Fantao; Luo, Xiangdong; Zhou, Yi; Xie, Jiankun

    2016-04-01

    To identify drought stress-responsive conserved microRNA (miRNA) from Dongxiang wild rice (Oryza rufipogon Griff., DXWR) on a genome-wide scale, high-throughput sequencing technology was used to sequence libraries of DXWR samples, treated with and without drought stress. 505 conserved miRNAs corresponding to 215 families were identified. 17 were significantly down-regulated and 16 were up-regulated under drought stress. Stem-loop qRT-PCR revealed the same expression patterns as high-throughput sequencing, suggesting the accuracy of the sequencing result was high. Potential target genes of the drought-responsive miRNA were predicted to be involved in diverse biological processes. Furthermore, 16 miRNA families were first identified to be involved in drought stress response from plants. These results present a comprehensive view of the conserved miRNA and their expression patterns under drought stress for DXWR, which will provide valuable information and sequence resources for future basis studies.

  20. Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

    PubMed Central

    Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter

    2015-01-01

    Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438

  1. Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants.

    PubMed

    Taheri, Sima; Lee Abdullah, Thohirah; Yusop, Mohd Rafii; Hanafi, Mohamed Musa; Sahebi, Mahbod; Azizi, Parisa; Shamshiri, Redmond Ramin

    2018-02-13

    Microsatellites, or simple sequence repeats (SSRs), are one of the most informative and multi-purpose genetic markers exploited in plant functional genomics. However, the discovery of SSRs and development using traditional methods are laborious, time-consuming, and costly. Recently, the availability of high-throughput sequencing technologies has enabled researchers to identify a substantial number of microsatellites at less cost and effort than traditional approaches. Illumina is a noteworthy transcriptome sequencing technology that is currently used in SSR marker development. Although 454 pyrosequencing datasets can be used for SSR development, this type of sequencing is no longer supported. This review aims to present an overview of the next generation sequencing, with a focus on the efficient use of de novo transcriptome sequencing (RNA-Seq) and related tools for mining and development of microsatellites in plants.

  2. From cheek swabs to consensus sequences: an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes

    PubMed Central

    2014-01-01

    Background Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Results Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). Conclusions All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources. PMID:24460871

  3. OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid.

    PubMed

    Poehlman, William L; Rynge, Mats; Branton, Chris; Balamurugan, D; Feltus, Frank A

    2016-01-01

    High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments.

  4. OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid

    PubMed Central

    Poehlman, William L.; Rynge, Mats; Branton, Chris; Balamurugan, D.; Feltus, Frank A.

    2016-01-01

    High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments. PMID:27499617

  5. Perspectives on pathway perturbation: Focused research to enhance 3R objectives

    EPA Science Inventory

    In vitro high-throughput screening (HTS) and in silico technologies are emerging as 21st century tools for hazard identification. Computational methods that strategically examine cross-species conservation of protein sequence/structural information for chemical molecular targets ...

  6. Precision Medicine: Functional Advancements.

    PubMed

    Caskey, Thomas

    2018-01-29

    Precision medicine was conceptualized on the strength of genomic sequence analysis. High-throughput functional metrics have enhanced sequence interpretation and clinical precision. These technologies include metabolomics, magnetic resonance imaging, and I rhythm (cardiac monitoring), among others. These technologies are discussed and placed in clinical context for the medical specialties of internal medicine, pediatrics, obstetrics, and gynecology. Publications in these fields support the concept of a higher level of precision in identifying disease risk. Precise disease risk identification has the potential to enable intervention with greater specificity, resulting in disease prevention-an important goal of precision medicine.

  7. Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.

    PubMed

    Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao

    2018-05-01

    STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since they can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance were evaluated from sequencing reads analysis, concordance study and sensitivity testing. High coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies at the 34 STRs, indicating MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.

  8. OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis.

    PubMed

    Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan

    2016-01-01

    Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.

  9. Comparing microarrays and next-generation sequencing technologies for microbial ecology research.

    PubMed

    Roh, Seong Woon; Abell, Guy C J; Kim, Kyoung-Ho; Nam, Young-Do; Bae, Jin-Woo

    2010-06-01

    Recent advances in molecular biology have resulted in the application of DNA microarrays and next-generation sequencing (NGS) technologies to the field of microbial ecology. This review aims to examine the strengths and weaknesses of each of the methodologies, including depth and ease of analysis, throughput and cost-effectiveness. It also intends to highlight the optimal application of each of the individual technologies toward the study of a particular environment and identify potential synergies between the two main technologies, whereby both sample number and coverage can be maximized. We suggest that the efficient use of microarray and NGS technologies will allow researchers to advance the field of microbial ecology, and importantly, improve our understanding of the role of microorganisms in their various environments.

  10. DNA-encoded chemistry: enabling the deeper sampling of chemical space.

    PubMed

    Goodnow, Robert A; Dumelin, Christoph E; Keefe, Anthony D

    2017-02-01

    DNA-encoded chemical library technologies are increasingly being adopted in drug discovery for hit and lead generation. DNA-encoded chemistry enables the exploration of chemical spaces four to five orders of magnitude more deeply than is achievable by traditional high-throughput screening methods. Operation of this technology requires developing a range of capabilities including aqueous synthetic chemistry, building block acquisition, oligonucleotide conjugation, large-scale molecular biological transformations, selection methodologies, PCR, sequencing, sequence data analysis and the analysis of large chemistry spaces. This Review provides an overview of the development and applications of DNA-encoded chemistry, highlighting the challenges and future directions for the use of this technology.

  11. High Throughput Biological Analysis Using Multi-bit Magnetic Digital Planar Tags

    NASA Astrophysics Data System (ADS)

    Hong, B.; Jeong, J.-R.; Llandro, J.; Hayward, T. J.; Ionescu, A.; Trypiniotis, T.; Mitrelias, T.; Kopper, K. P.; Steinmuller, S. J.; Bland, J. A. C.

    2008-06-01

    We report a new magnetic labelling technology for high-throughput biomolecular identification and DNA sequencing. Planar multi-bit magnetic tags have been designed and fabricated, which comprise a magnetic barcode formed by an ensemble of micron-sized thin film Ni80Fe20 bars encapsulated in SU8. We show that by using a globally applied magnetic field and magneto-optical Kerr microscopy the magnetic elements in the multi-bit magnetic tags can be addressed individually and encoded/decoded remotely. The critical steps needed to show the feasibility of this technology are demonstrated, including fabrication, flow transport, remote writing and reading, and successful functionalization of the tags as verified by fluorescence detection. This approach is ideal for encoding information on tags in microfluidic flow or suspension, for such applications as labelling of chemical precursors during drug synthesis and combinatorial library-based high-throughput multiplexed bioassays.

  12. Preparation of next-generation sequencing libraries using Nextera™ technology: simultaneous DNA fragmentation and adaptor tagging by in vitro transposition.

    PubMed

    Caruccio, Nicholas

    2011-01-01

    DNA library preparation is a common entry point and bottleneck for next-generation sequencing. Current methods generally consist of distinct steps that often involve significant sample loss and hands-on time: DNA fragmentation, end-polishing, and adaptor-ligation. In vitro transposition with Nextera™ Transposomes simultaneously fragments and covalently tags the target DNA, thereby combining these three distinct steps into a single reaction. Platform-specific sequencing adaptors can be added, and the sample can be enriched and bar-coded using limited-cycle PCR to prepare di-tagged DNA fragment libraries. Nextera technology offers a streamlined, efficient, and high-throughput method for generating bar-coded libraries compatible with multiple next-generation sequencing platforms.

  13. Analysis, annotation, and profiling of the oat seed transcriptome

    USDA-ARS?s Scientific Manuscript database

    Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...

  14. A world of opportunities with nanopore sequencing.

    PubMed

    Leggett, Richard M; Clark, Matthew D

    2017-11-28

    Oxford Nanopore Technologies' MinION sequencer was launched in pre-release form in 2014 and represents an exciting new sequencing paradigm. The device offers multi-kilobase reads and a streamed mode of operation that allows processing of reads as they are generated. Crucially, it is an extremely compact device that is powered from the USB port of a laptop computer, enabling it to be taken out of the lab and facilitating previously impossible in-field sequencing experiments to be undertaken. Many of the initial publications concerning the platform focused on provision of tools to access and analyse the new sequence formats and then demonstrating the assembly of microbial genomes. More recently, as throughput and accuracy have increased, it has been possible to begin work involving more complex genomes and metagenomes. With the release of the high-throughput GridION X5 and PromethION platforms, the sequencing of large genomes will become more cost efficient, and enable the leveraging of extremely long (>100 kb) reads for resolution of complex genomic structures. This review provides a brief overview of nanopore sequencing technology, describes the growing range of nanopore bioinformatics tools, and highlights some of the most influential publications that have emerged over the last 2 years. Finally, we look to the future and the potential the platform has to disrupt work in human, microbiome, and plant genomics. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  15. A computational genomics pipeline for prokaryotic sequencing projects

    PubMed Central

    Kislyuk, Andrey O.; Katz, Lee S.; Agrawal, Sonia; Hagen, Matthew S.; Conley, Andrew B.; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C.; Sammons, Scott A.; Govil, Dhwani; Mair, Raydel D.; Tatti, Kathleen M.; Tondella, Maria L.; Harcourt, Brian H.; Mayer, Leonard W.; Jordan, I. King

    2010-01-01

    Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. Results: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. Availability and implementation: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems. Contact: king.jordan@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20519285

  16. Novel selection methods for DNA-encoded chemical libraries

    PubMed Central

    Chan, Alix I.; McGregor, Lynn M.; Liu, David R.

    2015-01-01

    Driven by the need for new compounds to serve as biological probes and leads for therapeutic development and the growing accessibility of DNA technologies including high-throughput sequencing, many academic and industrial groups have begun to use DNA-encoded chemical libraries as a source of bioactive small molecules. In this review, we describe the technologies that have enabled the selection of compounds with desired activities from these libraries. These methods exploit the sensitivity of in vitro selection coupled with DNA amplification to overcome some of the limitations and costs associated with conventional screening methods. In addition, we highlight newer techniques with the potential to be applied to the high-throughput evaluation of DNA-encoded chemical libraries. PMID:25723146

  17. [Review of Second Generation Sequencing and Its Application in Forensic Genetics].

    PubMed

    Zhang, S H; Bian, Y N; Zhao, Q; Li, C T

    2016-08-01

    The rapid development of second generation sequencing (SGS) within the past few years has led to the increasement of data throughput and read length while at the same time brought down substantially the sequencing cost. This made new breakthrough in the area of biology and ushered the forensic genetics into a new era. Based on the history of sequencing application in forensic genetics, this paper reviews the importance of sequencing technologies for genetic marker detection. The application status and potential of SGS in forensic genetics are discussed based on the already explored SGS platforms of Roche, Illumina and Life Technologies. With these platforms, DNA markers (SNP, STR), RNA markers (mRNA, microRNA) and whole mtDNA can be sequenced. However, development and validation of application kits, maturation of analysis software, connection to the existing databases and the possible ethical issues occurred with big data will be the key factors that determine whether this technology can substitute or supplement PCR-CE, the mature technology, and be widely used for cases detection. Copyright© by the Editorial Department of Journal of Forensic Medicine.

  18. Microbial community analysis of the hypersaline water of the Dead Sea using high-throughput amplicon sequencing.

    PubMed

    Jacob, Jacob H; Hussein, Emad I; Shakhatreh, Muhamad Ali K; Cornelison, Christopher T

    2017-10-01

    Amplicon sequencing using next-generation technology (bTEFAP ® ) has been utilized in describing the diversity of Dead Sea microbiota. The investigated area is a well-known salt lake in the western part of Jordan found in the lowest geographical location in the world (more than 420 m below sea level) and characterized by extreme salinity (approximately, 34%) in addition to other extreme conditions (low pH, unique ionic composition different from sea water). DNA was extracted from Dead Sea water. A total of 314,310 small subunit RNA (SSU rRNA) sequences were parsed, and 288,452 sequences were then clustered. For alpha diversity analysis, sample was rarefied to 3,000 sequences. The Shannon-Wiener index curve plot reached a plateau at approximately 3,000 sequences indicating that sequencing depth was sufficient to capture the full scope of microbial diversity. Archaea was found to be dominating the sequences (52%), whereas Bacteria constitute 45% of the sequences. Altogether, prokaryotic sequences (which constitute 97% of all sequences) were found to predominate. The findings expand on previous studies by using high-throughput amplicon sequencing to describe the microbial community in an environment which in recent years has been shown to hide some interesting diversity. © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  19. DDBJ read annotation pipeline: a cloud computing-based pipeline for high-throughput analysis of next-generation sequencing data.

    PubMed

    Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

    2013-08-01

    High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.

  20. DDBJ Read Annotation Pipeline: A Cloud Computing-Based Pipeline for High-Throughput Analysis of Next-Generation Sequencing Data

    PubMed Central

    Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu

    2013-01-01

    High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. PMID:23657089

  1. High Throughput Sequencing for Detection of Foodborne Pathogens

    PubMed Central

    Sekse, Camilla; Holst-Jensen, Arne; Dobrindt, Ulrich; Johannessen, Gro S.; Li, Weihua; Spilsberg, Bjørn; Shi, Jianxin

    2017-01-01

    High-throughput sequencing (HTS) is becoming the state-of-the-art technology for typing of microbial isolates, especially in clinical samples. Yet, its application is still in its infancy for monitoring and outbreak investigations of foods. Here we review the published literature, covering not only bacterial but also viral and Eukaryote food pathogens, to assess the status and potential of HTS implementation to inform stakeholders, improve food safety and reduce outbreak impacts. The developments in sequencing technology and bioinformatics have outpaced the capacity to analyze and interpret the sequence data. The influence of sample processing, nucleic acid extraction and purification, harmonized protocols for generation and interpretation of data, and properly annotated and curated reference databases including non-pathogenic “natural” strains are other major obstacles to the realization of the full potential of HTS in analytical food surveillance, epidemiological and outbreak investigations, and in complementing preventive approaches for the control and management of foodborne pathogens. Despite significant obstacles, the achieved progress in capacity and broadening of the application range over the last decade is impressive and unprecedented, as illustrated with the chosen examples from the literature. Large consortia, often with broad international participation, are making coordinated efforts to cope with many of the mentioned obstacles. Further rapid progress can therefore be prospected for the next decade. PMID:29104564

  2. How Can We Use Bioinformatics to Predict Which Agents Will Cause Birth Defects?

    EPA Science Inventory

    The availability of genomic sequences from a growing number of human and model organisms has provided an explosion of data, information, and knowledge regarding biological systems and disease processes. High-throughput technologies such as DNA and protein microarray biochips are ...

  3. Germplasm Management in the Post-genomics Era-a case study with lettuce

    USDA-ARS?s Scientific Manuscript database

    High-throughput genotyping platforms and next-generation sequencing technologies revolutionized our ways in germplasm characterization. In collaboration with UC Davis Genome Center, we completed a project of genotyping the entire cultivated lettuce (Lactuca sativa L.) collection of 1,066 accessions ...

  4. Fractal-like Distributions over the Rational Numbers in High-throughput Biological and Clinical Data

    NASA Astrophysics Data System (ADS)

    Trifonov, Vladimir; Pasqualucci, Laura; Dalla-Favera, Riccardo; Rabadan, Raul

    2011-12-01

    Recent developments in extracting and processing biological and clinical data are allowing quantitative approaches to studying living systems. High-throughput sequencing (HTS), expression profiles, proteomics, and electronic health records (EHR) are some examples of such technologies. Extracting meaningful information from those technologies requires careful analysis of the large volumes of data they produce. In this note, we present a set of fractal-like distributions that commonly appear in the analysis of such data. The first set of examples are drawn from a HTS experiment. Here, the distributions appear as part of the evaluation of the error rate of the sequencing and the identification of tumorogenic genomic alterations. The other examples are obtained from risk factor evaluation and analysis of relative disease prevalence and co-mordbidity as these appear in EHR. The distributions are also relevant to identification of subclonal populations in tumors and the study of quasi-species and intrahost diversity of viral populations.

  5. The technology and biology of single-cell RNA sequencing.

    PubMed

    Kolodziejczyk, Aleksandra A; Kim, Jong Kyoung; Svensson, Valentine; Marioni, John C; Teichmann, Sarah A

    2015-05-21

    The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications. Copyright © 2015 Elsevier Inc. All rights reserved.

  6. Empirical analysis of RNA robustness and evolution using high-throughput sequencing of ribozyme reactions.

    PubMed

    Hayden, Eric J

    2016-08-15

    RNA molecules provide a realistic but tractable model of a genotype to phenotype relationship. This relationship has been extensively investigated computationally using secondary structure prediction algorithms. Enzymatic RNA molecules, or ribozymes, offer access to genotypic and phenotypic information in the laboratory. Advancements in high-throughput sequencing technologies have enabled the analysis of sequences in the lab that now rivals what can be accomplished computationally. This has motivated a resurgence of in vitro selection experiments and opened new doors for the analysis of the distribution of RNA functions in genotype space. A body of computational experiments has investigated the persistence of specific RNA structures despite changes in the primary sequence, and how this mutational robustness can promote adaptations. This article summarizes recent approaches that were designed to investigate the role of mutational robustness during the evolution of RNA molecules in the laboratory, and presents theoretical motivations, experimental methods and approaches to data analysis. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph.

    PubMed

    Benoit, Gaëtan; Lemaitre, Claire; Lavenier, Dominique; Drezen, Erwan; Dayris, Thibault; Uricaru, Raluca; Rizk, Guillaume

    2015-09-14

    Data volumes generated by next-generation sequencing (NGS) technologies is now a major concern for both data storage and transmission. This triggered the need for more efficient methods than general purpose compression tools, such as the widely used gzip method. We present a novel reference-free method meant to compress data issued from high throughput sequencing technologies. Our approach, implemented in the software LEON, employs techniques derived from existing assembly principles. The method is based on a reference probabilistic de Bruijn Graph, built de novo from the set of reads and stored in a Bloom filter. Each read is encoded as a path in this graph, by memorizing an anchoring kmer and a list of bifurcations. The same probabilistic de Bruijn Graph is used to perform a lossy transformation of the quality scores, which allows to obtain higher compression rates without losing pertinent information for downstream analyses. LEON was run on various real sequencing datasets (whole genome, exome, RNA-seq or metagenomics). In all cases, LEON showed higher overall compression ratios than state-of-the-art compression software. On a C. elegans whole genome sequencing dataset, LEON divided the original file size by more than 20. LEON is an open source software, distributed under GNU affero GPL License, available for download at http://gatb.inria.fr/software/leon/.

  8. Next-generation sequencing: the future of molecular genetics in poultry production and food safety.

    PubMed

    Diaz-Sanchez, S; Hanning, I; Pendleton, Sean; D'Souza, Doris

    2013-02-01

    The era of molecular biology and automation of the Sanger chain-terminator sequencing method has led to discovery and advances in diagnostics and biotechnology. The Sanger methodology dominated research for over 2 decades, leading to significant accomplishments and technological improvements in DNA sequencing. Next-generation high-throughput sequencing (HT-NGS) technologies were developed subsequently to overcome the limitations of this first generation technology that include higher speed, less labor, and lowered cost. Various platforms developed include sequencing-by-synthesis 454 Life Sciences, Illumina (Solexa) sequencing, SOLiD sequencing (among others), and the Ion Torrent semiconductor sequencing technologies that use different detection principles. As technology advances, progress made toward third generation sequencing technologies are being reported, which include Nanopore Sequencing and real-time monitoring of PCR activity through fluorescent resonant energy transfer. The advantages of these technologies include scalability, simplicity, with increasing DNA polymerase performance and yields, being less error prone, and even more economically feasible with the eventual goal of obtaining real-time results. These technologies can be directly applied to improve poultry production and enhance food safety. For example, sequence-based (determination of the gut microbial community, genes for metabolic pathways, or presence of plasmids) and function-based (screening for function such as antibiotic resistance, or vitamin production) metagenomic analysis can be carried out. Gut microbialflora/communities of poultry can be sequenced to determine the changes that affect health and disease along with efficacy of methods to control pathogenic growth. Thus, the purpose of this review is to provide an overview of the principles of these current technologies and their potential application to improve poultry production and food safety as well as public health.

  9. Nucleic Acids for Ultra-Sensitive Protein Detection

    PubMed Central

    Janssen, Kris P. F.; Knez, Karel; Spasic, Dragana; Lammertyn, Jeroen

    2013-01-01

    Major advancements in molecular biology and clinical diagnostics cannot be brought about strictly through the use of genomics based methods. Improved methods for protein detection and proteomic screening are an absolute necessity to complement to wealth of information offered by novel, high-throughput sequencing technologies. Only then will it be possible to advance insights into clinical processes and to characterize the importance of specific protein biomarkers for disease detection or the realization of “personalized medicine”. Currently however, large-scale proteomic information is still not as easily obtained as its genomic counterpart, mainly because traditional antibody-based technologies struggle to meet the stringent sensitivity and throughput requirements that are required whereas mass-spectrometry based methods might be burdened by significant costs involved. However, recent years have seen the development of new biodetection strategies linking nucleic acids with existing antibody technology or replacing antibodies with oligonucleotide recognition elements altogether. These advancements have unlocked many new strategies to lower detection limits and dramatically increase throughput of protein detection assays. In this review, an overview of these new strategies will be given. PMID:23337338

  10. The use of museum specimens with high-throughput DNA sequencers

    PubMed Central

    Burrell, Andrew S.; Disotell, Todd R.; Bergey, Christina M.

    2015-01-01

    Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists’ ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials. PMID:25532801

  11. PLAN: a web platform for automating high-throughput BLAST searches and for managing and mining results.

    PubMed

    He, Ji; Dai, Xinbin; Zhao, Xuechun

    2007-02-09

    BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. The PLAN web interface is platform-independent, easily configurable and capable of comprehensive expansion, and user-intuitive. PLAN is freely available to academic users at http://bioinfo.noble.org/plan/. The source code for local deployment is provided under free license. Full support on system utilization, installation, configuration and customization are provided to academic users.

  12. PLAN: a web platform for automating high-throughput BLAST searches and for managing and mining results

    PubMed Central

    He, Ji; Dai, Xinbin; Zhao, Xuechun

    2007-01-01

    Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. The PLAN web interface is platform-independent, easily configurable and capable of comprehensive expansion, and user-intuitive. PLAN is freely available to academic users at . The source code for local deployment is provided under free license. Full support on system utilization, installation, configuration and customization are provided to academic users. PMID:17291345

  13. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes.

    PubMed

    Burkholder, William F; Newell, Evan W; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop "Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes" was held in Singapore on 13-14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis.

  14. Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes

    PubMed Central

    Burkholder, William F.; Newell, Evan W.; Poidinger, Michael; Chen, Swaine; Fink, Katja

    2017-01-01

    The inaugural workshop “Deep Sequencing in Infectious Diseases: Immune and Pathogen Repertoires for the Improvement of Patient Outcomes” was held in Singapore on 13–14 October 2016. The aim of the workshop was to discuss the latest trends in using high-throughput sequencing, bioinformatics, and allied technologies to analyze immune and pathogen repertoires and their interplay within the host, bringing together key international players in the field and Singapore-based researchers and clinician-scientists. The focus was in particular on the application of these technologies for the improvement of patient diagnosis, prognosis and treatment, and for other broad public health outcomes. The presentations by scientists and clinicians showed the potential of deep sequencing technology to capture the coevolution of adaptive immunity and pathogens. For clinical applications, some key challenges remain, such as the long turnaround time and relatively high cost of deep sequencing for pathogen identification and characterization and the lack of international standardization in immune repertoire analysis. PMID:28620372

  15. PANGEA: pipeline for analysis of next generation amplicons

    PubMed Central

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-01-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525

  16. PANGEA: pipeline for analysis of next generation amplicons.

    PubMed

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-07-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi(2) step, are joined into one program called the 'backbone'.

  17. Novel selection methods for DNA-encoded chemical libraries.

    PubMed

    Chan, Alix I; McGregor, Lynn M; Liu, David R

    2015-06-01

    Driven by the need for new compounds to serve as biological probes and leads for therapeutic development and the growing accessibility of DNA technologies including high-throughput sequencing, many academic and industrial groups have begun to use DNA-encoded chemical libraries as a source of bioactive small molecules. In this review, we describe the technologies that have enabled the selection of compounds with desired activities from these libraries. These methods exploit the sensitivity of in vitro selection coupled with DNA amplification to overcome some of the limitations and costs associated with conventional screening methods. In addition, we highlight newer techniques with the potential to be applied to the high-throughput evaluation of DNA-encoded chemical libraries. Copyright © 2015 Elsevier Ltd. All rights reserved.

  18. Transcriptome analysis of Pseudomonas syringae identifies new genes, ncRNAs, and antisense activity

    USDA-ARS?s Scientific Manuscript database

    To fully understand how bacteria respond to their environment, it is essential to assess genome-wide transcriptional activity. New high throughput sequencing technologies make it possible to query the transcriptome of an organism in an efficient unbiased manner. We applied a strand-specific method t...

  19. SNP discovery by high-throughput sequencing in soybean

    PubMed Central

    2010-01-01

    Background With the advance of new massively parallel genotyping technologies, quantitative trait loci (QTL) fine mapping and map-based cloning become more achievable in identifying genes for important and complex traits. Development of high-density genetic markers in the QTL regions of specific mapping populations is essential for fine-mapping and map-based cloning of economically important genes. Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation existing between any diverse genotypes that are usually used for QTL mapping studies. The massively parallel sequencing technologies (Roche GS/454, Illumina GA/Solexa, and ABI/SOLiD), have been widely applied to identify genome-wide sequence variations. However, it is still remains unclear whether sequence data at a low sequencing depth are enough to detect the variations existing in any QTL regions of interest in a crop genome, and how to prepare sequencing samples for a complex genome such as soybean. Therefore, with the aims of identifying SNP markers in a cost effective way for fine-mapping several QTL regions, and testing the validation rate of the putative SNPs predicted with Solexa short sequence reads at a low sequencing depth, we evaluated a pooled DNA fragment reduced representation library and SNP detection methods applied to short read sequences generated by Solexa high-throughput sequencing technology. Results A total of 39,022 putative SNPs were identified by the Illumina/Solexa sequencing system using a reduced representation DNA library of two parental lines of a mapping population. The validation rates of these putative SNPs predicted with low and high stringency were 72% and 85%, respectively. One hundred sixty four SNP markers resulted from the validation of putative SNPs and have been selectively chosen to target a known QTL, thereby increasing the marker density of the targeted region to one marker per 42 K bp. Conclusions We have demonstrated how to quickly identify large numbers of SNPs for fine mapping of QTL regions by applying massively parallel sequencing combined with genome complexity reduction techniques. This SNP discovery approach is more efficient for targeting multiple QTL regions in a same genetic population, which can be applied to other crops. PMID:20701770

  20. Progress in ion torrent semiconductor chip based sequencing.

    PubMed

    Merriman, Barry; Rothberg, Jonathan M

    2012-12-01

    In order for next-generation sequencing to become widely used as a diagnostic in the healthcare industry, sequencing instrumentation will need to be mass produced with a high degree of quality and economy. One way to achieve this is to recast DNA sequencing in a format that fully leverages the manufacturing base created for computer chips, complementary metal-oxide semiconductor chip fabrication, which is the current pinnacle of large scale, high quality, low-cost manufacturing of high technology. To achieve this, ideally the entire sensory apparatus of the sequencer would be embodied in a standard semiconductor chip, manufactured in the same fab facilities used for logic and memory chips. Recently, such a sequencing chip, and the associated sequencing platform, has been developed and commercialized by Ion Torrent, a division of Life Technologies, Inc. Here we provide an overview of this semiconductor chip based sequencing technology, and summarize the progress made since its commercial introduction. We described in detail the progress in chip scaling, sequencing throughput, read length, and accuracy. We also summarize the enhancements in the associated platform, including sample preparation, data processing, and engagement of the broader development community through open source and crowdsourcing initiatives. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. How Can We Better Detect Unauthorized GMOs in Food and Feed Chains?

    PubMed

    Fraiture, Marie-Alice; Herman, Philippe; De Loose, Marc; Debode, Frédéric; Roosens, Nancy H

    2017-06-01

    Current GMO detection systems have limited abilities to detect unauthorized genetically modified organisms (GMOs). Here, we propose a new workflow, based on next-generation sequencing (NGS) technology, to overcome this problem. In providing information about DNA sequences, this high-throughput workflow can distinguish authorized and unauthorized GMOs by strengthening the tools commonly used by enforcement laboratories with the help of NGS technology. In addition, thanks to its massive sequencing capacity, this workflow could be used to monitor GMOs present in the food and feed chain. In view of its potential implementation by enforcement laboratories, we discuss this innovative approach, its current limitations, and its sustainability of use over time. Copyright © 2017 Elsevier Ltd. All rights reserved.

  2. De novo assembly of human genomes with massively parallel short read sequencing.

    PubMed

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

    2010-02-01

    Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

  3. Library construction for next-generation sequencing: Overviews and challenges

    PubMed Central

    Head, Steven R.; Komori, H. Kiyomi; LaMere, Sarah A.; Whisenant, Thomas; Van Nieuwerburgh, Filip; Salomon, Daniel R.; Ordoukhanian, Phillip

    2014-01-01

    High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material as well as the desired application (i.e., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation) are addressed in the context of preparing high quality sequencing libraries. In addition, the current methods for preparing NGS libraries from single cells are also discussed. PMID:24502796

  4. Application of Next-generation Sequencing Technology in Forensic Science

    PubMed Central

    Yang, Yaran; Xie, Bingbing; Yan, Jiangwei

    2014-01-01

    Next-generation sequencing (NGS) technology, with its high-throughput capacity and low cost, has developed rapidly in recent years and become an important analytical tool for many genomics researchers. New opportunities in the research domain of the forensic studies emerge by harnessing the power of NGS technology, which can be applied to simultaneously analyzing multiple loci of forensic interest in different genetic contexts, such as autosomes, mitochondrial and sex chromosomes. Furthermore, NGS technology can also have potential applications in many other aspects of research. These include DNA database construction, ancestry and phenotypic inference, monozygotic twin studies, body fluid and species identification, and forensic animal, plant and microbiological analyses. Here we review the application of NGS technology in the field of forensic science with the aim of providing a reference for future forensics studies and practice. PMID:25462152

  5. Unravelling the complexity of microRNA-mediated gene regulation in black pepper (Piper nigrum L.) using high-throughput small RNA profiling.

    PubMed

    Asha, Srinivasan; Sreekumar, Sweda; Soniya, E V

    2016-01-01

    Analysis of high-throughput small RNA deep sequencing data, in combination with black pepper transcriptome sequences revealed microRNA-mediated gene regulation in black pepper ( Piper nigrum L.). Black pepper is an important spice crop and its berries are used worldwide as a natural food additive that contributes unique flavour to foods. In the present study to characterize microRNAs from black pepper, we generated a small RNA library from black pepper leaf and sequenced it by Illumina high-throughput sequencing technology. MicroRNAs belonging to a total of 303 conserved miRNA families were identified from the sRNAome data. Subsequent analysis from recently sequenced black pepper transcriptome confirmed precursor sequences of 50 conserved miRNAs and four potential novel miRNA candidates. Stem-loop qRT-PCR experiments demonstrated differential expression of eight conserved miRNAs in black pepper. Computational analysis of targets of the miRNAs showed 223 potential black pepper unigene targets that encode diverse transcription factors and enzymes involved in plant development, disease resistance, metabolic and signalling pathways. RLM-RACE experiments further mapped miRNA-mediated cleavage at five of the mRNA targets. In addition, miRNA isoforms corresponding to 18 miRNA families were also identified from black pepper. This study presents the first large-scale identification of microRNAs from black pepper and provides the foundation for the future studies of miRNA-mediated gene regulation of stress responses and diverse metabolic processes in black pepper.

  6. Emerging Genomic Tools for Legume Breeding: Current Status and Future Prospects

    PubMed Central

    Pandey, Manish K.; Roorkiwal, Manish; Singh, Vikas K.; Ramalingam, Abirami; Kudapa, Himabindu; Thudi, Mahendar; Chitikineni, Anu; Rathore, Abhishek; Varshney, Rajeev K.

    2016-01-01

    Legumes play a vital role in ensuring global nutritional food security and improving soil quality through nitrogen fixation. Accelerated higher genetic gains is required to meet the demand of ever increasing global population. In recent years, speedy developments have been witnessed in legume genomics due to advancements in next-generation sequencing (NGS) and high-throughput genotyping technologies. Reference genome sequences for many legume crops have been reported in the last 5 years. The availability of the draft genome sequences and re-sequencing of elite genotypes for several important legume crops have made it possible to identify structural variations at large scale. Availability of large-scale genomic resources and low-cost and high-throughput genotyping technologies are enhancing the efficiency and resolution of genetic mapping and marker-trait association studies. Most importantly, deployment of molecular breeding approaches has resulted in development of improved lines in some legume crops such as chickpea and groundnut. In order to support genomics-driven crop improvement at a fast pace, the deployment of breeder-friendly genomics and decision support tools seems appear to be critical in breeding programs in developing countries. This review provides an overview of emerging genomics and informatics tools/approaches that will be the key driving force for accelerating genomics-assisted breeding and ultimately ensuring nutritional and food security in developing countries. PMID:27199998

  7. Diagnostic Applications of Next Generation Sequencing in Immunogenetics and Molecular Oncology

    PubMed Central

    Grumbt, Barbara; Eck, Sebastian H.; Hinrichsen, Tanja; Hirv, Kaimo

    2013-01-01

    Summary With the introduction of the next generation sequencing (NGS) technologies, remarkable new diagnostic applications have been established in daily routine. Implementation of NGS is challenging in clinical diagnostics, but definite advantages and new diagnostic possibilities make the switch to the technology inevitable. In addition to the higher sequencing capacity, clonal sequencing of single molecules, multiplexing of samples, higher diagnostic sensitivity, workflow miniaturization, and cost benefits are some of the valuable features of the technology. After the recent advances, NGS emerged as a proven alternative for classical Sanger sequencing in the typing of human leukocyte antigens (HLA). By virtue of the clonal amplification of single DNA molecules ambiguous typing results can be avoided. Simultaneously, a higher sample throughput can be achieved by tagging of DNA molecules with multiplex identifiers and pooling of PCR products before sequencing. In our experience, up to 380 samples can be typed for HLA-A, -B, and -DRB1 in high-resolution during every sequencing run. In molecular oncology, NGS shows a markedly increased sensitivity in comparison to the conventional Sanger sequencing and is developing to the standard diagnostic tool in detection of somatic mutations in cancer cells with great impact on personalized treatment of patients. PMID:23922545

  8. A Rapid, High-Quality, Cost-Effective, Comprehensive and Expandable Targeted Next-Generation Sequencing Assay for Inherited Heart Diseases.

    PubMed

    Wilson, Kitchener D; Shen, Peidong; Fung, Eula; Karakikes, Ioannis; Zhang, Angela; InanlooRahatloo, Kolsoum; Odegaard, Justin; Sallam, Karim; Davis, Ronald W; Lui, George K; Ashley, Euan A; Scharfe, Curt; Wu, Joseph C

    2015-09-11

    Thousands of mutations across >50 genes have been implicated in inherited cardiomyopathies. However, options for sequencing this rapidly evolving gene set are limited because many sequencing services and off-the-shelf kits suffer from slow turnaround, inefficient capture of genomic DNA, and high cost. Furthermore, customization of these assays to cover emerging targets that suit individual needs is often expensive and time consuming. We sought to develop a custom high throughput, clinical-grade next-generation sequencing assay for detecting cardiac disease gene mutations with improved accuracy, flexibility, turnaround, and cost. We used double-stranded probes (complementary long padlock probes), an inexpensive and customizable capture technology, to efficiently capture and amplify the entire coding region and flanking intronic and regulatory sequences of 88 genes and 40 microRNAs associated with inherited cardiomyopathies, congenital heart disease, and cardiac development. Multiplexing 11 samples per sequencing run resulted in a mean base pair coverage of 420, of which 97% had >20× coverage and >99% were concordant with known heterozygous single nucleotide polymorphisms. The assay correctly detected germline variants in 24 individuals and revealed several polymorphic regions in miR-499. Total run time was 3 days at an approximate cost of $100 per sample. Accurate, high-throughput detection of mutations across numerous cardiac genes is achievable with complementary long padlock probe technology. Moreover, this format allows facile insertion of additional probes as more cardiomyopathy and congenital heart disease genes are discovered, giving researchers a powerful new tool for DNA mutation detection and discovery. © 2015 American Heart Association, Inc.

  9. Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data

    PubMed Central

    2014-01-01

    Background The rapid evolution in high-throughput sequencing (HTS) technologies has opened up new perspectives in several research fields and led to the production of large volumes of sequence data. A fundamental step in HTS data analysis is the mapping of reads onto reference sequences. Choosing a suitable mapper for a given technology and a given application is a subtle task because of the difficulty of evaluating mapping algorithms. Results In this paper, we present a benchmark procedure to compare mapping algorithms used in HTS using both real and simulated datasets and considering four evaluation criteria: computational resource and time requirements, robustness of mapping, ability to report positions for reads in repetitive regions, and ability to retrieve true genetic variation positions. To measure robustness, we introduced a new definition for a correctly mapped read taking into account not only the expected start position of the read but also the end position and the number of indels and substitutions. We developed CuReSim, a new read simulator, that is able to generate customized benchmark data for any kind of HTS technology by adjusting parameters to the error types. CuReSim and CuReSimEval, a tool to evaluate the mapping quality of the CuReSim simulated reads, are freely available. We applied our benchmark procedure to evaluate 14 mappers in the context of whole genome sequencing of small genomes with Ion Torrent data for which such a comparison has not yet been established. Conclusions A benchmark procedure to compare HTS data mappers is introduced with a new definition for the mapping correctness as well as tools to generate simulated reads and evaluate mapping quality. The application of this procedure to Ion Torrent data from the whole genome sequencing of small genomes has allowed us to validate our benchmark procedure and demonstrate that it is helpful for selecting a mapper based on the intended application, questions to be addressed, and the technology used. This benchmark procedure can be used to evaluate existing or in-development mappers as well as to optimize parameters of a chosen mapper for any application and any sequencing platform. PMID:24708189

  10. MGIS: Managing banana (Musa spp.) genetic resources information and high-throughput genotyping data

    USDA-ARS?s Scientific Manuscript database

    Unraveling genetic diversity held in genebanks on a large scale is underway, due to the advances in Next-generation sequence-based technologies that produce high-density genetic markers for a large number of samples at low cost. Genebank users should be in a position to identify and select germplasm...

  11. Cross-species extrapolation of mammalian-based ToxCast Data using Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS)

    EPA Science Inventory

    In vitro high-throughput screening (HTS) and in silico technologies have emerged as 21st century tools for chemical hazard identification. In 2007 the U.S. Environmental Protection Agency (EPA) launched the ToxCast Program, which has screened thousands of chemicals in hundreds of...

  12. CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

    PubMed

    Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian

    2011-08-30

    Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.

  13. Characterization of a new apple luteovirus identified by high-throughput sequencing.

    PubMed

    Liu, Huawei; Wu, Liping; Nikolaeva, Ekaterina; Peter, Kari; Liu, Zongrang; Mollov, Dimitre; Cao, Mengji; Li, Ruhui

    2018-05-15

    'Rapid Apple Decline' (RAD) is a newly emerging problem of young, dwarf apple trees in the Northeastern USA. The affected trees show trunk necrosis, cracking and canker before collapse in summer. In this study, we discovered and characterized a new luteovirus from apple trees in RAD-affected orchards using high-throughput sequencing (HTS) technology and subsequent Sanger sequencing. Illumina NextSeq sequencing was applied to total RNAs prepared from three diseased apple trees. Sequence reads were de novo assembled, and contigs were annotated by BLASTx. RT-PCR and 5'/3' RACE sequencing were used to obtain the complete genome of a new virus. RT-PCR was used to detect the virus. Three common apple viruses and a new luteovirus were identified from the diseased trees by HTS and RT-PCR. Sequence analyses of the complete genome of the new virus show that it is a new species of the genus Luteovirus in the family Luteoviridae. The virus is graft transmissible and detected by RT-PCR in apple trees in a couple of orchards. A new luteovirus and/or three known viruses were found to be associated with RAD. Molecular characterization of the new luteovirus provides important information for further investigation of its distribution and etiological role.

  14. Evaluation of second-generation sequencing of 19 dilated cardiomyopathy genes for clinical applications.

    PubMed

    Gowrisankar, Sivakumar; Lerner-Ellis, Jordan P; Cox, Stephanie; White, Emily T; Manion, Megan; LeVan, Kevin; Liu, Jonathan; Farwell, Lisa M; Iartchouk, Oleg; Rehm, Heidi L; Funke, Birgit H

    2010-11-01

    Medical sequencing for diseases with locus and allelic heterogeneities has been limited by the high cost and low throughput of traditional sequencing technologies. "Second-generation" sequencing (SGS) technologies allow the parallel processing of a large number of genes and, therefore, offer great promise for medical sequencing; however, their use in clinical laboratories is still in its infancy. Our laboratory offers clinical resequencing for dilated cardiomyopathy (DCM) using an array-based platform that interrogates 19 of more than 30 genes known to cause DCM. We explored both the feasibility and cost effectiveness of using PCR amplification followed by SGS technology for sequencing these 19 genes in a set of five samples enriched for known sequence alterations (109 unique substitutions and 27 insertions and deletions). While the analytical sensitivity for substitutions was comparable to that of the DCM array (98%), SGS technology performed better than the DCM array for insertions and deletions (90.6% versus 58%). Overall, SGS performed substantially better than did the current array-based testing platform; however, the operational cost and projected turnaround time do not meet our current standards. Therefore, efficient capture methods and/or sample pooling strategies that shorten the turnaround time and decrease reagent and labor costs are needed before implementing this platform into routine clinical applications.

  15. Development and use of molecular markers: past and present.

    PubMed

    Grover, Atul; Sharma, P C

    2016-01-01

    Molecular markers, due to their stability, cost-effectiveness and ease of use provide an immensely popular tool for a variety of applications including genome mapping, gene tagging, genetic diversity diversity, phylogenetic analysis and forensic investigations. In the last three decades, a number of molecular marker techniques have been developed and exploited worldwide in different systems. However, only a handful of these techniques, namely RFLPs, RAPDs, AFLPs, ISSRs, SSRs and SNPs have received global acceptance. A recent revolution in DNA sequencing techniques has taken the discovery and application of molecular markers to high-throughput and ultrahigh-throughput levels. Although, the choice of marker will obviously depend on the targeted use, microsatellites, SNPs and genotyping by sequencing (GBS) largely fulfill most of the user requirements. Further, modern transcriptomic and functional markers will lead the ventures onto high-density genetic map construction, identification of QTLs, breeding and conservation strategies in times to come in combination with other high throughput techniques. This review presents an overview of different marker technologies and their variants with a comparative account of their characteristic features and applications.

  16. Process in manufacturing high efficiency AlGaAs/GaAs solar cells by MO-CVD

    NASA Technical Reports Server (NTRS)

    Yeh, Y. C. M.; Chang, K. I.; Tandon, J.

    1984-01-01

    Manufacturing technology for mass producing high efficiency GaAs solar cells is discussed. A progress using a high throughput MO-CVD reactor to produce high efficiency GaAs solar cells is discussed. Thickness and doping concentration uniformity of metal oxide chemical vapor deposition (MO-CVD) GaAs and AlGaAs layer growth are discussed. In addition, new tooling designs are given which increase the throughput of solar cell processing. To date, 2cm x 2cm AlGaAs/GaAs solar cells with efficiency up to 16.5% were produced. In order to meet throughput goals for mass producing GaAs solar cells, a large MO-CVD system (Cambridge Instrument Model MR-200) with a susceptor which was initially capable of processing 20 wafers (up to 75 mm diameter) during a single growth run was installed. In the MR-200, the sequencing of the gases and the heating power are controlled by a microprocessor-based programmable control console. Hence, operator errors can be reduced, leading to a more reproducible production sequence.

  17. Soil DNA metabarcoding and high-throughput sequencing as a forensic tool: considerations, potential limitations and recommendations.

    PubMed

    Young, J M; Austin, J J; Weyrich, L S

    2017-02-01

    Analysis of physical evidence is typically a deciding factor in forensic casework by establishing what transpired at a scene or who was involved. Forensic geoscience is an emerging multi-disciplinary science that can offer significant benefits to forensic investigations. Soil is a powerful, nearly 'ideal' contact trace evidence, as it is highly individualistic, easy to characterise, has a high transfer and retention probability, and is often overlooked in attempts to conceal evidence. However, many real-life cases encounter close proximity soil samples or soils with low inorganic content, which cannot be easily discriminated based on current physical and chemical analysis techniques. The capability to improve forensic soil discrimination, and identify key indicator taxa from soil using the organic fraction is currently lacking. The development of new DNA sequencing technologies offers the ability to generate detailed genetic profiles from soils and enhance current forensic soil analyses. Here, we discuss the use of DNA metabarcoding combined with high-throughput sequencing (HTS) technology to distinguish between soils from different locations in a forensic context. Specifically, we provide recommendations for best practice, outline the potential limitations encountered in a forensic context and describe the future directions required to integrate soil DNA analysis into casework. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  18. Sharing of photobionts in sympatric populations of Thamnolia and Cetraria lichens: evidence from high-throughput sequencing.

    PubMed

    Onuț-Brännström, Ioana; Benjamin, Mitchell; Scofield, Douglas G; Heiðmarsson, Starri; Andersson, Martin G I; Lindström, Eva S; Johannesson, Hanna

    2018-03-13

    In this study, we explored the diversity of green algal symbionts (photobionts) in sympatric populations of the cosmopolitan lichen-forming fungi Thamnolia and Cetraria. We sequenced with both Sanger and Ion Torrent High-Throughput Sequencing technologies the photobiont ITS-region of 30 lichen thalli from two islands: Iceland and Öland. While Sanger recovered just one photobiont genotype from each thallus, the Ion Torrent data recovered 10-18 OTUs for each pool of 5 lichen thalli, suggesting that individual lichens can contain heterogeneous photobiont populations. Both methods showed evidence for photobiont sharing between Thamnolia and Cetraria on Iceland. In contrast, our data suggest that on Öland the two mycobionts associate with distinct photobiont communities, with few shared OTUs revealed by Ion Torrent sequencing. Furthermore, by comparing our sequences with public data, we identified closely related photobionts from geographically distant localities. Taken together, we suggest that the photobiont composition in Thamnolia and Cetraria results from both photobiont-mycobiont codispersal and local acquisition during mycobiont establishment and/or lichen growth. We hypothesize that this is a successful strategy for lichens to be flexible in the use of the most adapted photobiont for the environment.

  19. Single nucleotide polymorphism discovery in rainbow trout by deep sequencing of a reduced representation library.

    PubMed

    Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E

    2009-11-25

    To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the sequences from the validated markers were associated with rainbow trout transcripts. The use of reduced representation libraries and pyrosequencing technology proved to be an effective strategy for the discovery of a high number of putative SNPs in rainbow trout; however, modifications to the technique to decrease the false discovery rate resulting from the evolutionary recent genome duplication would be desirable.

  20. BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing

    PubMed Central

    Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph

    2011-01-01

    Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797

  1. Managing the genomic revolution in cancer diagnostics.

    PubMed

    Nguyen, Doreen; Gocke, Christopher D

    2017-08-01

    Molecular tumor profiling is now a routine part of patient care, revealing targetable genomic alterations and molecularly distinct tumor subtypes with therapeutic and prognostic implications. The widespread adoption of next-generation sequencing technologies has greatly facilitated clinical implementation of genomic data and opened the door for high-throughput multigene-targeted sequencing. Herein, we discuss the variability of cancer genetic profiling currently offered by clinical laboratories, the challenges of applying rapidly evolving medical knowledge to individual patients, and the need for more standardized population-based molecular profiling.

  2. Sequence Data for Clostridium autoethanogenum using Three Generations of Sequencing Technologies

    DOE PAGES

    Utturkar, Sagar M.; Klingeman, Dawn Marie; Bruno-Barcena, José M.; ...

    2015-04-14

    During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms which are characterized by low cost, high throughput and shorter read lengths for example, Illumina. The emergence and development of so called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequencemore » datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.« less

  3. Portable and Error-Free DNA-Based Data Storage.

    PubMed

    Yazdi, S M Hossein Tabatabaei; Gabrys, Ryan; Milenkovic, Olgica

    2017-07-10

    DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.

  4. Metagenomic Assembly: Overview, Challenges and Applications

    PubMed Central

    Ghurye, Jay S.; Cepeda-Espinoza, Victoria; Pop, Mihai

    2016-01-01

    Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems. PMID:27698619

  5. Accounting for uncertainty in DNA sequencing data.

    PubMed

    O'Rawe, Jason A; Ferson, Scott; Lyon, Gholson J

    2015-02-01

    Science is defined in part by an honest exposition of the uncertainties that arise in measurements and propagate through calculations and inferences, so that the reliabilities of its conclusions are made apparent. The recent rapid development of high-throughput DNA sequencing technologies has dramatically increased the number of measurements made at the biochemical and molecular level. These data come from many different DNA-sequencing technologies, each with their own platform-specific errors and biases, which vary widely. Several statistical studies have tried to measure error rates for basic determinations, but there are no general schemes to project these uncertainties so as to assess the surety of the conclusions drawn about genetic, epigenetic, and more general biological questions. We review here the state of uncertainty quantification in DNA sequencing applications, describe sources of error, and propose methods that can be used for accounting and propagating these errors and their uncertainties through subsequent calculations. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. A time-and-motion approach to micro-costing of high-throughput genomic assays

    PubMed Central

    Costa, S.; Regier, D.A.; Meissner, B.; Cromwell, I.; Ben-Neriah, S.; Chavez, E.; Hung, S.; Steidl, C.; Scott, D.W.; Marra, M.A.; Peacock, S.J.; Connors, J.M.

    2016-01-01

    Background Genomic technologies are increasingly used to guide clinical decision-making in cancer control. Economic evidence about the cost-effectiveness of genomic technologies is limited, in part because of a lack of published comprehensive cost estimates. In the present micro-costing study, we used a time-and-motion approach to derive cost estimates for 3 genomic assays and processes—digital gene expression profiling (gep), fluorescence in situ hybridization (fish), and targeted capture sequencing, including bioinformatics analysis—in the context of lymphoma patient management. Methods The setting for the study was the Department of Lymphoid Cancer Research laboratory at the BC Cancer Agency in Vancouver, British Columbia. Mean per-case hands-on time and resource measurements were determined from a series of direct observations of each assay. Per-case cost estimates were calculated using a bottom-up costing approach, with labour, capital and equipment, supplies and reagents, and overhead costs included. Results The most labour-intensive assay was found to be fish at 258.2 minutes per case, followed by targeted capture sequencing (124.1 minutes per case) and digital gep (14.9 minutes per case). Based on a historical case throughput of 180 cases annually, the mean per-case cost (2014 Canadian dollars) was estimated to be $1,029.16 for targeted capture sequencing and bioinformatics analysis, $596.60 for fish, and $898.35 for digital gep with an 807-gene code set. Conclusions With the growing emphasis on personalized approaches to cancer management, the need for economic evaluations of high-throughput genomic assays is increasing. Through economic modelling and budget-impact analyses, the cost estimates presented here can be used to inform priority-setting decisions about the implementation of such assays in clinical practice. PMID:27803594

  7. Genome-Wide Identification of Different Dormant Medicago sativa L. MicroRNAs in Response to Fall Dormancy

    PubMed Central

    Du, Hongqi; Sun, Xiaoge; Shi, Yinghua; Wang, Chengzhang

    2014-01-01

    Background MicroRNAs (miRNAs) are a class of regulatory small RNAs (sRNAs) that regulate gene post-transcriptional expression in plants and animals. High-throughput sequencing technology is capable of identifying small RNAs in plant species. Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide, and fall dormancy is an adaptive characteristic related to the biomass production and winter survival in alfalfa. Here, we applied high-throughput sRNA sequencing to identify some miRNAs that were responsive to fall dormancy in standard variety (Maverick and CUF101) of alfalfa. Results Four sRNA libraries were generated and sequenced from alfalfa leaves in two typical varieties at distinct seasons. Through integrative analysis, we identified 51 novel miRNA candidates of 206 families. Additionally, we identified 28 miRNAs associated with fall dormancy in standard variety (Maverick and CUF101), including 20 known miRNAs and eight novel miRNAs. Both high-throughput sequencing and RT-qPCR confirmed that eight known miRNA members were up-regulated and six known miRNA members were down-regulated in response to fall dormancy in standard variety (Maverick and CUF101). Among the 51 novel miRNA candidates, five miRNAs were up-regulated and three miRNAs were down-regulated in response to fall dormancy in standard variety (Maverick and CUF101), and five of them were confirmed by Northern blot analysis. Conclusion We identified 20 known miRNAs and eight new miRNA candidates that were responsive to fall dormancy in standard variety (Maverick and CUF101) by high-throughput sequencing of small RNAs from Medicago sativa. Our data provide a useful resource for investigating miRNA-mediated regulatory mechanisms of fall dormancy in alfalfa, and these findings are important for our understanding of the roles played by miRNAs in the response of plants to abiotic stress in general and fall dormancy in alfalfa. PMID:25473944

  8. Genome-wide identification of different dormant Medicago sativa L. MicroRNAs in response to fall dormancy.

    PubMed

    Fan, Wenna; Zhang, Senhao; Du, Hongqi; Sun, Xiaoge; Shi, Yinghua; Wang, Chengzhang

    2014-01-01

    MicroRNAs (miRNAs) are a class of regulatory small RNAs (sRNAs) that regulate gene post-transcriptional expression in plants and animals. High-throughput sequencing technology is capable of identifying small RNAs in plant species. Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide, and fall dormancy is an adaptive characteristic related to the biomass production and winter survival in alfalfa. Here, we applied high-throughput sRNA sequencing to identify some miRNAs that were responsive to fall dormancy in standard variety (Maverick and CUF101) of alfalfa. Four sRNA libraries were generated and sequenced from alfalfa leaves in two typical varieties at distinct seasons. Through integrative analysis, we identified 51 novel miRNA candidates of 206 families. Additionally, we identified 28 miRNAs associated with fall dormancy in standard variety (Maverick and CUF101), including 20 known miRNAs and eight novel miRNAs. Both high-throughput sequencing and RT-qPCR confirmed that eight known miRNA members were up-regulated and six known miRNA members were down-regulated in response to fall dormancy in standard variety (Maverick and CUF101). Among the 51 novel miRNA candidates, five miRNAs were up-regulated and three miRNAs were down-regulated in response to fall dormancy in standard variety (Maverick and CUF101), and five of them were confirmed by Northern blot analysis. We identified 20 known miRNAs and eight new miRNA candidates that were responsive to fall dormancy in standard variety (Maverick and CUF101) by high-throughput sequencing of small RNAs from Medicago sativa. Our data provide a useful resource for investigating miRNA-mediated regulatory mechanisms of fall dormancy in alfalfa, and these findings are important for our understanding of the roles played by miRNAs in the response of plants to abiotic stress in general and fall dormancy in alfalfa.

  9. Assessing the performance of the Oxford Nanopore Technologies MinION

    PubMed Central

    Laver, T.; Harrison, J.; O’Neill, P.A.; Moore, K.; Farbos, A.; Paszkiewicz, K.; Studholme, D.J.

    2015-01-01

    The Oxford Nanopore Technologies (ONT) MinION is a new sequencing technology that potentially offers read lengths of tens of kilobases (kb) limited only by the length of DNA molecules presented to it. The device has a low capital cost, is by far the most portable DNA sequencer available, and can produce data in real-time. It has numerous prospective applications including improving genome sequence assemblies and resolution of repeat-rich regions. Before such a technology is widely adopted, it is important to assess its performance and limitations in respect of throughput and accuracy. In this study we assessed the performance of the MinION by re-sequencing three bacterial genomes, with very different nucleotide compositions ranging from 28.6% to 70.7%; the high G + C strain was underrepresented in the sequencing reads. We estimate the error rate of the MinION (after base calling) to be 38.2%. Mean and median read lengths were 2 kb and 1 kb respectively, while the longest single read was 98 kb. The whole length of a 5 kb rRNA operon was covered by a single read. As the first nanopore-based single molecule sequencer available to researchers, the MinION is an exciting prospect; however, the current error rate limits its ability to compete with existing sequencing technologies, though we do show that MinION sequence reads can enhance contiguity of de novo assembly when used in conjunction with Illumina MiSeq data. PMID:26753127

  10. At the Tipping Point

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wiley, H. S.

    There comes a time in every field of science when things suddenly change. While it might not be immediately apparent that things are different, a tipping point has occurred. Biology is now at such a point. The reason is the introduction of high-throughput genomics-based technologies. I am not talking about the consequences of the sequencing of the human genome (and every other genome within reach). The change is due to new technologies that generate an enormous amount of data about the molecular composition of cells. These include proteomics, transcriptional profiling by sequencing, and the ability to globally measure microRNAs andmore » post-translational modifications of proteins. These mountains of digital data can be mapped to a common frame of reference: the organism’s genome. With the new high-throughput technologies, we can generate tens of thousands of data points from each sample. Data are now measured in terabytes and the time necessary to analyze data can now require years. Obviously, we can’t wait to interpret the data fully before the next experiment. In fact, we might never be able to even look at all of it, much less understand it. This volume of data requires sophisticated computational and statistical methods for its analysis and is forcing biologists to approach data interpretation as a collaborative venture.« less

  11. High throughput sequencing analysis of RNA libraries reveals the influences of initial library and PCR methods on SELEX efficiency.

    PubMed

    Takahashi, Mayumi; Wu, Xiwei; Ho, Michelle; Chomchan, Pritsana; Rossi, John J; Burnett, John C; Zhou, Jiehua

    2016-09-22

    The systemic evolution of ligands by exponential enrichment (SELEX) technique is a powerful and effective aptamer-selection procedure. However, modifications to the process can dramatically improve selection efficiency and aptamer performance. For example, droplet digital PCR (ddPCR) has been recently incorporated into SELEX selection protocols to putatively reduce the propagation of byproducts and avoid selection bias that result from differences in PCR efficiency of sequences within the random library. However, a detailed, parallel comparison of the efficacy of conventional solution PCR versus the ddPCR modification in the RNA aptamer-selection process is needed to understand effects on overall SELEX performance. In the present study, we took advantage of powerful high throughput sequencing technology and bioinformatics analysis coupled with SELEX (HT-SELEX) to thoroughly investigate the effects of initial library and PCR methods in the RNA aptamer identification. Our analysis revealed that distinct "biased sequences" and nucleotide composition existed in the initial, unselected libraries purchased from two different manufacturers and that the fate of the "biased sequences" was target-dependent during selection. Our comparison of solution PCR- and ddPCR-driven HT-SELEX demonstrated that PCR method affected not only the nucleotide composition of the enriched sequences, but also the overall SELEX efficiency and aptamer efficacy.

  12. Tempo and mode of genomic mutations unveil human evolutionary history.

    PubMed

    Hara, Yuichiro

    2015-01-01

    Mutations that have occurred in human genomes provide insight into various aspects of evolutionary history such as speciation events and degrees of natural selection. Comparing genome sequences between human and great apes or among humans is a feasible approach for inferring human evolutionary history. Recent advances in high-throughput or so-called 'next-generation' DNA sequencing technologies have enabled the sequencing of thousands of individual human genomes, as well as a variety of reference genomes of hominids, many of which are publicly available. These sequence data can help to unveil the detailed demographic history of the lineage leading to humans as well as the explosion of modern human population size in the last several thousand years. In addition, high-throughput sequencing illustrates the tempo and mode of de novo mutations, which are producing human genetic variation at this moment. Pedigree-based human genome sequencing has shown that mutation rates vary significantly across the human genome. These studies have also provided an improved timescale of human evolution, because the mutation rate estimated from pedigree analysis is half that estimated from traditional analyses based on molecular phylogeny. Because of the dramatic reduction in sequencing cost, sequencing on-demand samples designed for specific studies is now also becoming popular. To produce data of sufficient quality to meet the requirements of the study, it is necessary to set an explicit sequencing plan that includes the choice of sample collection methods, sequencing platforms, and number of sequence reads.

  13. Next Generation Sequencing Technologies: The Doorway to the Unexplored Genomics of Non-Model Plants

    PubMed Central

    Unamba, Chibuikem I. N.; Nag, Akshay; Sharma, Ram K.

    2015-01-01

    Non-model plants i.e., the species which have one or all of the characters such as long life cycle, difficulty to grow in the laboratory or poor fecundity, have been schemed out of sequencing projects earlier, due to high running cost of Sanger sequencing. Consequently, the information about their genomics and key biological processes are inadequate. However, the advent of fast and cost effective next generation sequencing (NGS) platforms in the recent past has enabled the unearthing of certain characteristic gene structures unique to these species. It has also aided in gaining insight about mechanisms underlying processes of gene expression and secondary metabolism as well as facilitated development of genomic resources for diversity characterization, evolutionary analysis and marker assisted breeding even without prior availability of genomic sequence information. In this review we explore how different Next Gen Sequencing platforms, as well as recent advances in NGS based high throughput genotyping technologies are rewarding efforts on de-novo whole genome/transcriptome sequencing, development of genome wide sequence based markers resources for improvement of non-model crops that are less costly than phenotyping. PMID:26734016

  14. A Framework for the Evaluation of Biosecurity, Commercial, Regulatory, and Scientific Impacts of Plant Viruses and Viroids Identified by NGS Technologies

    PubMed Central

    Massart, Sebastien; Candresse, Thierry; Gil, José; Lacomme, Christophe; Predajna, Lukas; Ravnikar, Maja; Reynard, Jean-Sébastien; Rumbou, Artemis; Saldarelli, Pasquale; Škorić, Dijana; Vainio, Eeva J.; Valkonen, Jari P. T.; Vanderschuren, Hervé; Varveri, Christina; Wetzel, Thierry

    2017-01-01

    Recent advances in high-throughput sequencing technologies and bioinformatics have generated huge new opportunities for discovering and diagnosing plant viruses and viroids. Plant virology has undoubtedly benefited from these new methodologies, but at the same time, faces now substantial bottlenecks, namely the biological characterization of the newly discovered viruses and the analysis of their impact at the biosecurity, commercial, regulatory, and scientific levels. This paper proposes a scaled and progressive scientific framework for efficient biological characterization and risk assessment when a previously known or a new plant virus is detected by next generation sequencing (NGS) technologies. Four case studies are also presented to illustrate the need for such a framework, and to discuss the scenarios. PMID:28174561

  15. 76 FR 28990 - Ultra High Throughput Sequencing for Clinical Diagnostic Applications-Approaches To Assess...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-05-19

    ...: 900). If you have never attended a Connect Pro meeting before, test your connection at: https://collaboration.fda.gov/common/help/en/support/meeting_test.htm . To get a quick overview of the Connect Pro... technologies are currently extensively used in research and are entering clinical diagnostic use; they are...

  16. Microfluidic single-cell whole-transcriptome sequencing.

    PubMed

    Streets, Aaron M; Zhang, Xiannian; Cao, Chen; Pang, Yuhong; Wu, Xinglong; Xiong, Liang; Yang, Lu; Fu, Yusi; Zhao, Liang; Tang, Fuchou; Huang, Yanyi

    2014-05-13

    Single-cell whole-transcriptome analysis is a powerful tool for quantifying gene expression heterogeneity in populations of cells. Many techniques have, thus, been recently developed to perform transcriptome sequencing (RNA-Seq) on individual cells. To probe subtle biological variation between samples with limiting amounts of RNA, more precise and sensitive methods are still required. We adapted a previously developed strategy for single-cell RNA-Seq that has shown promise for superior sensitivity and implemented the chemistry in a microfluidic platform for single-cell whole-transcriptome analysis. In this approach, single cells are captured and lysed in a microfluidic device, where mRNAs with poly(A) tails are reverse-transcribed into cDNA. Double-stranded cDNA is then collected and sequenced using a next generation sequencing platform. We prepared 94 libraries consisting of single mouse embryonic cells and technical replicates of extracted RNA and thoroughly characterized the performance of this technology. Microfluidic implementation increased mRNA detection sensitivity as well as improved measurement precision compared with tube-based protocols. With 0.2 M reads per cell, we were able to reconstruct a majority of the bulk transcriptome with 10 single cells. We also quantified variation between and within different types of mouse embryonic cells and found that enhanced measurement precision, detection sensitivity, and experimental throughput aided the distinction between biological variability and technical noise. With this work, we validated the advantages of an early approach to single-cell RNA-Seq and showed that the benefits of combining microfluidic technology with high-throughput sequencing will be valuable for large-scale efforts in single-cell transcriptome analysis.

  17. Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing

    PubMed Central

    Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi

    2016-01-01

    Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039

  18. High-Throughput Sequencing of Germline and Tumor From Men with Early-Onset Metastatic Prostate Cancer

    DTIC Science & Technology

    2016-12-01

    AWARD NUMBER: W81XWH-13-1-0371 TITLE: High-Throughput Sequencing of Germline and Tumor From Men with Early- Onset Metastatic Prostate Cancer...DATES COVERED 30 Sep 2013 - 29 Sep 2016 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER High-Throughput Sequencing of Germline and Tumor From Men with...presenting with metastatic prostate cancer at a young age (before age 60 years). Whole exome sequencing identified a panel of germline variants that have

  19. FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences.

    PubMed

    Waldmann, Jost; Gerken, Jan; Hankeln, Wolfgang; Schweer, Timmy; Glöckner, Frank Oliver

    2014-06-14

    Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy is hampered. FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse quickly and accurately large amounts of sequence data. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. The accuracy and performance of the FastaValidator library qualifies it for large data sets such as those commonly produced by massive parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data.

  20. Automated sample-preparation technologies in genome sequencing projects.

    PubMed

    Hilbert, H; Lauber, J; Lubenow, H; Düsterhöft, A

    2000-01-01

    A robotic workstation system (BioRobot 96OO, QIAGEN) and a 96-well UV spectrophotometer (Spectramax 250, Molecular Devices) were integrated in to the process of high-throughput automated sequencing of double-stranded plasmid DNA templates. An automated 96-well miniprep kit protocol (QIAprep Turbo, QIAGEN) provided high-quality plasmid DNA from shotgun clones. The DNA prepared by this procedure was used to generate more than two mega bases of final sequence data for two genomic projects (Arabidopsis thaliana and Schizosaccharomyces pombe), three thousand expressed sequence tags (ESTs) plus half a mega base of human full-length cDNA clones, and approximately 53,000 single reads for a whole genome shotgun project (Pseudomonas putida).

  1. Connecting Earth observation to high-throughput biodiversity data.

    PubMed

    Bush, Alex; Sollmann, Rahel; Wilting, Andreas; Bohmann, Kristine; Cole, Beth; Balzter, Heiko; Martius, Christopher; Zlinszky, András; Calvignac-Spencer, Sébastien; Cobbold, Christina A; Dawson, Terence P; Emerson, Brent C; Ferrier, Simon; Gilbert, M Thomas P; Herold, Martin; Jones, Laurence; Leendertz, Fabian H; Matthews, Louise; Millington, James D A; Olson, John R; Ovaskainen, Otso; Raffaelli, Dave; Reeve, Richard; Rödel, Mark-Oliver; Rodgers, Torrey W; Snape, Stewart; Visseren-Hamakers, Ingrid; Vogler, Alfried P; White, Piran C L; Wooster, Martin J; Yu, Douglas W

    2017-06-22

    Understandably, given the fast pace of biodiversity loss, there is much interest in using Earth observation technology to track biodiversity, ecosystem functions and ecosystem services. However, because most biodiversity is invisible to Earth observation, indicators based on Earth observation could be misleading and reduce the effectiveness of nature conservation and even unintentionally decrease conservation effort. We describe an approach that combines automated recording devices, high-throughput DNA sequencing and modern ecological modelling to extract much more of the information available in Earth observation data. This approach is achievable now, offering efficient and near-real-time monitoring of management impacts on biodiversity and its functions and services.

  2. A Perspective on the Future of High-Throughput RNAi Screening: Will CRISPR Cut Out the Competition or Can RNAi Help Guide the Way?

    PubMed

    Taylor, Jessica; Woodcock, Simon

    2015-09-01

    For more than a decade, RNA interference (RNAi) has brought about an entirely new approach to functional genomics screening. Enabling high-throughput loss-of-function (LOF) screens against the human genome, identifying new drug targets, and significantly advancing experimental biology, RNAi is a fast, flexible technology that is compatible with existing high-throughput systems and processes; however, the recent advent of clustered regularly interspaced palindromic repeats (CRISPR)-Cas, a powerful new precise genome-editing (PGE) technology, has opened up vast possibilities for functional genomics. CRISPR-Cas is novel in its simplicity: one piece of easily engineered guide RNA (gRNA) is used to target a gene sequence, and Cas9 expression is required in the cells. The targeted double-strand break introduced by the gRNA-Cas9 complex is highly effective at removing gene expression compared to RNAi. Together with the reduced cost and complexity of CRISPR-Cas, there is the realistic opportunity to use PGE to screen for phenotypic effects in a total gene knockout background. This review summarizes the exciting development of CRISPR-Cas as a high-throughput screening tool, comparing its future potential to that of well-established RNAi screening techniques, and highlighting future challenges and opportunities within these disciplines. We conclude that the two technologies actually complement rather than compete with each other, enabling greater understanding of the genome in relation to drug discovery. © 2015 Society for Laboratory Automation and Screening.

  3. CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

    PubMed Central

    2011-01-01

    Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. PMID:21878105

  4. A field ornithologist’s guide to genomics: Practical considerations for ecology and conservation

    USGS Publications Warehouse

    Oyler-McCance, Sara J.; Oh, Kevin; Langin, Kathryn; Aldridge, Cameron L.

    2016-01-01

    Vast improvements in sequencing technology have made it practical to simultaneously sequence millions of nucleotides distributed across the genome, opening the door for genomic studies in virtually any species. Ornithological research stands to benefit in three substantial ways. First, genomic methods enhance our ability to parse and simultaneously analyze both neutral and non-neutral genomic regions, thus providing insight into adaptive evolution and divergence. Second, the sheer quantity of sequence data generated by current sequencing platforms allows increased precision and resolution in analyses. Third, high-throughput sequencing can benefit applications that focus on a small number of loci that are otherwise prohibitively expensive, time-consuming, and technically difficult using traditional sequencing methods. These advances have improved our ability to understand evolutionary processes like speciation and local adaptation, but they also offer many practical applications in the fields of population ecology, migration tracking, conservation planning, diet analyses, and disease ecology. This review provides a guide for field ornithologists interested in incorporating genomic approaches into their research program, with an emphasis on techniques related to ecology and conservation. We present a general overview of contemporary genomic approaches and methods, as well as important considerations when selecting a genomic technique. We also discuss research questions that are likely to benefit from utilizing high-throughput sequencing instruments, highlighting select examples from recent avian studies.

  5. PCR Primers to Study the Diversity of Expressed Fungal Genes Encoding Lignocellulolytic Enzymes in Soils Using High-Throughput Sequencing

    PubMed Central

    Barbi, Florian; Bragalini, Claudia; Vallon, Laurent; Prudent, Elsa; Dubost, Audrey; Fraissinet-Tachet, Laurence; Marmeisse, Roland; Luis, Patricia

    2014-01-01

    Plant biomass degradation in soil is one of the key steps of carbon cycling in terrestrial ecosystems. Fungal saprotrophic communities play an essential role in this process by producing hydrolytic enzymes active on the main components of plant organic matter. Open questions in this field regard the diversity of the species involved, the major biochemical pathways implicated and how these are affected by external factors such as litter quality or climate changes. This can be tackled by environmental genomic approaches involving the systematic sequencing of key enzyme-coding gene families using soil-extracted RNA as material. Such an approach necessitates the design and evaluation of gene family-specific PCR primers producing sequence fragments compatible with high-throughput sequencing approaches. In the present study, we developed and evaluated PCR primers for the specific amplification of fungal CAZy Glycoside Hydrolase gene families GH5 (subfamily 5) and GH11 encoding endo-β-1,4-glucanases and endo-β-1,4-xylanases respectively as well as Basidiomycota class II peroxidases, corresponding to the CAZy Auxiliary Activity family 2 (AA2), active on lignin. These primers were experimentally validated using DNA extracted from a wide range of Ascomycota and Basidiomycota species including 27 with sequenced genomes. Along with the published primers for Glycoside Hydrolase GH7 encoding enzymes active on cellulose, the newly design primers were shown to be compatible with the Illumina MiSeq sequencing technology. Sequences obtained from RNA extracted from beech or spruce forest soils showed a high diversity and were uniformly distributed in gene trees featuring the global diversity of these gene families. This high-throughput sequencing approach using several degenerate primers constitutes a robust method, which allows the simultaneous characterization of the diversity of different fungal transcripts involved in plant organic matter degradation and may lead to the discovery of complex patterns in gene expression of soil fungal communities. PMID:25545363

  6. Quantum-Sequencing: Fast electronic single DNA molecule sequencing

    NASA Astrophysics Data System (ADS)

    Casamada Ribot, Josep; Chatterjee, Anushree; Nagpal, Prashant

    2014-03-01

    A major goal of third-generation sequencing technologies is to develop a fast, reliable, enzyme-free, high-throughput and cost-effective, single-molecule sequencing method. Here, we present the first demonstration of unique ``electronic fingerprint'' of all nucleotides (A, G, T, C), with single-molecule DNA sequencing, using Quantum-tunneling Sequencing (Q-Seq) at room temperature. We show that the electronic state of the nucleobases shift depending on the pH, with most distinct states identified at acidic pH. We also demonstrate identification of single nucleotide modifications (methylation here). Using these unique electronic fingerprints (or tunneling data), we report a partial sequence of beta lactamase (bla) gene, which encodes resistance to beta-lactam antibiotics, with over 95% success rate. These results highlight the potential of Q-Seq as a robust technique for next-generation sequencing.

  7. Molecular characterization of a novel Nucleorhabdovirus from black currant identified by high-throughput sequencing

    USDA-ARS?s Scientific Manuscript database

    Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...

  8. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay

    2018-01-04

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  9. NCBI GEO: archive for high-throughput functional genomic data.

    PubMed

    Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Edgar, Ron

    2009-01-01

    The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as 'Minimum Information About a Microarray Experiment' (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

  10. Automation, parallelism, and robotics for proteomics.

    PubMed

    Alterovitz, Gil; Liu, Jonathan; Chow, Jijun; Ramoni, Marco F

    2006-07-01

    The speed of the human genome project (Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C. et al., Nature 2001, 409, 860-921) was made possible, in part, by developments in automation of sequencing technologies. Before these technologies, sequencing was a laborious, expensive, and personnel-intensive task. Similarly, automation and robotics are changing the field of proteomics today. Proteomics is defined as the effort to understand and characterize proteins in the categories of structure, function and interaction (Englbrecht, C. C., Facius, A., Comb. Chem. High Throughput Screen. 2005, 8, 705-715). As such, this field nicely lends itself to automation technologies since these methods often require large economies of scale in order to achieve cost and time-saving benefits. This article describes some of the technologies and methods being applied in proteomics in order to facilitate automation within the field as well as in linking proteomics-based information with other related research areas.

  11. Rapid DNA Sequencing by Direct Nanoscale Reading of Nucleotide Bases on Individual DNA Chains

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lee, James Weifu; Meller, Amit

    2007-01-01

    Since the independent invention of DNA sequencing by Sanger and by Gilbert 30 years ago, it has grown from a small scale technique capable of reading several kilobase-pair of sequence per day into today's multibillion dollar industry. This growth has spurred the development of new sequencing technologies that do not involve either electrophoresis or Sanger sequencing chemistries. Sequencing by Synthesis (SBS) involves multiple parallel micro-sequencing addition events occurring on a surface, where data from each round is detected by imaging. New High Throughput Technologies for DNA Sequencing and Genomics is the second volume in the Perspectives in Bioanalysis series, whichmore » looks at the electroanalytical chemistry of nucleic acids and proteins, development of electrochemical sensors and their application in biomedicine and in the new fields of genomics and proteomics. The authors have expertly formatted the information for a wide variety of readers, including new developments that will inspire students and young scientists to create new tools for science and medicine in the 21st century. Reviews of complementary developments in Sanger and SBS sequencing chemistries, capillary electrophoresis and microdevice integration, MS sequencing and applications set the framework for the book.« less

  12. Transcriptome-based differentiation of closely-related Miscanthus lines.

    PubMed

    Chouvarine, Philippe; Cooksey, Amanda M; McCarthy, Fiona M; Ray, David A; Baldwin, Brian S; Burgess, Shane C; Peterson, Daniel G

    2012-01-01

    Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations. A SNP comparative analysis of rhizome-derived cDNA sequences was successfully utilized to distinguish three Miscanthus × giganteus cultivars from each other and from other Miscanthus species. Moreover, the resulting phylogenetic tree generated from SNP frequency data parallels the known breeding history of the plants examined. Some of the giant miscanthus plants exhibit considerable sequence divergence. Here we describe an analysis of Miscanthus in which high-throughput exome sequencing was utilized to differentiate between closely related genotypes despite the current lack of a reference genome sequence. We functionally annotated the exome sequences and provide resources to support Miscanthus systems biology. In addition, we demonstrate the use of the commercial high-performance cloud computing to do computational GO annotation.

  13. High-throughput discovery of mutations in tef semi-dwarfing genes by next-generation sequencing analysis.

    PubMed

    Zhu, Qihui; Smith, Shavannor M; Ayele, Mulu; Yang, Lixing; Jogi, Ansuya; Chaluvadi, Srinivasa R; Bennetzen, Jeffrey L

    2012-11-01

    Tef (Eragrostis tef) is a major cereal crop in Ethiopia. Lodging is the primary constraint to increasing productivity in this allotetraploid species, accounting for losses of ∼15-45% in yield each year. As a first step toward identifying semi-dwarf varieties that might have improved lodging resistance, an ∼6× fosmid library was constructed and used to identify both homeologues of the dw3 semi-dwarfing gene of Sorghum bicolor. An EMS mutagenized population, consisting of ∼21,210 tef plants, was planted and leaf materials were collected into 23 superpools. Two dwarfing candidate genes, homeologues of dw3 of sorghum and rht1 of wheat, were sequenced directly from each superpool with 454 technology, and 120 candidate mutations were identified. Out of 10 candidates tested, six independent mutations were validated by Sanger sequencing, including two predicted detrimental mutations in both dw3 homeologues with a potential to improve lodging resistance in tef through further breeding. This study demonstrates that high-throughput sequencing can identify potentially valuable mutations in under-studied plant species like tef and has provided mutant lines that can now be combined and tested in breeding programs for improved lodging resistance.

  14. High-throughput multiplex HLA-typing by ligase detection reaction (LDR) and universal array (UA) approach.

    PubMed

    Consolandi, Clarissa

    2009-01-01

    One major goal of genetic research is to understand the role of genetic variation in living systems. In humans, by far the most common type of such variation involves differences in single DNA nucleotides, and is thus termed single nucleotide polymorphism (SNP). The need for improvement in throughput and reliability of traditional techniques makes it necessary to develop new technologies. Thus the past few years have witnessed an extraordinary surge of interest in DNA microarray technology. This new technology offers the first great hope for providing a systematic way to explore the genome. It permits a very rapid analysis of thousands genes for the purpose of gene discovery, sequencing, mapping, expression, and polymorphism detection. We generated a series of analytical tools to address the manufacturing, detection and data analysis components of a microarray experiment. In particular, we set up a universal array approach in combination with a PCR-LDR (polymerase chain reaction-ligation detection reaction) strategy for allele identification in the HLA gene.

  15. Quantitative phenotyping via deep barcode sequencing.

    PubMed

    Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

    2009-10-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.

  16. Molecular characterization of a novel Luteovirus from peach identified by high-throughput sequencing

    USDA-ARS?s Scientific Manuscript database

    Contigs with sequence homologies to Cherry-associated luteovirus were identified by high-throughput sequencing analysis of two peach accessions undergoing quarantine testing. The complete genomic sequences of the two isolates of this virus are 5,819 and 5,814 nucleotides. Their genome organization i...

  17. [Research on soil bacteria under the impact of sealed CO2 leakage by high-throughput sequencing technology].

    PubMed

    Tian, Di; Ma, Xin; Li, Yu-E; Zha, Liang-Song; Wu, Yang; Zou, Xiao-Xia; Liu, Shuang

    2013-10-01

    Carbon dioxide Capture and Storage has provided a new option for mitigating global anthropogenic CO2 emission with its unique advantages. However, there is a risk of the sealed CO2 leakage, bringing a serious threat to the ecology system. It is widely known that soil microorganisms are closely related to soil health, while the study on the impact of sequestered CO2 leakage on soil microorganisms is quite deficient. In this study, the leakage scenarios of sealed CO2 were constructed and the 16S rRNA genes of soil bacteria were sequenced by Illumina high-throughput sequencing technology on Miseq platform, and related biological analysis was conducted to explore the changes of soil bacterial abundance, diversity and structure. There were 486,645 reads for 43,017 OTUs of 15 soil samples and the results of biological analysis showed that there were differences in the abundance, diversity and community structure of soil bacterial community under different CO, leakage scenarios while the abundance and diversity of the bacterial community declined with the amplification of CO2 leakage quantity and leakage time, and some bacteria species became the dominant bacteria species in the bacteria community, therefore the increase of Acidobacteria species would be a biological indicator for the impact of sealed CO2 leakage on soil ecology system.

  18. NCBI GEO: archive for functional genomics data sets--10 years on.

    PubMed

    Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Holko, Michelle; Ayanbule, Oluwabukunmi; Yefanov, Andrey; Soboleva, Alexandra

    2011-01-01

    A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20,000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.

  19. Comparison of the performance of Ion Torrent chips in noninvasive prenatal trisomy detection.

    PubMed

    Wang, Yanlin; Wen, Zujia; Shen, Jiawei; Cheng, Weiwei; Li, Jun; Qin, Xiaolan; Ma, Duan; Shi, Yongyong

    2014-07-01

    Semiconductor high-throughput sequencing, represented by Ion Torrent PGM/Proton, proves to be feasible in the noninvasive prenatal diagnosis of fetal aneuploidies. It is commendable that, with less data and relevant cost also, an accurate result can be achieved owing to the high sensitivity and specificity of such kind of technology. We conducted a comparative analysis of the performance of four different Ion chips in detecting fetal chromosomal aneuploidies. Eight maternal plasma DNA samples, including four pregnancies with normal fetuses and four with trisomy 21 fetuses, were sequenced on Ion Torrent 314/316/318/PI chips, respectively. Results such as read mapped ratio, correlation coefficient and phred quality score were calculated and parallelly compared. All samples were correctly classified even with low-throughput chip, and, among the four chips, the 316 chip had the highest read mapped ratio, correlation coefficient, mean read length and phred quality score. All chips were well consistent with each other. Our results showed that all Ion chips are applicable in noninvasive prenatal fetal aneuploidy diagnosis. We recommend researchers or clinicians to use the appropriate chip with barcoding technology on the basis of the sample number.

  20. Molecular biology and immunology of head and neck cancer.

    PubMed

    Guo, Theresa; Califano, Joseph A

    2015-07-01

    In recent years, our knowledge and understanding of head and neck squamous cell carcinoma (HNSCC) has expanded dramatically. New high-throughput sequencing technologies have accelerated these discoveries since the first reports of whole-exome sequencing of HNSCC tumors in 2011. In addition, the discovery of human papillomavirus in relationship with oropharyngeal squamous cell carcinoma has shifted our molecular understanding of the disease. New investigation into the role of immune evasion in HNSCC has also led to potential novel therapies based on immune-specific systemic therapies. Copyright © 2015 Elsevier Inc. All rights reserved.

  1. Microbial community structure in a full-scale anaerobic treatment plant during start-up and first year of operation revealed by high-throughput 16S rRNA gene amplicon sequencing.

    PubMed

    Fykse, Else Marie; Aarskaug, Tone; Madslien, Elisabeth H; Dybwad, Marius

    2016-12-01

    High-throughput amplicon sequencing of six biomass samples from a full-scale anaerobic reactor at a Norwegian wood and pulp factory using Biothane Biobed Expanded Granular Sludge Bed (EGSB) technology during start-up and first year of operation was performed. A total of 106,166 16S rRNA gene sequences (V3-V5 region) were obtained. The number of operational taxonomic units (OTUs) ranged from 595 to 2472, and a total of 38 different phyla and 143 families were observed. The predominant phyla were Bacteroidetes, Chloroflexi, Firmicutes, Proteobacteria, and Spirochaetes. A more diverse microbial community was observed in the inoculum biomass coming from an Upflow Anaerobic Sludge Blanket (USAB) reactor, reflecting an adaptation of the inoculum diversity to the specific conditions of the new reactor. In addition, no taxa classified as obligate pathogens were identified and potentially opportunistic pathogens were absent or observed in low abundances. No Legionella bacteria were identified by traditional culture-based and molecular methods. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology

    PubMed Central

    Lijavetzky, Diego; Cabezas, José Antonio; Ibáñez, Ana; Rodríguez, Virginia; Martínez-Zapater, José M

    2007-01-01

    Background Single-nucleotide polymorphisms (SNPs) are the most abundant type of DNA sequence polymorphisms. Their higher availability and stability when compared to simple sequence repeats (SSRs) provide enhanced possibilities for genetic and breeding applications such as cultivar identification, construction of genetic maps, the assessment of genetic diversity, the detection of genotype/phenotype associations, or marker-assisted breeding. In addition, the efficiency of these activities can be improved thanks to the ease with which SNP genotyping can be automated. Expressed sequence tags (EST) sequencing projects in grapevine are allowing for the in silico detection of multiple putative sequence polymorphisms within and among a reduced number of cultivars. In parallel, the sequence of the grapevine cultivar Pinot Noir is also providing thousands of polymorphisms present in this highly heterozygous genome. Still the general application of those SNPs requires further validation since their use could be restricted to those specific genotypes. Results In order to develop a large SNP set of wide application in grapevine we followed a systematic re-sequencing approach in a group of 11 grape genotypes corresponding to ancient unrelated cultivars as well as wild plants. Using this approach, we have sequenced 230 gene fragments, what represents the analysis of over 1 Mb of grape DNA sequence. This analysis has allowed the discovery of 1573 SNPs with an average of one SNP every 64 bp (one SNP every 47 bp in non-coding regions and every 69 bp in coding regions). Nucleotide diversity in grape (π = 0.0051) was found to be similar to values observed in highly polymorphic plant species such as maize. The average number of haplotypes per gene sequence was estimated as six, with three haplotypes representing over 83% of the analyzed sequences. Short-range linkage disequilibrium (LD) studies within the analyzed sequences indicate the existence of a rapid decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies we have used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies. Conclusion These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping we have shown the application of a set of discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination a highly efficient re-sequencing approach and the SNPlex™ high throughput genotyping technology provide a powerful tool for grapevine genetic analysis. PMID:18021442

  3. Molecular beacons for DNA binding proteins: an emerging technology for detection of DNA binding proteins and their ligands.

    PubMed

    Dummitt, Benjamin; Chang, Yie-Hwa

    2006-06-01

    Quantitation of the level or activity of specific proteins is one of the most commonly performed experiments in biomedical research. Protein detection has historically been difficult to adapt to high throughput platforms because of heavy reliance upon antibodies for protein detection. Molecular beacons for DNA binding proteins is a recently developed technology that attempts to overcome such limitations. Protein detection is accomplished using inexpensive, easy-to-synthesize oligonucleotides, accompanied by a fluorescence readout. Importantly, detection of the protein and reporting of the signal occur simultaneously, allowing for one-step protocols and increased potential for use in high throughput analysis. While the initial iteration of the technology allowed only for the detection of sequence-specific DNA binding proteins, more recent adaptations allow for the possibility of development of beacons for any protein, independent of native DNA binding activity. Here, we discuss the development of the technology, the mechanism of the reaction, and recent improvements and modifications made to improve the assay in terms of sensitivity, potential for multiplexing, and broad applicability.

  4. Technical Considerations for Reduced Representation Bisulfite Sequencing with Multiplexed Libraries

    PubMed Central

    Chatterjee, Aniruddha; Rodger, Euan J.; Stockwell, Peter A.; Weeks, Robert J.; Morison, Ian M.

    2012-01-01

    Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow-cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and trouble shooting approaches we implemented to optimize sequencing of multiplexed libraries on an a RRBS background. PMID:23193365

  5. De novo characterization of Lentinula edodes C(91-3) transcriptome by deep Solexa sequencing.

    PubMed

    Zhong, Mintao; Liu, Ben; Wang, Xiaoli; Liu, Lei; Lun, Yongzhi; Li, Xingyun; Ning, Anhong; Cao, Jing; Huang, Min

    2013-02-01

    Lentinula edodes, has been utilized as food, as well as, in popular medicine, moreover, its extract isolated from its mycelium and fruiting body have shown several therapeutic properties. Yet little is understood about its genes involved in these properties, and the absence of L.edodes genomes has been a barrier to the development of functional genomics research. However, high throughput sequencing technologies are now being widely applied to non-model species. To facilitate research on L.edodes, we leveraged Solexa sequencing technology in de novo assembly of L.edodes C(91-3) transcriptome. In a single run, we produced more than 57 million sequencing reads. These reads were assembled into 28,923 unigene sequences (mean size=689bp) including 18,120 unigenes with coding sequence (CDS). Based on similarity search with known proteins, assembled unigene sequences were annotated with gene descriptions, gene ontology (GO) and clusters of orthologous group (COG) terms. Our data provides the first comprehensive sequence resource available for functional genomics studies in L.edodes, and demonstrates the utility of Illumina/Solexa sequencing for de novo transcriptome characterization and gene discovery in a non-model mushroom. Copyright © 2012 Elsevier Inc. All rights reserved.

  6. Optimization and high-throughput screening of antimicrobial peptides.

    PubMed

    Blondelle, Sylvie E; Lohner, Karl

    2010-01-01

    While a well-established process for lead compound discovery in for-profit companies, high-throughput screening is becoming more popular in basic and applied research settings in academia. The development of combinatorial libraries combined with easy and less expensive access to new technologies have greatly contributed to the implementation of high-throughput screening in academic laboratories. While such techniques were earlier applied to simple assays involving single targets or based on binding affinity, they have now been extended to more complex systems such as whole cell-based assays. In particular, the urgent need for new antimicrobial compounds that would overcome the rapid rise of drug-resistant microorganisms, where multiple target assays or cell-based assays are often required, has forced scientists to focus onto high-throughput technologies. Based on their existence in natural host defense systems and their different mode of action relative to commercial antibiotics, antimicrobial peptides represent a new hope in discovering novel antibiotics against multi-resistant bacteria. The ease of generating peptide libraries in different formats has allowed a rapid adaptation of high-throughput assays to the search for novel antimicrobial peptides. Similarly, the availability nowadays of high-quantity and high-quality antimicrobial peptide data has permitted the development of predictive algorithms to facilitate the optimization process. This review summarizes the various library formats that lead to de novo antimicrobial peptide sequences as well as the latest structural knowledge and optimization processes aimed at improving the peptides selectivity.

  7. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood

    PubMed Central

    Fan, H. Christina; Blumenfeld, Yair J.; Chitkara, Usha; Hudgins, Louanne; Quake, Stephen R.

    2008-01-01

    We directly sequenced cell-free DNA with high-throughput shotgun sequencing technology from plasma of pregnant women, obtaining, on average, 5 million sequence tags per patient sample. This enabled us to measure the over- and underrepresentation of chromosomes from an aneuploid fetus. The sequencing approach is polymorphism-independent and therefore universally applicable for the noninvasive detection of fetal aneuploidy. Using this method, we successfully identified all nine cases of trisomy 21 (Down syndrome), two cases of trisomy 18 (Edward syndrome), and one case of trisomy 13 (Patau syndrome) in a cohort of 18 normal and aneuploid pregnancies; trisomy was detected at gestational ages as early as the 14th week. Direct sequencing also allowed us to study the characteristics of cell-free plasma DNA, and we found evidence that this DNA is enriched for sequences from nucleosomes. PMID:18838674

  8. Sequencing intractable DNA to close microbial genomes.

    PubMed

    Hurt, Richard A; Brown, Steven D; Podar, Mircea; Palumbo, Anthony V; Elias, Dwayne A

    2012-01-01

    Advancement in high throughput DNA sequencing technologies has supported a rapid proliferation of microbial genome sequencing projects, providing the genetic blueprint for in-depth studies. Oftentimes, difficult to sequence regions in microbial genomes are ruled "intractable" resulting in a growing number of genomes with sequence gaps deposited in databases. A procedure was developed to sequence such problematic regions in the "non-contiguous finished" Desulfovibrio desulfuricans ND132 genome (6 intractable gaps) and the Desulfovibrio africanus genome (1 intractable gap). The polynucleotides surrounding each gap formed GC rich secondary structures making the regions refractory to amplification and sequencing. Strand-displacing DNA polymerases used in concert with a novel ramped PCR extension cycle supported amplification and closure of all gap regions in both genomes. The developed procedures support accurate gene annotation, and provide a step-wise method that reduces the effort required for genome finishing.

  9. Using genomics for surveillance of veterinary infectious agents.

    PubMed

    Mathijs, E; Vandenbussche, F; Van Borm, S

    2016-04-01

    Factors such as globalisation, climate change and agricultural intensification can increase the risk of microbial emergence. As a result, there is a growing need for flexible laboratory-based surveillance tools to rapidly identify, characterise and monitor global (re-)emerging diseases. Although many tools are available, novel sequencing technologies have launched a new era in pathogen surveillance. Here, the authors review the potential applications of high-throughput genomic technologies for the surveillance of veterinary pathogens. They focus on the two types of surveillance that will benefit most from these new tools: hazard-specific surveillance (pathogen identification and typing) and early-warning surveillance (pathogen discovery). The paper reviews how the resulting sequencing data can be used to improve diagnosis and concludes by highlighting the major challenges that hinder the routine use of this technology in the veterinary field.

  10. Microfluidics for genome-wide studies involving next generation sequencing

    PubMed Central

    Murphy, Travis W.; Lu, Chang

    2017-01-01

    Next-generation sequencing (NGS) has revolutionized how molecular biology studies are conducted. Its decreasing cost and increasing throughput permit profiling of genomic, transcriptomic, and epigenomic features for a wide range of applications. Microfluidics has been proven to be highly complementary to NGS technology with its unique capabilities for handling small volumes of samples and providing platforms for automation, integration, and multiplexing. In this article, we review recent progress on applying microfluidics to facilitate genome-wide studies. We emphasize on several technical aspects of NGS and how they benefit from coupling with microfluidic technology. We also summarize recent efforts on developing microfluidic technology for genomic, transcriptomic, and epigenomic studies, with emphasis on single cell analysis. We envision rapid growth in these directions, driven by the needs for testing scarce primary cell samples from patients in the context of precision medicine. PMID:28396707

  11. Discovery of precursor and mature microRNAs and their putative gene targets using high-throughput sequencing in pineapple (Ananas comosus var. comosus).

    PubMed

    Yusuf, Noor Hydayaty Md; Ong, Wen Dee; Redwan, Raimi Mohamed; Latip, Mariam Abd; Kumar, S Vijay

    2015-10-15

    MicroRNAs (miRNAs) are a class of small, endogenous non-coding RNAs that negatively regulate gene expression, resulting in the silencing of target mRNA transcripts through mRNA cleavage or translational inhibition. MiRNAs play significant roles in various biological and physiological processes in plants. However, the miRNA-mediated gene regulatory network in pineapple, the model tropical non-climacteric fruit, remains largely unexplored. Here, we report a complete list of pineapple mature miRNAs obtained from high-throughput small RNA sequencing and precursor miRNAs (pre-miRNAs) obtained from ESTs. Two small RNA libraries were constructed from pineapple fruits and leaves, respectively, using Illumina's Solexa technology. Sequence similarity analysis using miRBase revealed 579,179 reads homologous to 153 miRNAs from 41 miRNA families. In addition, a pineapple fruit transcriptome library consisting of approximately 30,000 EST contigs constructed using Solexa sequencing was used for the discovery of pre-miRNAs. In all, four pre-miRNAs were identified (MIR156, MIR399, MIR444 and MIR2673). Furthermore, the same pineapple transcriptome was used to dissect the function of the miRNAs in pineapple by predicting their putative targets in conjunction with their regulatory networks. In total, 23 metabolic pathways were found to be regulated by miRNAs in pineapple. The use of high-throughput sequencing in pineapples to unveil the presence of miRNAs and their regulatory pathways provides insight into the repertoire of miRNA regulation used exclusively in this non-climacteric model plant. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses.

    PubMed

    Vetrovský, Tomáš; Baldrian, Petr; Morais, Daniel; Berger, Bonnie

    2018-02-14

    Modern molecular methods have increased our ability to describe microbial communities. Along with the advances brought by new sequencing technologies, we now require intensive computational resources to make sense of the large numbers of sequences continuously produced. The software developed by the scientific community to address this demand, although very useful, require experience of the command-line environment, extensive training and have steep learning curves, limiting their use. We created SEED 2, a graphical user interface for handling high-throughput amplicon-sequencing data under Windows operating systems. SEED 2 is the only sequence visualizer that empowers users with tools to handle amplicon-sequencing data of microbial community markers. It is suitable for any marker genes sequences obtained through Illumina, IonTorrent or Sanger sequencing. SEED 2 allows the user to process raw sequencing data, identify specific taxa, produce of OTU-tables, create sequence alignments and construct phylogenetic trees. Standard dual core laptops with 8 GB of RAM can handle ca. 8 million of Illumina PE 300 bp sequences, ca. 4GB of data. SEED 2 was implemented in Object Pascal and uses internal functions and external software for amplicon data processing. SEED 2 is a freeware software, available at http://www.biomed.cas.cz/mbu/lbwrf/seed/ as a self-contained file, including all the dependencies, and does not require installation. Supplementary data contain a comprehensive list of supported functions. daniel.morais@biomed.cas.cz. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.

  13. RNA-Seq Technology and Its Application in Fish Transcriptomics

    PubMed Central

    Ba, Yi; Zhuang, Qianfeng

    2014-01-01

    Abstract High-throughput sequencing technologies, also known as next-generation sequencing (NGS) technologies, have revolutionized the way that genomic research is advancing. In addition to the static genome, these state-of-art technologies have been recently exploited to analyze the dynamic transcriptome, and the resulting technology is termed RNA sequencing (RNA-seq). RNA-seq is free from many limitations of other transcriptomic approaches, such as microarray and tag-based sequencing method. Although RNA-seq has only been available for a short time, studies using this method have completely changed our perspective of the breadth and depth of eukaryotic transcriptomes. In terms of the transcriptomics of teleost fishes, both model and non-model species have benefited from the RNA-seq approach and have undergone tremendous advances in the past several years. RNA-seq has helped not only in mapping and annotating fish transcriptome but also in our understanding of many biological processes in fish, such as development, adaptive evolution, host immune response, and stress response. In this review, we first provide an overview of each step of RNA-seq from library construction to the bioinformatic analysis of the data. We then summarize and discuss the recent biological insights obtained from the RNA-seq studies in a variety of fish species. PMID:24380445

  14. Adaptive efficient compression of genomes

    PubMed Central

    2012-01-01

    Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. However, memory requirements of the current algorithms are high and run times often are slow. In this paper, we propose an adaptive, parallel and highly efficient referential sequence compression method which allows fine-tuning of the trade-off between required memory and compression speed. When using 12 MB of memory, our method is for human genomes on-par with the best previous algorithms in terms of compression ratio (400:1) and compression speed. In contrast, it compresses a complete human genome in just 11 seconds when provided with 9 GB of main memory, which is almost three times faster than the best competitor while using less main memory. PMID:23146997

  15. Development of a Rapid Identification Method for a Variety of Antibody Candidates Using High-throughput Sequencing.

    PubMed

    Ito, Yuji

    2017-01-01

    As an alternative to hybridoma technology, the antibody phage library system can also be used for antibody selection. This method enables the isolation of antigen-specific binders through an in vitro selection process known as biopanning. While it has several advantages, such as an avoidance of animal immunization, the phage cloning and screening steps of biopanning are time-consuming and problematic. Here, we introduce a novel biopanning method combined with high-throughput sequencing (HTS) using a next-generation sequencer (NGS) to save time and effort in antibody selection, and to increase the diversity of acquired antibody sequences. Biopannings against a target antigen were performed using a human single chain Fv (scFv) antibody phage library. VH genes in pooled phages at each round of biopanning were analyzed by HTS on a NGS. The obtained data were trimmed, merged, and translated into amino acid sequences. The frequencies (%) of the respective VH sequences at each biopanning step were calculated, and the amplification factor (change of frequency through biopanning) was obtained to estimate the potential for antigen binding. A phylogenetic tree was drawn using the top 50 VH sequences with high amplification factors. Representative VH sequences forming the cluster were then picked up and used to reconstruct scFv genes harboring these VHs. Their derived scFv-Fc fusion proteins showed clear antigen binding activity. These results indicate that a combination of biopanning and HTS enables the rapid and comprehensive identification of specific binders from antibody phage libraries.

  16. Characterization and complete genome sequence of a previously uncharacterized panicovirus from Bermuda grass detected by high throughput sequencing

    USDA-ARS?s Scientific Manuscript database

    Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high throughput sequencing (HTS). The nearly full genome sequence of a previously uncharacterized Panicovirus was identified from...

  17. High-resolution phylogenetic microbial community profiling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin

    Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structuresmore » at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.« less

  18. High-resolution phylogenetic microbial community profiling

    DOE PAGES

    Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin; ...

    2016-02-09

    Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structuresmore » at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.« less

  19. An innovative SNP genotyping method adapting to multiple platforms and throughputs.

    PubMed

    Long, Y M; Chao, W S; Ma, G J; Xu, S S; Qi, L L

    2017-03-01

    An innovative genotyping method designated as semi-thermal asymmetric reverse PCR (STARP) was developed for genotyping individual SNPs with improved accuracy, flexible throughputs, low operational costs, and high platform compatibility. Multiplex chip-based technology for genome-scale genotyping of single nucleotide polymorphisms (SNPs) has made great progress in the past two decades. However, PCR-based genotyping of individual SNPs still remains problematic in accuracy, throughput, simplicity, and/or operational costs as well as the compatibility with multiple platforms. Here, we report a novel SNP genotyping method designated semi-thermal asymmetric reverse PCR (STARP). In this method, genotyping assay was performed under unique PCR conditions using two universal priming element-adjustable primers (PEA-primers) and one group of three locus-specific primers: two asymmetrically modified allele-specific primers (AMAS-primers) and their common reverse primer. The two AMAS-primers each were substituted one base in different positions at their 3' regions to significantly increase the amplification specificity of the two alleles and tailed at 5' ends to provide priming sites for PEA-primers. The two PEA-primers were developed for common use in all genotyping assays to stringently target the PCR fragments generated by the two AMAS-primers with similar PCR efficiencies and for flexible detection using either gel-free fluorescence signals or gel-based size separation. The state-of-the-art primer design and unique PCR conditions endowed STARP with all the major advantages of high accuracy, flexible throughputs, simple assay design, low operational costs, and platform compatibility. In addition to SNPs, STARP can also be employed in genotyping of indels (insertion-deletion polymorphisms). As vast variations in DNA sequences are being unearthed by many genome sequencing projects and genotyping by sequencing, STARP will have wide applications across all biological organisms in agriculture, medicine, and forensics.

  20. CisSERS: Customizable in silico sequence evaluation for restriction sites

    DOE PAGES

    Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus; ...

    2016-04-12

    High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated tomore » enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERSenable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERSand results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.« less

  1. CisSERS: Customizable in silico sequence evaluation for restriction sites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus

    High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated tomore » enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERSenable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERSand results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.« less

  2. Efficient Identification of Murine M2 Macrophage Peptide Targeting Ligands by Phage Display and Next-Generation Sequencing.

    PubMed

    Liu, Gary W; Livesay, Brynn R; Kacherovsky, Nataly A; Cieslewicz, Maryelise; Lutz, Emi; Waalkes, Adam; Jensen, Michael C; Salipante, Stephen J; Pun, Suzie H

    2015-08-19

    Peptide ligands are used to increase the specificity of drug carriers to their target cells and to facilitate intracellular delivery. One method to identify such peptide ligands, phage display, enables high-throughput screening of peptide libraries for ligands binding to therapeutic targets of interest. However, conventional methods for identifying target binders in a library by Sanger sequencing are low-throughput, labor-intensive, and provide a limited perspective (<0.01%) of the complete sequence space. Moreover, the small sample space can be dominated by nonspecific, preferentially amplifying "parasitic sequences" and plastic-binding sequences, which may lead to the identification of false positives or exclude the identification of target-binding sequences. To overcome these challenges, we employed next-generation Illumina sequencing to couple high-throughput screening and high-throughput sequencing, enabling more comprehensive access to the phage display library sequence space. In this work, we define the hallmarks of binding sequences in next-generation sequencing data, and develop a method that identifies several target-binding phage clones for murine, alternatively activated M2 macrophages with a high (100%) success rate: sequences and binding motifs were reproducibly present across biological replicates; binding motifs were identified across multiple unique sequences; and an unselected, amplified library accurately filtered out parasitic sequences. In addition, we validate the Multiple Em for Motif Elicitation tool as an efficient and principled means of discovering binding sequences.

  3. Cracking the Code of Human Diseases Using Next-Generation Sequencing: Applications, Challenges, and Perspectives

    PubMed Central

    Precone, Vincenza; Del Monaco, Valentina; Esposito, Maria Valeria; De Palma, Fatima Domenica Elisa; Ruocco, Anna; D'Argenio, Valeria

    2015-01-01

    Next-generation sequencing (NGS) technologies have greatly impacted on every field of molecular research mainly because they reduce costs and increase throughput of DNA sequencing. These features, together with the technology's flexibility, have opened the way to a variety of applications including the study of the molecular basis of human diseases. Several analytical approaches have been developed to selectively enrich regions of interest from the whole genome in order to identify germinal and/or somatic sequence variants and to study DNA methylation. These approaches are now widely used in research, and they are already being used in routine molecular diagnostics. However, some issues are still controversial, namely, standardization of methods, data analysis and storage, and ethical aspects. Besides providing an overview of the NGS-based approaches most frequently used to study the molecular basis of human diseases at DNA level, we discuss the principal challenges and applications of NGS in the field of human genomics. PMID:26665001

  4. Identification of Sequence Specificity of 5-Methylcytosine Oxidation by Tet1 Protein with High-Throughput Sequencing.

    PubMed

    Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi

    2016-03-02

    Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10(4) -member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  5. A high-throughput Sanger strategy for human mitochondrial genome sequencing

    PubMed Central

    2013-01-01

    Background A population reference database of complete human mitochondrial genome (mtGenome) sequences is needed to enable the use of mitochondrial DNA (mtDNA) coding region data in forensic casework applications. However, the development of entire mtGenome haplotypes to forensic data quality standards is difficult and laborious. A Sanger-based amplification and sequencing strategy that is designed for automated processing, yet routinely produces high quality sequences, is needed to facilitate high-volume production of these mtGenome data sets. Results We developed a robust 8-amplicon Sanger sequencing strategy that regularly produces complete, forensic-quality mtGenome haplotypes in the first pass of data generation. The protocol works equally well on samples representing diverse mtDNA haplogroups and DNA input quantities ranging from 50 pg to 1 ng, and can be applied to specimens of varying DNA quality. The complete workflow was specifically designed for implementation on robotic instrumentation, which increases throughput and reduces both the opportunities for error inherent to manual processing and the cost of generating full mtGenome sequences. Conclusions The described strategy will assist efforts to generate complete mtGenome haplotypes which meet the highest data quality expectations for forensic genetic and other applications. Additionally, high-quality data produced using this protocol can be used to assess mtDNA data developed using newer technologies and chemistries. Further, the amplification strategy can be used to enrich for mtDNA as a first step in sample preparation for targeted next-generation sequencing. PMID:24341507

  6. High throughput MLVA-16 typing for Brucella based on the microfluidics technology

    PubMed Central

    2011-01-01

    Background Brucellosis, a zoonosis caused by the genus Brucella, has been eradicated in Northern Europe, Australia, the USA and Canada, but remains endemic in most areas of the world. The strain and biovar typing of Brucella field samples isolated in outbreaks is useful for tracing back source of infection and may be crucial for discriminating naturally occurring outbreaks versus bioterrorist events, being Brucella a potential biological warfare agent. In the last years MLVA-16 has been described for Brucella spp. genotyping. The MLVA band profiles may be resolved by different techniques i.e. the manual agarose gels, the capillary electrophoresis sequencing systems or the microfluidic Lab-on-Chip electrophoresis. In this paper we described a high throughput system of MLVA-16 typing for Brucella spp. by using of the microfluidics technology. Results The Caliper LabChip 90 equipment was evaluated for MLVA-16 typing of sixty-three Brucella samples. Furthermore, in order to validate the system, DNA samples previously resolved by sequencing system and Agilent technology, were de novo genotyped. The comparison of the MLVA typing data obtained by the Caliper equipment and those previously obtained by the other analysis methods showed a good correlation. However the outputs were not accurate as the Caliper DNA fragment sizes showed discrepancies compared with real data and a conversion table from observed to expected data was created. Conclusion In this paper we described the MLVA-16 using a rapid, sophisticated microfluidics technology for detection of amplification product sizes. The comparison of the MLVA typing data produced by Caliper LabChip 90 system with the data obtained by different techniques showed a general concordance of the results. Furthermore this platform represents a significant improvement in terms of handling, data acquiring, computational efficiency and rapidity, allowing to perform the strain genotyping in a time equal to one sixth respect to other microfluidics systems as e.g. the Agilent 2100 bioanalyzer. Finally, this platform can be considered a valid alternative to standard genotyping techniques, particularly useful dealing with a large number of samples in short time. These data confirmed that this technology represents a significative advancement in high-throughput accurate Brucella genotyping. PMID:21435217

  7. Genome Sequencing and Assembly by Long Reads in Plants

    PubMed Central

    Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong

    2017-01-01

    Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists’ projects. PMID:29283420

  8. A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

    PubMed Central

    Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier

    2008-01-01

    Background "Open" transcriptome analysis methods allow to study gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the mostly used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 pb tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several millions of tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to make a deep analysis of mouse hypothalamus transcriptome avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 millions of tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152

  9. Evaluating High-Throughput Ab Initio Gene Finders to Discover Proteins Encoded in Eukaryotic Pathogen Genomes Missed by Laboratory Techniques

    PubMed Central

    Goodswen, Stephen J.; Kennedy, Paul J.; Ellis, John T.

    2012-01-01

    Next generation sequencing technology is advancing genome sequencing at an unprecedented level. By unravelling the code within a pathogen’s genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stages and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab initio gene finders use statistical models to predict genes and their exon-intron structures from the genome sequence alone. This paper evaluates whether existing ab initio gene finders can effectively predict genes to deduce proteins that have presently missed capture by laboratory techniques. An aim here is to identify possible patterns of prediction inaccuracies for gene finders as a whole irrespective of the target pathogen. All currently available ab initio gene finders are considered in the evaluation but only four fulfil high-throughput capability: AUGUSTUS, GeneMark_hmm, GlimmerHMM, and SNAP. These gene finders require training data specific to a target pathogen and consequently the evaluation results are inextricably linked to the availability and quality of the data. The pathogen, Toxoplasma gondii, is used to illustrate the evaluation methods. The results support current opinion that predicted exons by ab initio gene finders are inaccurate in the absence of experimental evidence. However, the results reveal some patterns of inaccuracy that are common to all gene finders and these inaccuracies may provide a focus area for future gene finder developers. PMID:23226328

  10. Technological advancements and their importance for nematode identification

    NASA Astrophysics Data System (ADS)

    Ahmed, Mohammed; Sapp, Melanie; Prior, Thomas; Karssen, Gerrit; Back, Matthew Alan

    2016-06-01

    Nematodes represent a species-rich and morphologically diverse group of metazoans known to inhabit both aquatic and terrestrial environments. Their role as biological indicators and as key players in nutrient cycling has been well documented. Some plant-parasitic species are also known to cause significant losses to crop production. In spite of this, there still exists a huge gap in our knowledge of their diversity due to the enormity of time and expertise often involved in characterising species using phenotypic features. Molecular methodology provides useful means of complementing the limited number of reliable diagnostic characters available for morphology-based identification. We discuss herein some of the limitations of traditional taxonomy and how molecular methodologies, especially the use of high-throughput sequencing, have assisted in carrying out large-scale nematode community studies and characterisation of phytonematodes through rapid identification of multiple taxa. We also provide brief descriptions of some the current and almost-outdated high-throughput sequencing platforms and their applications in both plant nematology and soil ecology.

  11. QSRA: a quality-value guided de novo short read assembler.

    PubMed

    Bryant, Douglas W; Wong, Weng-Keen; Mockler, Todd C

    2009-02-24

    New rapid high-throughput sequencing technologies have sparked the creation of a new class of assembler. Since all high-throughput sequencing platforms incorporate errors in their output, short-read assemblers must be designed to account for this error while utilizing all available data. We have designed and implemented an assembler, Quality-value guided Short Read Assembler, created to take advantage of quality-value scores as a further method of dealing with error. Compared to previous published algorithms, our assembler shows significant improvements not only in speed but also in output quality. QSRA generally produced the highest genomic coverage, while being faster than VCAKE. QSRA is extremely competitive in its longest contig and N50/N80 contig lengths, producing results of similar quality to those of EDENA and VELVET. QSRA provides a step closer to the goal of de novo assembly of complex genomes, improving upon the original VCAKE algorithm by not only drastically reducing runtimes but also increasing the viability of the assembly algorithm through further error handling capabilities.

  12. Comprehensive analysis of the T-cell receptor beta chain gene in rhesus monkey by high throughput sequencing

    PubMed Central

    Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui

    2015-01-01

    Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410

  13. The mitogenome of Onchocerca volvulus from the Brazilian Amazonia focus.

    PubMed

    Crainey, James L; Silva, Túllio R R da; Encinas, Fernando; Marín, Michel A; Vicente, Ana Carolina P; Luz, Sérgio L B

    2016-01-01

    We report here the first complete mitochondria genome of Onchocerca volvulus from a focus outside of Africa. An O. volvulus mitogenome from the Brazilian Amazonia focus was obtained using a combination of high-throughput and Sanger sequencing technologies. Comparisons made between this mitochondrial genome and publicly available mitochondrial sequences identified 46 variant nucleotide positions and suggested that our Brazilian mitogenome is more closely related to Cameroon-origin mitochondria than West African-origin mitochondria. As well as providing insights into the origins of Latin American onchocerciasis, the Brazilian Amazonia focus mitogenome may also have value as an epidemiological resource.

  14. Next Generation MUT-MAP, a High-Sensitivity High-Throughput Microfluidics Chip-Based Mutation Analysis Panel

    PubMed Central

    Patel, Rajesh; Tsan, Alison; Sumiyoshi, Teiko; Fu, Ling; Desai, Rupal; Schoenbrunner, Nancy; Myers, Thomas W.; Bauer, Keith; Smith, Edward; Raja, Rajiv

    2014-01-01

    Molecular profiling of tumor tissue to detect alterations, such as oncogenic mutations, plays a vital role in determining treatment options in oncology. Hence, there is an increasing need for a robust and high-throughput technology to detect oncogenic hotspot mutations. Although commercial assays are available to detect genetic alterations in single genes, only a limited amount of tissue is often available from patients, requiring multiplexing to allow for simultaneous detection of mutations in many genes using low DNA input. Even though next-generation sequencing (NGS) platforms provide powerful tools for this purpose, they face challenges such as high cost, large DNA input requirement, complex data analysis, and long turnaround times, limiting their use in clinical settings. We report the development of the next generation mutation multi-analyte panel (MUT-MAP), a high-throughput microfluidic, panel for detecting 120 somatic mutations across eleven genes of therapeutic interest (AKT1, BRAF, EGFR, FGFR3, FLT3, HRAS, KIT, KRAS, MET, NRAS, and PIK3CA) using allele-specific PCR (AS-PCR) and Taqman technology. This mutation panel requires as little as 2 ng of high quality DNA from fresh frozen or 100 ng of DNA from formalin-fixed paraffin-embedded (FFPE) tissues. Mutation calls, including an automated data analysis process, have been implemented to run 88 samples per day. Validation of this platform using plasmids showed robust signal and low cross-reactivity in all of the newly added assays and mutation calls in cell line samples were found to be consistent with the Catalogue of Somatic Mutations in Cancer (COSMIC) database allowing for direct comparison of our platform to Sanger sequencing. High correlation with NGS when compared to the SuraSeq500 panel run on the Ion Torrent platform in a FFPE dilution experiment showed assay sensitivity down to 0.45%. This multiplexed mutation panel is a valuable tool for high-throughput biomarker discovery in personalized medicine and cancer drug development. PMID:24658394

  15. Persistence and evolution of allergen-specific IgE repertoires during subcutaneous specific immunotherapy

    PubMed Central

    Levin, Mattias; King, Jasmine J.; Glanville, Jacob; Jackson, Katherine J. L.; Looney, Timothy J.; Hoh, Ramona A.; Mari, Adriano; Andersson, Morgan; Greiff, Lennart; Fire, Andrew Z.; Boyd, Scott D.; Ohlin, Mats

    2016-01-01

    Background Specific immunotherapy (SIT) is the only treatment with proven long-term curative potential in allergic disease. Allergen-specific IgE is the causative agent of allergic disease, and antibodies contribute to SIT, but the effects of SIT on aeroallergen-specific B cell repertoires are not well understood. Objective To characterize the IgE sequences expressed by allergen-specific B cells, and track the fate of these B cell clones during SIT. Methods We have used high-throughput antibody gene sequencing and identification of allergen-specific IgE using combinatorial antibody fragment library technology to analyze immunoglobulin repertoires of blood and nasal mucosa of aeroallergen-sensitized individuals before and during the first year of subcutaneous SIT. Results Of 52 distinct allergen-specific IgE heavy chains from eight allergic donors, 37 were also detected by high-throughput antibody gene sequencing of blood, nasal mucosa, or both sample types. The allergen-specific clones had increased persistence, higher likelihood of belonging to clones expressing other switched isotypes, and possibly larger clone size than the rest of the IgE repertoire. Clone members in nasal tissue showed close mutational relationships. Conclusion Combining functional binding studies, deep antibody repertoire sequencing, and information on clinical outcomes in larger studies may in the future aid assessment of SIT mechanisms and efficacy. PMID:26559321

  16. High throughput sequencing analysis of RNA libraries reveals the influences of initial library and PCR methods on SELEX efficiency

    PubMed Central

    Takahashi, Mayumi; Wu, Xiwei; Ho, Michelle; Chomchan, Pritsana; Rossi, John J.; Burnett, John C.; Zhou, Jiehua

    2016-01-01

    The systemic evolution of ligands by exponential enrichment (SELEX) technique is a powerful and effective aptamer-selection procedure. However, modifications to the process can dramatically improve selection efficiency and aptamer performance. For example, droplet digital PCR (ddPCR) has been recently incorporated into SELEX selection protocols to putatively reduce the propagation of byproducts and avoid selection bias that result from differences in PCR efficiency of sequences within the random library. However, a detailed, parallel comparison of the efficacy of conventional solution PCR versus the ddPCR modification in the RNA aptamer-selection process is needed to understand effects on overall SELEX performance. In the present study, we took advantage of powerful high throughput sequencing technology and bioinformatics analysis coupled with SELEX (HT-SELEX) to thoroughly investigate the effects of initial library and PCR methods in the RNA aptamer identification. Our analysis revealed that distinct “biased sequences” and nucleotide composition existed in the initial, unselected libraries purchased from two different manufacturers and that the fate of the “biased sequences” was target-dependent during selection. Our comparison of solution PCR- and ddPCR-driven HT-SELEX demonstrated that PCR method affected not only the nucleotide composition of the enriched sequences, but also the overall SELEX efficiency and aptamer efficacy. PMID:27652575

  17. High-throughput sequencing and morphology perform equally well for benthic monitoring of marine ecosystems

    PubMed Central

    Lejzerowicz, Franck; Esling, Philippe; Pillet, Loïc; Wilding, Thomas A.; Black, Kenneth D.; Pawlowski, Jan

    2015-01-01

    Environmental diversity surveys are crucial for the bioassessment of anthropogenic impacts on marine ecosystems. Traditional benthic monitoring relying on morphotaxonomic inventories of macrofaunal communities is expensive, time-consuming and expertise-demanding. High-throughput sequencing of environmental DNA barcodes (metabarcoding) offers an alternative to describe biological communities. However, whether the metabarcoding approach meets the quality standards of benthic monitoring remains to be tested. Here, we compared morphological and eDNA/RNA-based inventories of metazoans from samples collected at 10 stations around a fish farm in Scotland, including near-cage and distant zones. For each of 5 replicate samples per station, we sequenced the V4 region of the 18S rRNA gene using the Illumina technology. After filtering, we obtained 841,766 metazoan sequences clustered in 163 Operational Taxonomic Units (OTUs). We assigned the OTUs by combining local BLAST searches with phylogenetic analyses. We calculated two commonly used indices: the Infaunal Trophic Index and the AZTI Marine Biotic Index. We found that the molecular data faithfully reflect the morphology-based indices and provides an equivalent assessment of the impact associated with fish farms activities. We advocate that future benthic monitoring should integrate metabarcoding as a rapid and accurate tool for the evaluation of the quality of marine benthic ecosystems. PMID:26355099

  18. Alignment of high-throughput sequencing data inside in-memory databases.

    PubMed

    Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias

    2014-01-01

    In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer supported DNA analysis is still an intensive time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both, HANA and the free database system MySQL, to compare execution time and memory management. To ensure that the results are comparable, MySQL has been running in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures, containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL which means, that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.

  19. [The progress and prospect of application of genetic testing technology-based gene detection technology in the diagnosis and treatment of hereditary cancer].

    PubMed

    He, J X; Jiang, Y F

    2017-08-06

    Hereditary cancer is caused by specific pathogenic gene mutations. Early detection and early intervention are the most effective ways to prevent and control hereditary cancer. High-throughput sequencing based genetic testing technology (NGS) breaks through the restrictions of pedigree analysis, provide a convenient and efficient method to detect and diagnose hereditary cancer. Here, we introduce the mechanism of hereditary cancer, summarize, discuss and prospect the application of NGS and other genetic tests in the diagnosis of hereditary retinoblastoma, hereditary breast and ovarian cancer syndrome, hereditary colorectal cancer and other complex and rare hereditary tumors.

  20. Modelling Human Regulatory Variation in Mouse: Finding the Function in Genome-Wide Association Studies and Whole-Genome Sequencing

    PubMed Central

    Schmouth, Jean-François; Bonaguro, Russell J.; Corso-Diaz, Ximena; Simpson, Elizabeth M.

    2012-01-01

    An increasing body of literature from genome-wide association studies and human whole-genome sequencing highlights the identification of large numbers of candidate regulatory variants of potential therapeutic interest in numerous diseases. Our relatively poor understanding of the functions of non-coding genomic sequence, and the slow and laborious process of experimental validation of the functional significance of human regulatory variants, limits our ability to fully benefit from this information in our efforts to comprehend human disease. Humanized mouse models (HuMMs), in which human genes are introduced into the mouse, suggest an approach to this problem. In the past, HuMMs have been used successfully to study human disease variants; e.g., the complex genetic condition arising from Down syndrome, common monogenic disorders such as Huntington disease and β-thalassemia, and cancer susceptibility genes such as BRCA1. In this commentary, we highlight a novel method for high-throughput single-copy site-specific generation of HuMMs entitled High-throughput Human Genes on the X Chromosome (HuGX). This method can be applied to most human genes for which a bacterial artificial chromosome (BAC) construct can be derived and a mouse-null allele exists. This strategy comprises (1) the use of recombineering technology to create a human variant–harbouring BAC, (2) knock-in of this BAC into the mouse genome using Hprt docking technology, and (3) allele comparison by interspecies complementation. We demonstrate the throughput of the HuGX method by generating a series of seven different alleles for the human NR2E1 gene at Hprt. In future challenges, we consider the current limitations of experimental approaches and call for a concerted effort by the genetics community, for both human and mouse, to solve the challenge of the functional analysis of human regulatory variation. PMID:22396661

  1. Diversity arrays technology (DArT) markers in apple for genetic linkage maps.

    PubMed

    Schouten, Henk J; van de Weg, W Eric; Carling, Jason; Khan, Sabaz Ali; McKay, Steven J; van Kaauwen, Martijn P W; Wittenberg, Alexander H J; Koehorst-van Putten, Herma J J; Noordijk, Yolanda; Gao, Zhongshan; Rees, D Jasper G; Van Dyk, Maria M; Jaccoud, Damian; Considine, Michael J; Kilian, Andrzej

    2012-03-01

    Diversity Arrays Technology (DArT) provides a high-throughput whole-genome genotyping platform for the detection and scoring of hundreds of polymorphic loci without any need for prior sequence information. The work presented here details the development and performance of a DArT genotyping array for apple. This is the first paper on DArT in horticultural trees. Genetic mapping of DArT markers in two mapping populations and their integration with other marker types showed that DArT is a powerful high-throughput method for obtaining accurate and reproducible marker data, despite the low cost per data point. This method appears to be suitable for aligning the genetic maps of different segregating populations. The standard complexity reduction method, based on the methylation-sensitive PstI restriction enzyme, resulted in a high frequency of markers, although there was 52-54% redundancy due to the repeated sampling of highly similar sequences. Sequencing of the marker clones showed that they are significantly enriched for low-copy, genic regions. The genome coverage using the standard method was 55-76%. For improved genome coverage, an alternative complexity reduction method was examined, which resulted in less redundancy and additional segregating markers. The DArT markers proved to be of high quality and were very suitable for genetic mapping at low cost for the apple, providing moderate genome coverage. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11032-011-9579-5) contains supplementary material, which is available to authorized users.

  2. The genetic basis of new treatment modalities in melanoma.

    PubMed

    Kunz, Manfred

    2015-01-01

    In recent years, intracellular signal transduction via RAS-RAF-MEK-ERK has been successfully targeted in new treatment approaches for melanoma using small molecule inhibitors against activated BRAF (V600E mutation) and activated MEK1/2. Also mutated c-KIT has been identified as a promising target. Meanwhile, evidence has been provided that combinations between BRAF inhibitors and MEK1/2 inhibitors are more promising than single-agent treatments. Moreover, new treatment algorithms favor sequential treatment using BRAF inhibitors and newly developed immunotherapies targeting common T lymphocyte antigen 4 (CTLA-4) or programmed cell death 1 (PD-1). In depth molecular analyses have uncovered new mechanisms of treatment resistance and recurrence, which may impact on future treatment decisions. Moreover, next-generation sequencing data have shown that recurrent lesions harbor specific genetic aberrations. At the same time, high throughput sequencing studies of melanoma unraveled a series of new treatment candidates for future treatment approaches such as ERBB4, GRIN2A, GRM3, and RAC1. More recent bioinformatic technologies provided genetic evidence for extensive tumor heterogeneity and tumor clonality of solid tumors, which might also be of relevance for melanoma. However, these technologies have not yet been applied to this tumor. In this review, an overview on the genetic basis of current treatment of melanoma, treatment resistance and recurrences including new treatment perspectives based on recent high-throughput sequencing data is provided. Moreover, future aspects of individualized treatment based on each patient's individual mutational landscape are discussed.

  3. Next-Generation Immune Repertoire Sequencing as a Clue to Elucidate the Landscape of Immune Modulation by Host-Gut Microbiome Interactions.

    PubMed

    Ichinohe, Tatsuo; Miyama, Takahiko; Kawase, Takakazu; Honjo, Yasuko; Kitaura, Kazutaka; Sato, Hiroyuki; Shin-I, Tadasu; Suzuki, Ryuji

    2018-01-01

    The human immune system is a fine network consisted of the innumerable numbers of functional cells that balance the immunity and tolerance against various endogenous and environmental challenges. Although advances in modern immunology have revealed a role of many unique immune cell subsets, technologies that enable us to capture the whole landscape of immune responses against specific antigens have been not available to date. Acquired immunity against various microorganisms including host microbiome is principally founded on T cell and B cell populations, each of which expresses antigen-specific receptors that define a unique clonotype. Over the past several years, high-throughput next-generation sequencing has been developed as a powerful tool to profile T- and B-cell receptor repertoires in a given individual at the single-cell level. Sophisticated immuno-bioinformatic analyses by use of this innovative methodology have been already implemented in clinical development of antibody engineering, vaccine design, and cellular immunotherapy. In this article, we aim to discuss the possible application of high-throughput immune receptor sequencing in the field of nutritional and intestinal immunology. Although there are still unsolved caveats, this emerging technology combined with single-cell transcriptomics/proteomics provides a critical tool to unveil the previously unrecognized principle of host-microbiome immune homeostasis. Accumulation of such knowledge will lead to the development of effective ways for personalized immune modulation through deeper understanding of the mechanisms by which the intestinal environment affects our immune ecosystem.

  4. MPRAnator: a web-based tool for the design of massively parallel reporter assay experiments

    PubMed Central

    Georgakopoulos-Soares, Ilias; Jain, Naman; Gray, Jesse M; Hemberg, Martin

    2017-01-01

    Motivation: With the rapid advances in DNA synthesis and sequencing technologies and the continuing decline in the associated costs, high-throughput experiments can be performed to investigate the regulatory role of thousands of oligonucleotide sequences simultaneously. Nevertheless, designing high-throughput reporter assay experiments such as massively parallel reporter assays (MPRAs) and similar methods remains challenging. Results: We introduce MPRAnator, a set of tools that facilitate rapid design of MPRA experiments. With MPRA Motif design, a set of variables provides fine control of how motifs are placed into sequences, thereby allowing the investigation of the rules that govern transcription factor (TF) occupancy. MPRA single-nucleotide polymorphism design can be used to systematically examine the functional effects of single or combinations of single-nucleotide polymorphisms at regulatory sequences. Finally, the Transmutation tool allows for the design of negative controls by permitting scrambling, reversing, complementing or introducing multiple random mutations in the input sequences or motifs. Availability and implementation: MPRAnator tool set is implemented in Python, Perl and Javascript and is freely available at www.genomegeek.com and www.sanger.ac.uk/science/tools/mpranator. The source code is available on www.github.com/hemberg-lab/MPRAnator/ under the MIT license. The REST API allows programmatic access to MPRAnator using simple URLs. Contact: igs@sanger.ac.uk or mh26@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27605100

  5. MPRAnator: a web-based tool for the design of massively parallel reporter assay experiments.

    PubMed

    Georgakopoulos-Soares, Ilias; Jain, Naman; Gray, Jesse M; Hemberg, Martin

    2017-01-01

    With the rapid advances in DNA synthesis and sequencing technologies and the continuing decline in the associated costs, high-throughput experiments can be performed to investigate the regulatory role of thousands of oligonucleotide sequences simultaneously. Nevertheless, designing high-throughput reporter assay experiments such as massively parallel reporter assays (MPRAs) and similar methods remains challenging. We introduce MPRAnator, a set of tools that facilitate rapid design of MPRA experiments. With MPRA Motif design, a set of variables provides fine control of how motifs are placed into sequences, thereby allowing the investigation of the rules that govern transcription factor (TF) occupancy. MPRA single-nucleotide polymorphism design can be used to systematically examine the functional effects of single or combinations of single-nucleotide polymorphisms at regulatory sequences. Finally, the Transmutation tool allows for the design of negative controls by permitting scrambling, reversing, complementing or introducing multiple random mutations in the input sequences or motifs. MPRAnator tool set is implemented in Python, Perl and Javascript and is freely available at www.genomegeek.com and www.sanger.ac.uk/science/tools/mpranator The source code is available on www.github.com/hemberg-lab/MPRAnator/ under the MIT license. The REST API allows programmatic access to MPRAnator using simple URLs. igs@sanger.ac.uk or mh26@sanger.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  6. Impact of NGS in the medical sciences: Genetic syndromes with an increased risk of developing cancer as an example of the use of new technologies

    PubMed Central

    Lapunzina, Pablo; López, Rocío Ortiz; Rodríguez-Laguna, Lara; García-Miguel, Purificación; Martínez, Augusto Rojas; Martínez-Glez, Víctor

    2014-01-01

    The increased speed and decreasing cost of sequencing, along with an understanding of the clinical relevance of emerging information for patient management, has led to an explosion of potential applications in healthcare. Currently, SNP arrays and Next-Generation Sequencing (NGS) technologies are relatively new techniques used to scan genomes for gains and losses, losses of heterozygosity (LOH), SNPs, and indel variants as well as to perform complete sequencing of a panel of candidate genes, the entire exome (whole exome sequencing) or even the whole genome. As a result, these new high-throughput technologies have facilitated progress in the understanding and diagnosis of genetic syndromes and cancers, two disorders traditionally considered to be separate diseases but that can share causal genetic alterations in a group of developmental disorders associated with congenital malformations and cancer risk. The purpose of this work is to review these syndromes as an example of a group of disorders that has been included in a panel of genes for NGS analysis. We also highlight the relationship between development and cancer and underline the connections between these syndromes. PMID:24764758

  7. VARiD: a variation detection framework for color-space and letter-space platforms.

    PubMed

    Dalca, Adrian V; Rumble, Stephen M; Levy, Samuel; Brudno, Michael

    2010-06-15

    High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystem's SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together. We present VARiD--a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined. The toolset is freely available at http://compbio.cs.utoronto.ca/varid.

  8. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    PubMed

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  9. Quantitative phenotyping via deep barcode sequencing

    PubMed Central

    Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

    2009-01-01

    Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793

  10. Loeffler 4.0: Diagnostic Metagenomics.

    PubMed

    Höper, Dirk; Wylezich, Claudia; Beer, Martin

    2017-01-01

    A new world of possibilities for "virus discovery" was opened up with high-throughput sequencing becoming available in the last decade. While scientifically metagenomic analysis was established before the start of the era of high-throughput sequencing, the availability of the first second-generation sequencers was the kick-off for diagnosticians to use sequencing for the detection of novel pathogens. Today, diagnostic metagenomics is becoming the standard procedure for the detection and genetic characterization of new viruses or novel virus variants. Here, we provide an overview about technical considerations of high-throughput sequencing-based diagnostic metagenomics together with selected examples of "virus discovery" for animal diseases or zoonoses and metagenomics for food safety or basic veterinary research. © 2017 Elsevier Inc. All rights reserved.

  11. High Throughput Sequence Analysis for Disease Resistance in Maize

    USDA-ARS?s Scientific Manuscript database

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  12. Mapping DNA Methylation with High Throughput Nanopore Sequencing

    PubMed Central

    Rand, Arthur C.; Jain, Miten; Eizenga, Jordan M.; Musselman-Brown, Audrey; Olsen, Hugh E.; Akeson, Mark

    2017-01-01

    Chemical modifications to DNA regulate its biological function. We present a framework for mapping methylation to cytosine and adenosine with the Oxford Nanopore Technologies MinION using its ionic current signal. We map three cytosine variants and two adenine variants. The results show that our model is sensitive enough to detect changes in genomic DNA methylation levels as a function of growth phase in E. coli. PMID:28218897

  13. Toward the 1,000 dollars human genome.

    PubMed

    Bennett, Simon T; Barnes, Colin; Cox, Anthony; Davies, Lisa; Brown, Clive

    2005-06-01

    Revolutionary new technologies, capable of transforming the economics of sequencing, are providing an unparalleled opportunity to analyze human genetic variation comprehensively at the whole-genome level within a realistic timeframe and at affordable costs. Current estimates suggest that it would cost somewhere in the region of 30 million US dollars to sequence an entire human genome using Sanger-based sequencing, and on one machine it would take about 60 years. Solexa is widely regarded as a company with the necessary disruptive technology to be the first to achieve the ultimate goal of the so-called 1,000 dollars human genome - the conceptual cost-point needed for routine analysis of individual genomes. Solexa's technology is based on completely novel sequencing chemistry capable of sequencing billions of individual DNA molecules simultaneously, a base at a time, to enable highly accurate, low cost analysis of an entire human genome in a single experiment. When applied over a large enough genomic region, these new approaches to resequencing will enable the simultaneous detection and typing of known, as well as unknown, polymorphisms, and will also offer information about patterns of linkage disequilibrium in the population being studied. Technological progress, leading to the advent of single-molecule-based approaches, is beginning to dramatically drive down costs and increase throughput to unprecedented levels, each being several orders of magnitude better than that which is currently available. A new sequencing paradigm based on single molecules will be faster, cheaper and more sensitive, and will permit routine analysis at the whole-genome level.

  14. NCBI GEO: archive for functional genomics data sets—10 years on

    PubMed Central

    Barrett, Tanya; Troup, Dennis B.; Wilhite, Stephen E.; Ledoux, Pierre; Evangelista, Carlos; Kim, Irene F.; Tomashevsky, Maxim; Marshall, Kimberly A.; Phillippy, Katherine H.; Sherman, Patti M.; Muertter, Rolf N.; Holko, Michelle; Ayanbule, Oluwabukunmi; Yefanov, Andrey; Soboleva, Alexandra

    2011-01-01

    A decade ago, the Gene Expression Omnibus (GEO) database was established at the National Center for Biotechnology Information (NCBI). The original objective of GEO was to serve as a public repository for high-throughput gene expression data generated mostly by microarray technology. However, the research community quickly applied microarrays to non-gene-expression studies, including examination of genome copy number variation and genome-wide profiling of DNA-binding proteins. Because the GEO database was designed with a flexible structure, it was possible to quickly adapt the repository to store these data types. More recently, as the microarray community switches to next-generation sequencing technologies, GEO has again adapted to host these data sets. Today, GEO stores over 20 000 microarray- and sequence-based functional genomics studies, and continues to handle the majority of direct high-throughput data submissions from the research community. Multiple mechanisms are provided to help users effectively search, browse, download and visualize the data at the level of individual genes or entire studies. This paper describes recent database enhancements, including new search and data representation tools, as well as a brief review of how the community uses GEO data. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/. PMID:21097893

  15. Transcriptome-Based Differentiation of Closely-Related Miscanthus Lines

    DOE PAGES

    Chouvarine, Philippe; Cooksey, Amanda M.; McCarthy, Fiona M.; ...

    2012-01-10

    Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthusmore » (Miscanthus6giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations."« less

  16. A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1

    PubMed Central

    Reisman, Steven; Hatzopoulos, Thomas; Läufer, Konstantin; Thiruvathukal, George K.; Putonti, Catherine

    2016-01-01

    As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences revealing spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated even despite increased globalization. The approach developed here can easily be customized for any species of interest. PMID:26819543

  17. Identification of genomic indels and structural variations using split reads

    PubMed Central

    2011-01-01

    Background Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. Results We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. Conclusions Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful. PMID:21787423

  18. Pyrosequencing the Canine Faecal Microbiota: Breadth and Depth of Biodiversity

    PubMed Central

    Hand, Daniel; Wallis, Corrin; Colyer, Alison; Penn, Charles W.

    2013-01-01

    Mammalian intestinal microbiota remain poorly understood despite decades of interest and investigation by culture-based and other long-established methodologies. Using high-throughput sequencing technology we now report a detailed analysis of canine faecal microbiota. The study group of animals comprised eleven healthy adult miniature Schnauzer dogs of mixed sex and age, some closely related and all housed in kennel and pen accommodation on the same premises with similar feeding and exercise regimes. DNA was extracted from faecal specimens and subjected to PCR amplification of 16S rDNA, followed by sequencing of the 5′ region that included variable regions V1 and V2. Barcoded amplicons were sequenced by Roche-454 FLX high-throughput pyrosequencing. Sequences were assigned to taxa using the Ribosomal Database Project Bayesian classifier and revealed dominance of Fusobacterium and Bacteroidetes phyla. Differences between animals in the proportions of different taxa, among 10,000 reads per animal, were clear and not supportive of the concept of a “core microbiota”. Despite this variability in prominent genera, littermates were shown to have a more similar faecal microbial composition than unrelated dogs. Diversity of the microbiota was also assessed by assignment of sequence reads into operational taxonomic units (OTUs) at the level of 97% sequence identity. The OTU data were then subjected to rarefaction analysis and determination of Chao1 richness estimates. The data indicated that faecal microbiota comprised possibly as many as 500 to 1500 OTUs. PMID:23382835

  19. GenomicTools: a computational platform for developing high-throughput analytics in genomics.

    PubMed

    Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo

    2012-01-15

    Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.

  20. A Pipeline for High-Throughput Concentration Response Modeling of Gene Expression for Toxicogenomics

    PubMed Central

    House, John S.; Grimm, Fabian A.; Jima, Dereje D.; Zhou, Yi-Hui; Rusyn, Ivan; Wright, Fred A.

    2017-01-01

    Cell-based assays are an attractive option to measure gene expression response to exposure, but the cost of whole-transcriptome RNA sequencing has been a barrier to the use of gene expression profiling for in vitro toxicity screening. In addition, standard RNA sequencing adds variability due to variable transcript length and amplification. Targeted probe-sequencing technologies such as TempO-Seq, with transcriptomic representation that can vary from hundreds of genes to the entire transcriptome, may reduce some components of variation. Analyses of high-throughput toxicogenomics data require renewed attention to read-calling algorithms and simplified dose–response modeling for datasets with relatively few samples. Using data from induced pluripotent stem cell-derived cardiomyocytes treated with chemicals at varying concentrations, we describe here and make available a pipeline for handling expression data generated by TempO-Seq to align reads, clean and normalize raw count data, identify differentially expressed genes, and calculate transcriptomic concentration–response points of departure. The methods are extensible to other forms of concentration–response gene-expression data, and we discuss the utility of the methods for assessing variation in susceptibility and the diseased cellular state. PMID:29163636

  1. AmpliVar: mutation detection in high-throughput sequence from amplicon-based libraries.

    PubMed

    Hsu, Arthur L; Kondrashova, Olga; Lunke, Sebastian; Love, Clare J; Meldrum, Cliff; Marquis-Nicholson, Renate; Corboy, Greg; Pham, Kym; Wakefield, Matthew; Waring, Paul M; Taylor, Graham R

    2015-04-01

    Conventional means of identifying variants in high-throughput sequencing align each read against a reference sequence, and then call variants at each position. Here, we demonstrate an orthogonal means of identifying sequence variation by grouping the reads as amplicons prior to any alignment. We used AmpliVar to make key-value hashes of sequence reads and group reads as individual amplicons using a table of flanking sequences. Low-abundance reads were removed according to a selectable threshold, and reads above this threshold were aligned as groups, rather than as individual reads, permitting the use of sensitive alignment tools. We show that this approach is more sensitive, more specific, and more computationally efficient than comparable methods for the analysis of amplicon-based high-throughput sequencing data. The method can be extended to enable alignment-free confirmation of variants seen in hybridization capture target-enrichment data. © 2015 WILEY PERIODICALS, INC.

  2. Current and future molecular approaches in the diagnosis of cystic fibrosis.

    PubMed

    Bergougnoux, Anne; Taulan-Cadars, Magali; Claustres, Mireille; Raynal, Caroline

    2018-05-01

    Cystic Fibrosis is among the first diseases to have general population genetic screening tests and one of the most common indications of prenatal and preimplantation genetic diagnosis for single gene disorders. During the past twenty years, thanks to the evolution of diagnostic techniques, our knowledge of CFTR genetics and pathophysiological mechanisms involved in cystic fibrosis has significantly improved. Areas covered: Sanger sequencing and quantitative methods greatly contributed to the identification of more than 2,000 sequence variations reported worldwide in the CFTR gene. We are now entering a new technological age with the generalization of high throughput approaches such as Next Generation Sequencing and Droplet Digital PCR technologies in diagnostics laboratories. These powerful technologies open up new perspectives for scanning the entire CFTR locus, exploring modifier factors that possibly influence the clinical evolution of patients, and for preimplantation and prenatal diagnosis. Expert commentary: Such breakthroughs would, however, require powerful bioinformatics tools and relevant functional tests of variants for analysis and interpretation of the resulting data. Ultimately, an optimal use of all those resources may improve patient care and therapeutic decision-making.

  3. High-throughput genotyping of hop (Humulus lupulus L.) utilising diversity arrays technology (DArT).

    PubMed

    Howard, E L; Whittock, S P; Jakše, J; Carling, J; Matthews, P D; Probasco, G; Henning, J A; Darby, P; Cerenak, A; Javornik, B; Kilian, A; Koutoulis, A

    2011-05-01

    Implementation of molecular methods in hop (Humulus lupulus L.) breeding is dependent on the availability of sizeable numbers of polymorphic markers and a comprehensive understanding of genetic variation. However, use of molecular marker technology is limited due to expense, time inefficiency, laborious methodology and dependence on DNA sequence information. Diversity arrays technology (DArT) is a high-throughput cost-effective method for the discovery of large numbers of quality polymorphic markers without reliance on DNA sequence information. This study is the first to utilise DArT for hop genotyping, identifying 730 polymorphic markers from 92 hop accessions. The marker quality was high and similar to the quality of DArT markers previously generated for other species; although percentage polymorphism and polymorphism information content (PIC) were lower than in previous studies deploying other marker systems in hop. Genetic relationships in hop illustrated by DArT in this study coincide with knowledge generated using alternate methods. Several statistical analyses separated the hop accessions into genetically differentiated North American and European groupings, with hybrids between the two groups clearly distinguishable. Levels of genetic diversity were similar in the North American and European groups, but higher in the hybrid group. The markers produced from this time and cost-efficient genotyping tool will be a valuable resource for numerous applications in hop breeding and genetics studies, such as mapping, marker-assisted selection, genetic identity testing, guidance in the maintenance of genetic diversity and the directed breeding of superior cultivars.

  4. Sequencing the Connectome

    PubMed Central

    Zador, Anthony M.; Dubnau, Joshua; Oyibo, Hassana K.; Zhan, Huiqing; Cao, Gang; Peikon, Ian D.

    2012-01-01

    Connectivity determines the function of neural circuits. Historically, circuit mapping has usually been viewed as a problem of microscopy, but no current method can achieve high-throughput mapping of entire circuits with single neuron precision. Here we describe a novel approach to determining connectivity. We propose BOINC (“barcoding of individual neuronal connections”), a method for converting the problem of connectivity into a form that can be read out by high-throughput DNA sequencing. The appeal of using sequencing is that its scale—sequencing billions of nucleotides per day is now routine—is a natural match to the complexity of neural circuits. An inexpensive high-throughput technique for establishing circuit connectivity at single neuron resolution could transform neuroscience research. PMID:23109909

  5. High-Throughput Block Optical DNA Sequence Identification.

    PubMed

    Sagar, Dodderi Manjunatha; Korshoj, Lee Erik; Hanson, Katrina Bethany; Chowdhury, Partha Pratim; Otoupal, Peter Britton; Chatterjee, Anushree; Nagpal, Prashant

    2018-01-01

    Optical techniques for molecular diagnostics or DNA sequencing generally rely on small molecule fluorescent labels, which utilize light with a wavelength of several hundred nanometers for detection. Developing a label-free optical DNA sequencing technique will require nanoscale focusing of light, a high-throughput and multiplexed identification method, and a data compression technique to rapidly identify sequences and analyze genomic heterogeneity for big datasets. Such a method should identify characteristic molecular vibrations using optical spectroscopy, especially in the "fingerprinting region" from ≈400-1400 cm -1 . Here, surface-enhanced Raman spectroscopy is used to demonstrate label-free identification of DNA nucleobases with multiplexed 3D plasmonic nanofocusing. While nanometer-scale mode volumes prevent identification of single nucleobases within a DNA sequence, the block optical technique can identify A, T, G, and C content in DNA k-mers. The content of each nucleotide in a DNA block can be a unique and high-throughput method for identifying sequences, genes, and other biomarkers as an alternative to single-letter sequencing. Additionally, coupling two complementary vibrational spectroscopy techniques (infrared and Raman) can improve block characterization. These results pave the way for developing a novel, high-throughput block optical sequencing method with lossy genomic data compression using k-mer identification from multiplexed optical data acquisition. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  6. Nitrogen Cycle Evaluation (NiCE) Chip for the Simultaneous Analysis of Multiple N-Cycle Associated Genes.

    PubMed

    Oshiki, Mamoru; Segawa, Takahiro; Ishii, Satoshi

    2018-02-02

    Various microorganisms play key roles in the Nitrogen (N) cycle. Quantitative PCR (qPCR) and PCR-amplicon sequencing of the N cycle functional genes allow us to analyze the abundance and diversity of microbes responsible in the N transforming reactions in various environmental samples. However, analysis of multiple target genes can be cumbersome and expensive. PCR-independent analysis, such as metagenomics and metatranscriptomics, is useful but expensive especially when we analyze multiple samples and try to detect N cycle functional genes present at relatively low abundance. Here, we present the application of microfluidic qPCR chip technology to simultaneously quantify and prepare amplicon sequence libraries for multiple N cycle functional genes as well as taxon-specific 16S rRNA gene markers for many samples. This approach, named as N cycle evaluation (NiCE) chip, was evaluated by using DNA from pure and artificially mixed bacterial cultures and by comparing the results with those obtained by conventional qPCR and amplicon sequencing methods. Quantitative results obtained by the NiCE chip were comparable to those obtained by conventional qPCR. In addition, the NiCE chip was successfully applied to examine abundance and diversity of N cycle functional genes in wastewater samples. Although non-specific amplification was detected on the NiCE chip, this could be overcome by optimizing the primer sequences in the future. As the NiCE chip can provide high-throughput format to quantify and prepare sequence libraries for multiple N cycle functional genes, this tool should advance our ability to explore N cycling in various samples. Importance. We report a novel approach, namely Nitrogen Cycle Evaluation (NiCE) chip by using microfluidic qPCR chip technology. By sequencing the amplicons recovered from the NiCE chip, we can assess diversities of the N cycle functional genes. The NiCE chip technology is applicable to analyze the temporal dynamics of the N cycle gene transcriptions in wastewater treatment bioreactors. The NiCE chip can provide high-throughput format to quantify and prepare sequence libraries for multiple N cycle functional genes. While there is a room for future improvement, this tool should significantly advance our ability to explore the N cycle in various environmental samples. Copyright © 2018 American Society for Microbiology.

  7. Developing High-Throughput HIV Incidence Assay with Pyrosequencing Platform

    PubMed Central

    Park, Sung Yong; Goeken, Nolan; Lee, Hyo Jin; Bolan, Robert; Dubé, Michael P.

    2014-01-01

    ABSTRACT Human immunodeficiency virus (HIV) incidence is an important measure for monitoring the epidemic and evaluating the efficacy of intervention and prevention trials. This study developed a high-throughput, single-measure incidence assay by implementing a pyrosequencing platform. We devised a signal-masking bioinformatics pipeline, which yielded a process error rate of 5.8 × 10−4 per base. The pipeline was then applied to analyze 18,434 envelope gene segments (HXB2 7212 to 7601) obtained from 12 incident and 24 chronic patients who had documented HIV-negative and/or -positive tests. The pyrosequencing data were cross-checked by using the single-genome-amplification (SGA) method to independently obtain 302 sequences from 13 patients. Using two genomic biomarkers that probe for the presence of similar sequences, the pyrosequencing platform correctly classified all 12 incident subjects (100% sensitivity) and 23 of 24 chronic subjects (96% specificity). One misclassified subject's chronic infection was correctly classified by conducting the same analysis with SGA data. The biomarkers were statistically associated across the two platforms, suggesting the assay's reproducibility and robustness. Sampling simulations showed that the biomarkers were tolerant of sequencing errors and template resampling, two factors most likely to affect the accuracy of pyrosequencing results. We observed comparable biomarker scores between AIDS and non-AIDS chronic patients (multivariate analysis of variance [MANOVA], P = 0.12), indicating that the stage of HIV disease itself does not affect the classification scheme. The high-throughput genomic HIV incidence marks a significant step toward determining incidence from a single measure in cross-sectional surveys. IMPORTANCE Annual HIV incidence, the number of newly infected individuals within a year, is the key measure of monitoring the epidemic's rise and decline. Developing reliable assays differentiating recent from chronic infections has been a long-standing quest in the HIV community. Over the past 15 years, these assays have traditionally measured various HIV-specific antibodies, but recent technological advancements have expanded the diversity of proposed accurate, user-friendly, and financially viable tools. Here we designed a high-throughput genomic HIV incidence assay based on the signature imprinted in the HIV gene sequence population. By combining next-generation sequencing techniques with bioinformatics analysis, we demonstrated that genomic fingerprints are capable of distinguishing recently infected patients from chronically infected patients with high precision. Our high-throughput platform is expected to allow us to process many patients' samples from a single experiment, permitting the assay to be cost-effective for routine surveillance. PMID:24371062

  8. Automated sequence analysis and editing software for HIV drug resistance testing.

    PubMed

    Struck, Daniel; Wallis, Carole L; Denisov, Gennady; Lambert, Christine; Servais, Jean-Yves; Viana, Raquel V; Letsoalo, Esrom; Bronze, Michelle; Aitken, Sue C; Schuurman, Rob; Stevens, Wendy; Schmit, Jean Claude; Rinke de Wit, Tobias; Perez Bercoff, Danielle

    2012-05-01

    Access to antiretroviral treatment in resource-limited-settings is inevitably paralleled by the emergence of HIV drug resistance. Monitoring treatment efficacy and HIV drugs resistance testing are therefore of increasing importance in resource-limited settings. Yet low-cost technologies and procedures suited to the particular context and constraints of such settings are still lacking. The ART-A (Affordable Resistance Testing for Africa) consortium brought together public and private partners to address this issue. To develop an automated sequence analysis and editing software to support high throughput automated sequencing. The ART-A Software was designed to automatically process and edit ABI chromatograms or FASTA files from HIV-1 isolates. The ART-A Software performs the basecalling, assigns quality values, aligns query sequences against a set reference, infers a consensus sequence, identifies the HIV type and subtype, translates the nucleotide sequence to amino acids and reports insertions/deletions, premature stop codons, ambiguities and mixed calls. The results can be automatically exported to Excel to identify mutations. Automated analysis was compared to manual analysis using a panel of 1624 PR-RT sequences generated in 3 different laboratories. Discrepancies between manual and automated sequence analysis were 0.69% at the nucleotide level and 0.57% at the amino acid level (668,047 AA analyzed), and discordances at major resistance mutations were recorded in 62 cases (4.83% of differences, 0.04% of all AA) for PR and 171 (6.18% of differences, 0.03% of all AA) cases for RT. The ART-A Software is a time-sparing tool for pre-analyzing HIV and viral quasispecies sequences in high throughput laboratories and highlighting positions requiring attention. Copyright © 2012 Elsevier B.V. All rights reserved.

  9. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome

    DOE PAGES

    Chapman, Jarrod A.; Mascher, Martin; Buluc, Aydin; ...

    2015-01-31

    We report that polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible tomore » construct a mapping population.« less

  10. Deep sequencing methods for protein engineering and design.

    PubMed

    Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A

    2017-08-01

    The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering have followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.

  11. A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chapman, Jarrod A.; Mascher, Martin; Buluc, Aydin

    We report that polyploid species have long been thought to be recalcitrant to whole-genome assembly. By combining high-throughput sequencing, recent developments in parallel computing, and genetic mapping, we derive, de novo, a sequence assembly representing 9.1 Gbp of the highly repetitive 16 Gbp genome of hexaploid wheat, Triticum aestivum, and assign 7.1 Gb of this assembly to chromosomal locations. The genome representation and accuracy of our assembly is comparable or even exceeds that of a chromosome-by-chromosome shotgun assembly. Our assembly and mapping strategy uses only short read sequencing technology and is applicable to any species where it is possible tomore » construct a mapping population.« less

  12. Application of resequencing to rice genomics, functional genomics and evolutionary analysis

    PubMed Central

    2014-01-01

    Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Based on the reference genome in rice, next-generation sequencing (NGS) using the high-throughput sequencing system can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively utilized in evolutionary analysis, rice genomics and functional genomics studies. This technique is beneficial for both bridging the knowledge gap between genotype and phenotype and facilitating molecular breeding via gene design in rice. Here, we also discuss the limitation, application and future prospects of rice resequencing. PMID:25006357

  13. Ultrasensitive Single Fluorescence-Labeled Probe-Mediated Single Universal Primer-Multiplex-Droplet Digital Polymerase Chain Reaction for High-Throughput Genetically Modified Organism Screening.

    PubMed

    Niu, Chenqi; Xu, Yuancong; Zhang, Chao; Zhu, Pengyu; Huang, Kunlun; Luo, Yunbo; Xu, Wentao

    2018-05-01

    As genetically modified (GM) technology develops and genetically modified organisms (GMOs) become more available, GMOs face increasing regulations and pressure to adhere to strict labeling guidelines. A singleplex detection method cannot perform the high-throughput analysis necessary for optimal GMO detection. Combining the advantages of multiplex detection and droplet digital polymerase chain reaction (ddPCR), a single universal primer-multiplex-ddPCR (SUP-M-ddPCR) strategy was proposed for accurate broad-spectrum screening and quantification. The SUP increases efficiency of the primers in PCR and plays an important role in establishing a high-throughput, multiplex detection method. Emerging ddPCR technology has been used for accurate quantification of nucleic acid molecules without a standard curve. Using maize as a reference point, four heterologous sequences ( 35S, NOS, NPTII, and PAT) were selected to evaluate the feasibility and applicability of this strategy. Surprisingly, these four genes cover more than 93% of the transgenic maize lines and serve as preliminary screening sequences. All screening probes were labeled with FAM fluorescence, which allows the signals from the samples with GMO content and those without to be easily differentiated. This fiveplex screening method is a new development in GMO screening. Utilizing an optimal amplification assay, the specificity, limit of detection (LOD), and limit of quantitation (LOQ) were validated. The LOD and LOQ of this GMO screening method were 0.1% and 0.01%, respectively, with a relative standard deviation (RSD) < 25%. This method could serve as an important tool for the detection of GM maize from different processed, commercially available products. Further, this screening method could be applied to other fields that require reliable and sensitive detection of DNA targets.

  14. University of Texas MD Anderson Cancer Center: High-Throughput Screening Identifying Driving Mutations in Endometrial Cancer | Office of Cancer Genomics

    Cancer.gov

    Recent advances in next-generation sequencing technology have enabled the unprecedented characterization of a full spectrum of somatic alterations in cancer genomes. Given the large numbers of somatic mutations typically detected by this approach, a key challenge in the downstream analysis is to distinguish “drivers” that functionally contribute to tumorigenesis from “passengers” that occur as the consequence of genomic instability.

  15. High-throughput sequencing: a failure mode analysis.

    PubMed

    Yang, George S; Stott, Jeffery M; Smailus, Duane; Barber, Sarah A; Balasundaram, Miruna; Marra, Marco A; Holt, Robert A

    2005-01-04

    Basic manufacturing principles are becoming increasingly important in high-throughput sequencing facilities where there is a constant drive to increase quality, increase efficiency, and decrease operating costs. While high-throughput centres report failure rates typically on the order of 10%, the causes of sporadic sequencing failures are seldom analyzed in detail and have not, in the past, been formally reported. Here we report the results of a failure mode analysis of our production sequencing facility based on detailed evaluation of 9,216 ESTs generated from two cDNA libraries. Two categories of failures are described; process-related failures (failures due to equipment or sample handling) and template-related failures (failures that are revealed by close inspection of electropherograms and are likely due to properties of the template DNA sequence itself). Preventative action based on a detailed understanding of failure modes is likely to improve the performance of other production sequencing pipelines.

  16. Alignment-free design of highly discriminatory diagnostic primer sets for Escherichia coli O104:H4 outbreak strains.

    PubMed

    Pritchard, Leighton; Holden, Nicola J; Bielaszewska, Martina; Karch, Helge; Toth, Ian K

    2012-01-01

    An Escherichia coli O104:H4 outbreak in Germany in summer 2011 caused 53 deaths, over 4000 individual infections across Europe, and considerable economic, social and political impact. This outbreak was the first in a position to exploit rapid, benchtop high-throughput sequencing (HTS) technologies and crowdsourced data analysis early in its investigation, establishing a new paradigm for rapid response to disease threats. We describe a novel strategy for design of diagnostic PCR primers that exploited this rapid draft bacterial genome sequencing to distinguish between E. coli O104:H4 outbreak isolates and other pathogenic E. coli isolates, including the historical hæmolytic uræmic syndrome (HUSEC) E. coli HUSEC041 O104:H4 strain, which possesses the same serotype as the outbreak isolates. Primers were designed using a novel alignment-free strategy against eleven draft whole genome assemblies of E. coli O104:H4 German outbreak isolates from the E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium website, and a negative sequence set containing 69 E. coli chromosome and plasmid sequences from public databases. Validation in vitro against 21 'positive' E. coli O104:H4 outbreak and 32 'negative' non-outbreak EHEC isolates indicated that individual primer sets exhibited 100% sensitivity for outbreak isolates, with false positive rates of between 9% and 22%. A minimal combination of two primers discriminated between outbreak and non-outbreak E. coli isolates with 100% sensitivity and 100% specificity. Draft genomes of isolates of disease outbreak bacteria enable high throughput primer design and enhanced diagnostic performance in comparison to traditional molecular assays. Future outbreak investigations will be able to harness HTS rapidly to generate draft genome sequences and diagnostic primer sets, greatly facilitating epidemiology and clinical diagnostics. We expect that high throughput primer design strategies will enable faster, more precise responses to future disease outbreaks of bacterial origin, and help to mitigate their societal impact.

  17. Human Y chromosome copy number variation in the next generation sequencing era and beyond.

    PubMed

    Massaia, Andrea; Xue, Yali

    2017-05-01

    The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.

  18. Proteome Studies of Filamentous Fungi

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baker, Scott E.; Panisko, Ellen A.

    2011-04-20

    The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide breadth of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, non-gel basedmore » proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of different variations on the general method and technologies for identifying peptides in a given sample. We present a method that can serve as a “baseline” for proteomic studies of fungi.« less

  19. Sequencing-based diagnostics for pediatric genetic diseases: progress and potential

    PubMed Central

    Tayoun, Ahmad Abou; Krock, Bryan; Spinner, Nancy B.

    2016-01-01

    Introduction The last two decades have witnessed revolutionary changes in clinical diagnostics, fueled by the Human Genome Project and advances in high throughput, Next Generation Sequencing (NGS). We review the current state of sequencing-based pediatric diagnostics, associated challenges, and future prospects. Areas Covered We present an overview of genetic disease in children, review the technical aspects of Next Generation Sequencing and the strategies to make molecular diagnoses for children with genetic disease. We discuss the challenges of genomic sequencing including incomplete current knowledge of variants, lack of data about certain genomic regions, mosaicism, and the presence of regions with high homology. Expert Commentary NGS has been a transformative technology and the gap between the research and clinical communities has never been so narrow. Therapeutic interventions are emerging based on genomic findings and the applications of NGS are progressing to prenatal genetics, epigenomics and transcriptomics. PMID:27388938

  20. Shotgun Optical Maps of the Whole Escherichia coli O157:H7 Genome

    PubMed Central

    Lim, Alex; Dimalanta, Eileen T.; Potamousis, Konstantinos D.; Yen, Galex; Apodoca, Jennifer; Tao, Chunhong; Lin, Jieyi; Qi, Rong; Skiadas, John; Ramanathan, Arvind; Perna, Nicole T.; Plunkett, Guy; Burland, Valerie; Mau, Bob; Hackett, Jeremiah; Blattner, Frederick R.; Anantharaman, Thomas S.; Mishra, Bhubaneswar; Schwartz, David C.

    2001-01-01

    We have constructed NheI and XhoI optical maps of Escherichia coli O157:H7 solely from genomic DNA molecules to provide a uniquely valuable scaffold for contig closure and sequence validation. E. coli O157:H7 is a common pathogen found in contaminated food and water. Our approach obviated the need for the analysis of clones, PCR products, and hybridizations, because maps were constructed from ensembles of single DNA molecules. Shotgun sequencing of bacterial genomes remains labor-intensive, despite advances in sequencing technology. This is partly due to manual intervention required during the last stages of finishing. The applicability of optical mapping to this problem was enhanced by advances in machine vision techniques that improved mapping throughput and created a path to full automation of mapping. Comparisons were made between maps and sequence data that characterized sequence gaps and guided nascent assemblies. PMID:11544203

  1. Proteome studies of filamentous fungi.

    PubMed

    Baker, Scott E; Panisko, Ellen A

    2011-01-01

    The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, nongel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of variations on the general methods and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.

  2. Analysis of the whole mitochondrial genome: translation of the Ion Torrent Personal Genome Machine system to the diagnostic bench?

    PubMed

    Seneca, Sara; Vancampenhout, Kim; Van Coster, Rudy; Smet, Joél; Lissens, Willy; Vanlander, Arnaud; De Paepe, Boel; Jonckheere, An; Stouffs, Katrien; De Meirleir, Linda

    2015-01-01

    Next-generation sequencing (NGS), an innovative sequencing technology that enables the successful analysis of numerous gene sequences in a massive parallel sequencing approach, has revolutionized the field of molecular biology. Although NGS was introduced in a rather recent past, the technology has already demonstrated its potential and effectiveness in many research projects, and is now on the verge of being introduced into the diagnostic setting of routine laboratories to delineate the molecular basis of genetic disease in undiagnosed patient samples. We tested a benchtop device on retrospective genomic DNA (gDNA) samples of controls and patients with a clinical suspicion of a mitochondrial DNA disorder. This Ion Torrent Personal Genome Machine platform is a high-throughput sequencer with a fast turnaround time and reasonable running costs. We challenged the chemistry and technology with the analysis and processing of a mutational spectrum composed of samples with single-nucleotide substitutions, indels (insertions and deletions) and large single or multiple deletions, occasionally in heteroplasmy. The output data were compared with previously obtained conventional dideoxy sequencing results and the mitochondrial revised Cambridge Reference Sequence (rCRS). We were able to identify the majority of all nucleotide alterations, but three false-negative results were also encountered in the data set. At the same time, the poor performance of the PGM instrument in regions associated with homopolymeric stretches generated many false-positive miscalls demanding additional manual curation of the data.

  3. Analysis and Testing of Mobile Wireless Networks

    NASA Technical Reports Server (NTRS)

    Alena, Richard; Evenson, Darin; Rundquist, Victor; Clancy, Daniel (Technical Monitor)

    2002-01-01

    Wireless networks are being used to connect mobile computing elements in more applications as the technology matures. There are now many products (such as 802.11 and 802.11b) which ran in the ISM frequency band and comply with wireless network standards. They are being used increasingly to link mobile Intranet into Wired networks. Standard methods of analyzing and testing their performance and compatibility are needed to determine the limits of the technology. This paper presents analytical and experimental methods of determining network throughput, range and coverage, and interference sources. Both radio frequency (BE) domain and network domain analysis have been applied to determine wireless network throughput and range in the outdoor environment- Comparison of field test data taken under optimal conditions, with performance predicted from RF analysis, yielded quantitative results applicable to future designs. Layering multiple wireless network- sooners can increase performance. Wireless network components can be set to different radio frequency-hopping sequences or spreading functions, allowing more than one sooner to coexist. Therefore, we ran multiple 802.11-compliant systems concurrently in the same geographical area to determine interference effects and scalability, The results can be used to design of more robust networks which have multiple layers of wireless data communication paths and provide increased throughput overall.

  4. Next Generation Sequencing Technology and Genomewide Data Analysis: Perspectives for Retinal Research

    PubMed Central

    Chaitankar, Vijender; Karakülah, Gökhan; Ratnapriya, Rinki; Giuste, Felipe O.; Brooks, Matthew J.; Swaroop, Anand

    2016-01-01

    The advent of high throughput next generation sequencing (NGS) has accelerated the pace of discovery of disease-associated genetic variants and genomewide profiling of expressed sequences and epigenetic marks, thereby permitting systems-based analyses of ocular development and disease. Rapid evolution of NGS and associated methodologies presents significant challenges in acquisition, management, and analysis of large data sets and for extracting biologically or clinically relevant information. Here we illustrate the basic design of commonly used NGS-based methods, specifically whole exome sequencing, transcriptome, and epigenome profiling, and provide recommendations for data analyses. We briefly discuss systems biology approaches for integrating multiple data sets to elucidate gene regulatory or disease networks. While we provide examples from the retina, the NGS guidelines reviewed here are applicable to other tissues/cell types as well. PMID:27297499

  5. Pseudouridines have context-dependent mutation and stop rates in high-throughput sequencing.

    PubMed

    Zhou, Katherine I; Clark, Wesley C; Pan, David W; Eckwahl, Matthew J; Dai, Qing; Pan, Tao

    2018-05-11

    The abundant RNA modification pseudouridine (Ψ) has been mapped transcriptome-wide by chemically modifying pseudouridines with carbodiimide and detecting the resulting reverse transcription stops in high-throughput sequencing. However, these methods have limited sensitivity and specificity, in part due to the use of reverse transcription stops. We sought to use mutations rather than just stops in sequencing data to identify pseudouridine sites. Here, we identify reverse transcription conditions that allow read-through of carbodiimide-modified pseudouridine (CMC-Ψ), and we show that pseudouridines in carbodiimide-treated human ribosomal RNA have context-dependent mutation and stop rates in high-throughput sequencing libraries prepared under these conditions. Furthermore, accounting for the context-dependence of mutation and stop rates can enhance the detection of pseudouridine sites. Similar approaches could contribute to the sequencing-based detection of many RNA modifications.

  6. Optimization of High-Throughput Sequencing Kinetics for determining enzymatic rate constants of thousands of RNA substrates

    PubMed Central

    Niland, Courtney N.; Jankowsky, Eckhard; Harris, Michael E.

    2016-01-01

    Quantification of the specificity of RNA binding proteins and RNA processing enzymes is essential to understanding their fundamental roles in biological processes. High Throughput Sequencing Kinetics (HTS-Kin) uses high throughput sequencing and internal competition kinetics to simultaneously monitor the processing rate constants of thousands of substrates by RNA processing enzymes. This technique has provided unprecedented insight into the substrate specificity of the tRNA processing endonuclease ribonuclease P. Here, we investigate the accuracy and robustness of measurements associated with each step of the HTS-Kin procedure. We examine the effect of substrate concentration on the observed rate constant, determine the optimal kinetic parameters, and provide guidelines for reducing error in amplification of the substrate population. Importantly, we find that high-throughput sequencing, and experimental reproducibility contribute their own sources of error, and these are the main sources of imprecision in the quantified results when otherwise optimized guidelines are followed. PMID:27296633

  7. Transcriptome Sequencing of Gracilariopsis lemaneiformis to Analyze the Genes Related to Optically Active Phycoerythrin Synthesis.

    PubMed

    Huang, Xiaoyun; Zang, Xiaonan; Wu, Fei; Jin, Yuming; Wang, Haitao; Liu, Chang; Ding, Yating; He, Bangxiang; Xiao, Dongfang; Song, Xinwei; Liu, Zhu

    2017-01-01

    Gracilariopsis lemaneiformis (aka Gracilaria lemaneiformis) is a red macroalga rich in phycoerythrin, which can capture light efficiently and transfer it to photosystemⅡ. However, little is known about the synthesis of optically active phycoerythrinin in G. lemaneiformis at the molecular level. With the advent of high-throughput sequencing technology, analysis of genetic information for G. lemaneiformis by transcriptome sequencing is an effective means to get a deeper insight into the molecular mechanism of phycoerythrin synthesis. Illumina technology was employed to sequence the transcriptome of two strains of G. lemaneiformis- the wild type and a green-pigmented mutant. We obtained a total of 86915 assembled unigenes as a reference gene set, and 42884 unigenes were annotated in at least one public database. Taking the above transcriptome sequencing as a reference gene set, 4041 differentially expressed genes were screened to analyze and compare the gene expression profiles of the wild type and green mutant. By GO and KEGG pathway analysis, we concluded that three factors, including a reduction in the expression level of apo-phycoerythrin, an increase of chlorophyll light-harvesting complex synthesis, and reduction of phycoerythrobilin by competitive inhibition, caused the reduction of optically active phycoerythrin in the green-pigmented mutant.

  8. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome.

    PubMed

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-06-15

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development.

  9. Transcriptionally active PCR for antigen identification and vaccine development: in vitro genome-wide screening and in vivo immunogenicity

    PubMed Central

    Regis, David P.; Dobaño, Carlota; Quiñones-Olson, Paola; Liang, Xiaowu; Graber, Norma L.; Stefaniak, Maureen E.; Campo, Joseph J.; Carucci, Daniel J.; Roth, David A.; He, Huaping; Felgner, Philip L.; Doolan, Denise L.

    2009-01-01

    We have evaluated a technology called Transcriptionally Active PCR (TAP) for high throughput identification and prioritization of novel target antigens from genomic sequence data using the Plasmodium parasite, the causative agent of malaria, as a model. First, we adapted the TAP technology for the highly AT-rich Plasmodium genome, using well-characterized P. falciparum and P. yoelii antigens and a small panel of uncharacterized open reading frames from the P. falciparum genome sequence database. We demonstrated that TAP fragments encoding six well-characterized P. falciparum antigens and five well-characterized P. yoelii antigens could be amplified in an equivalent manner from both plasmid DNA and genomic DNA templates, and that uncharacterized open reading frames could also be amplified from genomic DNA template. Second, we showed that the in vitro expression of the TAP fragments was equivalent or superior to that of supercoiled plasmid DNA encoding the same antigen. Third, we evaluated the in vivo immunogenicity of TAP fragments encoding a subset of the model P. falciparum and P. yoelii antigens. We found that antigen-specific antibody and cellular immune responses induced by the TAP fragments in mice were equivalent or superior to those induced by the corresponding plasmid DNA vaccines. Finally, we developed and demonstrated proof-of-principle for an in vitro humoral immunoscreening assay for down-selection of novel target antigens. These data support the potential of a TAP approach for rapid high throughput functional screening and identification of potential candidate vaccine antigens from genomic sequence data. PMID:18164079

  10. Transcriptionally active PCR for antigen identification and vaccine development: in vitro genome-wide screening and in vivo immunogenicity.

    PubMed

    Regis, David P; Dobaño, Carlota; Quiñones-Olson, Paola; Liang, Xiaowu; Graber, Norma L; Stefaniak, Maureen E; Campo, Joseph J; Carucci, Daniel J; Roth, David A; He, Huaping; Felgner, Philip L; Doolan, Denise L

    2008-03-01

    We have evaluated a technology called transcriptionally active PCR (TAP) for high throughput identification and prioritization of novel target antigens from genomic sequence data using the Plasmodium parasite, the causative agent of malaria, as a model. First, we adapted the TAP technology for the highly AT-rich Plasmodium genome, using well-characterized P. falciparum and P. yoelii antigens and a small panel of uncharacterized open reading frames from the P. falciparum genome sequence database. We demonstrated that TAP fragments encoding six well-characterized P. falciparum antigens and five well-characterized P. yoelii antigens could be amplified in an equivalent manner from both plasmid DNA and genomic DNA templates, and that uncharacterized open reading frames could also be amplified from genomic DNA template. Second, we showed that the in vitro expression of the TAP fragments was equivalent or superior to that of supercoiled plasmid DNA encoding the same antigen. Third, we evaluated the in vivo immunogenicity of TAP fragments encoding a subset of the model P. falciparum and P. yoelii antigens. We found that antigen-specific antibody and cellular immune responses induced by the TAP fragments in mice were equivalent or superior to those induced by the corresponding plasmid DNA vaccines. Finally, we developed and demonstrated proof-of-principle for an in vitro humoral immunoscreening assay for down-selection of novel target antigens. These data support the potential of a TAP approach for rapid high throughput functional screening and identification of potential candidate vaccine antigens from genomic sequence data.

  11. A genome resource to address mechanisms of developmental programming: determination of the fetal sheep heart transcriptome

    PubMed Central

    Cox, Laura A; Glenn, Jeremy P; Spradling, Kimberly D; Nijland, Mark J; Garcia, Roy; Nathanielsz, Peter W; Ford, Stephen P

    2012-01-01

    The pregnant sheep has provided seminal insights into reproduction related to animal and human development (ovarian function, fertility, implantation, fetal growth, parturition and lactation). Fetal sheep physiology has been extensively studied since 1950, contributing significantly to the basis for our understanding of many aspects of fetal development and behaviour that remain in use in clinical practice today. Understanding mechanisms requires the combination of systems approaches uniquely available in fetal sheep with the power of genomic studies. Absence of the full range of sheep genomic resources has limited the full realization of the power of this model, impeding progress in emerging areas of pregnancy biology such as developmental programming. We have examined the expressed fetal sheep heart transcriptome using high-throughput sequencing technologies. In so doing we identified 36,737 novel transcripts and describe genes, gene variants and pathways relevant to fundamental developmental mechanisms. Genes with the highest expression levels and with novel exons in the fetal heart transcriptome are known to play central roles in muscle development. We show that high-throughput sequencing methods can generate extensive transcriptome information in the absence of an assembled and annotated genome for that species. The gene sequence data obtained provide a unique genomic resource for sheep specific genetic technology development and, combined with the polymorphism data, augment annotation and assembly of the sheep genome. In addition, identification and pathway analysis of novel fetal sheep heart transcriptome splice variants is a first step towards revealing mechanisms of genetic variation and gene environment interactions during fetal heart development. PMID:22508961

  12. Metagenomic survey of bacterial diversity in the atmosphere of Mexico City using different sampling methods.

    PubMed

    Serrano-Silva, N; Calderón-Ezquerro, M C

    2018-04-01

    The identification of airborne bacteria has traditionally been performed by retrieval in culture media, but the bacterial diversity in the air is underestimated using this method because many bacteria are not readily cultured. Advances in DNA sequencing technology have produced a broad knowledge of genomics and metagenomics, which can greatly improve our ability to identify and study the diversity of airborne bacteria. However, researchers are facing several challenges, particularly the efficient retrieval of low-density microorganisms from the air and the lack of standardized protocols for sample collection and processing. In this study, we tested three methods for sampling bioaerosols - a Durham-type spore trap (Durham), a seven-day recording volumetric spore trap (HST), and a high-throughput 'Jet' spore and particle sampler (Jet) - and recovered metagenomic DNA for 16S rDNA sequencing. Samples were simultaneously collected with the three devices during one week, and the sequencing libraries were analyzed. A simple and efficient method for collecting bioaerosols and extracting good quality DNA for high-throughput sequencing was standardized. The Durham sampler collected preferentially Cyanobacteria, the HST Actinobacteria, Proteobacteria and Firmicutes, and the Jet mainly Proteobacteria and Firmicutes. The HST sampler collected the largest amount of airborne bacterial diversity. More experiments are necessary to select the right sampler, depending on study objectives, which may require monitoring and collecting specific airborne bacteria. Copyright © 2017 Elsevier Ltd. All rights reserved.

  13. Missing data and technical variability in single-cell RNA-sequencing experiments.

    PubMed

    Hicks, Stephanie C; Townes, F William; Teng, Mingxiang; Irizarry, Rafael A

    2017-11-06

    Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq) required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used and numerous publications are based on data produced with this technology. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically-driven, genes not expressing RNA at the time of measurement, or technically-driven, genes expressing RNA, but not at a sufficient level to be detected by sequencing technology. Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. Finally, we demonstrate and discuss how batch-effects and confounded experiments can intensify the problem. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  14. Polymorphism discovery and allele frequency estimation using high-throughput DNA sequencing of target-enriched pooled DNA samples

    PubMed Central

    2012-01-01

    Background The central role of the somatotrophic axis in animal post-natal growth, development and fertility is well established. Therefore, the identification of genetic variants affecting quantitative traits within this axis is an attractive goal. However, large sample numbers are a pre-requisite for the identification of genetic variants underlying complex traits and although technologies are improving rapidly, high-throughput sequencing of large numbers of complete individual genomes remains prohibitively expensive. Therefore using a pooled DNA approach coupled with target enrichment and high-throughput sequencing, the aim of this study was to identify polymorphisms and estimate allele frequency differences across 83 candidate genes of the somatotrophic axis, in 150 Holstein-Friesian dairy bulls divided into two groups divergent for genetic merit for fertility. Results In total, 4,135 SNPs and 893 indels were identified during the resequencing of the 83 candidate genes. Nineteen percent (n = 952) of variants were located within 5' and 3' UTRs. Seventy-two percent (n = 3,612) were intronic and 9% (n = 464) were exonic, including 65 indels and 236 SNPs resulting in non-synonymous substitutions (NSS). Significant (P < 0.01) mean allele frequency differentials between the low and high fertility groups were observed for 720 SNPs (58 NSS). Allele frequencies for 43 of the SNPs were also determined by genotyping the 150 individual animals (Sequenom® MassARRAY). No significant differences (P > 0.1) were observed between the two methods for any of the 43 SNPs across both pools (i.e., 86 tests in total). Conclusions The results of the current study support previous findings of the use of DNA sample pooling and high-throughput sequencing as a viable strategy for polymorphism discovery and allele frequency estimation. Using this approach we have characterised the genetic variation within genes of the somatotrophic axis and related pathways, central to mammalian post-natal growth and development and subsequent lactogenesis and fertility. We have identified a large number of variants segregating at significantly different frequencies between cattle groups divergent for calving interval plausibly harbouring causative variants contributing to heritable variation. To our knowledge, this is the first report describing sequencing of targeted genomic regions in any livestock species using groups with divergent phenotypes for an economically important trait. PMID:22235840

  15. SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects.

    PubMed

    Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice

    2011-05-05

    High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates:1) a pipeline, freely accessible through the internet, combining existing softwares with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indels events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D).Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices.2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies.SNiPlay is available at: http://sniplay.cirad.fr/.

  16. New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era (2010 JGI/ANL HPC Workshop)

    ScienceCinema

    Notredame, Cedric

    2018-05-02

    Cedric Notredame from the Centre for Genomic Regulation gives a presentation on New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era at the JGI/Argonne HPC Workshop on January 26, 2010.

  17. Investigation of Human Cancers for Retrovirus by Low-Stringency Target Enrichment and High-Throughput Sequencing.

    PubMed

    Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes

    2015-08-19

    Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.

  18. Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).

    PubMed

    Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E

    2017-01-01

    Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.

  19. Principles of gene microarray data analysis.

    PubMed

    Mocellin, Simone; Rossi, Carlo Riccardo

    2007-01-01

    The development of several gene expression profiling methods, such as comparative genomic hybridization (CGH), differential display, serial analysis of gene expression (SAGE), and gene microarray, together with the sequencing of the human genome, has provided an opportunity to monitor and investigate the complex cascade of molecular events leading to tumor development and progression. The availability of such large amounts of information has shifted the attention of scientists towards a nonreductionist approach to biological phenomena. High throughput technologies can be used to follow changing patterns of gene expression over time. Among them, gene microarray has become prominent because it is easier to use, does not require large-scale DNA sequencing, and allows for the parallel quantification of thousands of genes from multiple samples. Gene microarray technology is rapidly spreading worldwide and has the potential to drastically change the therapeutic approach to patients affected with tumor. Therefore, it is of paramount importance for both researchers and clinicians to know the principles underlying the analysis of the huge amount of data generated with microarray technology.

  20. Molecular profiling of multiple myeloma: from gene expression analysis to next-generation sequencing.

    PubMed

    Agnelli, Luca; Tassone, Pierfrancesco; Neri, Antonino

    2013-06-01

    Multiple myeloma is a fatal malignant proliferation of clonal bone marrow Ig-secreting plasma cells, characterized by wide clinical, biological, and molecular heterogeneity. Herein, global gene and microRNA expression, genome-wide DNA profilings, and next-generation sequencing technology used to investigate the genomic alterations underlying the bio-clinical heterogeneity in multiple myeloma are discussed. High-throughput technologies have undoubtedly allowed a better comprehension of the molecular basis of the disease, a fine stratification, and early identification of high-risk patients, and have provided insights toward targeted therapy studies. However, such technologies are at risk of being affected by laboratory- or cohort-specific biases, and are moreover influenced by high number of expected false positives. This aspect has a major weight in myeloma, which is characterized by large molecular heterogeneity. Therefore, meta-analysis as well as multiple approaches are desirable if not mandatory to validate the results obtained, in line with commonly accepted recommendation for tumor diagnostic/prognostic biomarker studies.

  1. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Leung, Elo; Huang, Amy; Cadag, Eithon

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resultingmore » functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequencebased genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.« less

  2. Protein Sequence Annotation Tool (PSAT): A centralized web-based meta-server for high-throughput sequence annotations

    DOE PAGES

    Leung, Elo; Huang, Amy; Cadag, Eithon; ...

    2016-01-20

    In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resultingmore » functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequencebased genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.« less

  3. Identification of optimum sequencing depth especially for de novo genome assembly of small genomes using next generation sequencing data.

    PubMed

    Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay

    2013-01-01

    Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.

  4. Experimental Design-Based Functional Mining and Characterization of High-Throughput Sequencing Data in the Sequence Read Archive

    PubMed Central

    Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa

    2013-01-01

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interests with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since, one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between diseases extracted from SRA and the feature profiles of it. The developed project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589

  5. Accurate and exact CNV identification from targeted high-throughput sequence data.

    PubMed

    Nord, Alex S; Lee, Ming; King, Mary-Claire; Walsh, Tom

    2011-04-12

    Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.

  6. Development and application of a recombination-based library versus library high- throughput yeast two-hybrid (RLL-Y2H) screening system.

    PubMed

    Yang, Fang; Lei, Yingying; Zhou, Meiling; Yao, Qili; Han, Yichao; Wu, Xiang; Zhong, Wanshun; Zhu, Chenghang; Xu, Weize; Tao, Ran; Chen, Xi; Lin, Da; Rahman, Khaista; Tyagi, Rohit; Habib, Zeshan; Xiao, Shaobo; Wang, Dang; Yu, Yang; Chen, Huanchun; Fu, Zhenfang; Cao, Gang

    2018-02-16

    Protein-protein interaction (PPI) network maintains proper function of all organisms. Simple high-throughput technologies are desperately needed to delineate the landscape of PPI networks. While recent state-of-the-art yeast two-hybrid (Y2H) systems improved screening efficiency, either individual colony isolation, library preparation arrays, gene barcoding or massive sequencing are still required. Here, we developed a recombination-based 'library vs library' Y2H system (RLL-Y2H), by which multi-library screening can be accomplished in a single pool without any individual treatment. This system is based on the phiC31 integrase-mediated integration between bait and prey plasmids. The integrated fragments were digested by MmeI and subjected to deep sequencing to decode the interaction matrix. We applied this system to decipher the trans-kingdom interactome between Mycobacterium tuberculosis and host cells and further identified Rv2427c interfering with the phagosome-lysosome fusion. This concept can also be applied to other systems to screen protein-RNA and protein-DNA interactions and delineate signaling landscape in cells.

  7. University of Texas MD Anderson Cancer Center (UT-MDACC): High-Throughput Screening Identifying Driving Mutations in Endometrial Cancer | Office of Cancer Genomics

    Cancer.gov

    Recent advances in next-generation sequencing technology have enabled the unprecedented characterization of a full spectrum of somatic alterations in cancer genomes. Given the large numbers of somatic mutations typically detected by this approach, a key challenge in the downstream analysis is to distinguish “drivers” that functionally contribute to tumorigenesis from “passengers” that occur as the consequence of genomic instability.

  8. Fast, accurate and easy-to-pipeline methods for amplicon sequence processing

    NASA Astrophysics Data System (ADS)

    Antonielli, Livio; Sessitsch, Angela

    2016-04-01

    Next generation sequencing (NGS) technologies established since years as an essential resource in microbiology. While on the one hand metagenomic studies can benefit from the continuously increasing throughput of the Illumina (Solexa) technology, on the other hand the spreading of third generation sequencing technologies (PacBio, Oxford Nanopore) are getting whole genome sequencing beyond the assembly of fragmented draft genomes, making it now possible to finish bacterial genomes even without short read correction. Besides (meta)genomic analysis next-gen amplicon sequencing is still fundamental for microbial studies. Amplicon sequencing of the 16S rRNA gene and ITS (Internal Transcribed Spacer) remains a well-established widespread method for a multitude of different purposes concerning the identification and comparison of archaeal/bacterial (16S rRNA gene) and fungal (ITS) communities occurring in diverse environments. Numerous different pipelines have been developed in order to process NGS-derived amplicon sequences, among which Mothur, QIIME and USEARCH are the most well-known and cited ones. The entire process from initial raw sequence data through read error correction, paired-end read assembly, primer stripping, quality filtering, clustering, OTU taxonomic classification and BIOM table rarefaction as well as alternative "normalization" methods will be addressed. An effective and accurate strategy will be presented using the state-of-the-art bioinformatic tools and the example of a straightforward one-script pipeline for 16S rRNA gene or ITS MiSeq amplicon sequencing will be provided. Finally, instructions on how to automatically retrieve nucleotide sequences from NCBI and therefore apply the pipeline to targets other than 16S rRNA gene (Greengenes, SILVA) and ITS (UNITE) will be discussed.

  9. Comprehensive analysis of RNA-protein interactions by high-throughput sequencing-RNA affinity profiling.

    PubMed

    Tome, Jacob M; Ozer, Abdullah; Pagano, John M; Gheba, Dan; Schroth, Gary P; Lis, John T

    2014-06-01

    RNA-protein interactions play critical roles in gene regulation, but methods to quantitatively analyze these interactions at a large scale are lacking. We have developed a high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay by adapting a high-throughput DNA sequencer to quantify the binding of fluorescently labeled protein to millions of RNAs anchored to sequenced cDNA templates. Using HiTS-RAP, we measured the affinity of mutagenized libraries of GFP-binding and NELF-E-binding aptamers to their respective targets and identified critical regions of interaction. Mutations additively affected the affinity of the NELF-E-binding aptamer, whose interaction depended mainly on a single-stranded RNA motif, but not that of the GFP aptamer, whose interaction depended primarily on secondary structure.

  10. Discovery of viruses and virus-like pathogens in pistachio using high-throughput sequencing

    USDA-ARS?s Scientific Manuscript database

    Pistachio (Pistacia vera L.) trees from the National Clonal Germplasm Repository (NCGR) and orchards in California were surveyed for viruses and virus-like agents by high-throughput sequencing (HTS). Analyses of 60 trees including clonal UCB-1 hybrid rootstock (P. atlantica × P. integerrima) identif...

  11. XPAT: a toolkit to conduct cross-platform association studies with heterogeneous sequencing datasets.

    PubMed

    Yu, Yao; Hu, Hao; Bohlender, Ryan J; Hu, Fulan; Chen, Jiun-Sheng; Holt, Carson; Fowler, Jerry; Guthery, Stephen L; Scheet, Paul; Hildebrandt, Michelle A T; Yandell, Mark; Huff, Chad D

    2018-04-06

    High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduce strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, which provides a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools to support cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases, including 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females) using sequencing data from multiple sources. XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.

  12. Implementing genomic medicine in pathology.

    PubMed

    Williams, Eli S; Hegde, Madhuri

    2013-07-01

    The finished sequence of the Human Genome Project, published 50 years after Watson and Crick's seminal paper on the structure of DNA, pushed human genetics into the public eye and ushered in the genomic era. A significant, if overlooked, aspect of the race to complete the genome was the technology that propelled scientists to the finish line. DNA sequencing technologies have become more standardized, automated, and capable of higher throughput. This technology has continued to grow at an astounding rate in the decade since the Human Genome Project was completed. Today, massively parallel sequencing, or next-generation sequencing (NGS), allows the detection of genetic variants across the entire genome. This ability has led to the identification of new causes of disease and is changing the way we categorize, treat, and manage disease. NGS approaches such as whole-exome sequencing and whole-genome sequencing are rapidly becoming an affordable genetic testing strategy for the clinical laboratory. One test can now provide vast amounts of health information pertaining not only to the disease of interest, but information that may also predict adult-onset disease, reveal carrier status for a rare disease and predict drug responsiveness. The issue of what to do with these incidental findings, along with questions pertaining to NGS testing strategies, data interpretation and storage, and applying genetic testing results into patient care, remains without a clear answer. This review will explore these issues and others relevant to the implementation of NGS in the clinical laboratory.

  13. Noninvasive prenatal screening for fetal common sex chromosome aneuploidies from maternal blood.

    PubMed

    Zhang, Bin; Lu, Bei-Yi; Yu, Bin; Zheng, Fang-Xiu; Zhou, Qin; Chen, Ying-Ping; Zhang, Xiao-Qing

    2017-04-01

    Objective To explore the feasibility of high-throughput massively parallel genomic DNA sequencing technology for the noninvasive prenatal detection of fetal sex chromosome aneuploidies (SCAs). Methods The study enrolled pregnant women who were prepared to undergo noninvasive prenatal testing (NIPT) in the second trimester. Cell-free fetal DNA (cffDNA) was extracted from the mother's peripheral venous blood and a high-throughput sequencing procedure was undertaken. Patients identified as having pregnancies associated with SCAs were offered prenatal fetal chromosomal karyotyping. Results The study enrolled 10 275 pregnant women who were prepared to undergo NIPT. Of these, 57 pregnant women (0.55%) showed fetal SCA, including 27 with Turner syndrome (45,X), eight with Triple X syndrome (47,XXX), 12 with Klinefelter syndrome (47,XXY) and three with 47,XYY. Thirty-three pregnant women agreed to undergo fetal karyotyping and 18 had results consistent with NIPT, while 15 patients received a normal karyotype result. The overall positive predictive value of NIPT for detecting SCAs was 54.54% (18/33) and for detecting Turner syndrome (45,X) was 29.41% (5/17). Conclusion NIPT can be used to identify fetal SCAs by analysing cffDNA using massively parallel genomic sequencing, although the accuracy needs to be improved particularly for Turner syndrome (45,X).

  14. Human Genomic Loci Important in Common Infectious Diseases: Role of High-Throughput Sequencing and Genome-Wide Association Studies

    PubMed Central

    Sserwadda, Ivan; Amujal, Marion; Namatovu, Norah

    2018-01-01

    HIV/AIDS, tuberculosis (TB), and malaria are 3 major global public health threats that undermine development in many resource-poor settings. Recently, the notion that positive selection during epidemics or longer periods of exposure to common infectious diseases may have had a major effect in modifying the constitution of the human genome is being interrogated at a large scale in many populations around the world. This positive selection from infectious diseases increases power to detect associations in genome-wide association studies (GWASs). High-throughput sequencing (HTS) has transformed both the management of infectious diseases and continues to enable large-scale functional characterization of host resistance/susceptibility alleles and loci; a paradigm shift from single candidate gene studies. Application of genome sequencing technologies and genomics has enabled us to interrogate the host-pathogen interface for improving human health. Human populations are constantly locked in evolutionary arms races with pathogens; therefore, identification of common infectious disease-associated genomic variants/markers is important in therapeutic, vaccine development, and screening susceptible individuals in a population. This review describes a range of host-pathogen genomic loci that have been associated with disease susceptibility and resistant patterns in the era of HTS. We further highlight potential opportunities for these genetic markers. PMID:29755620

  15. Using optical mapping data for the improvement of vertebrate genome assemblies.

    PubMed

    Howe, Kerstin; Wood, Jonathan M D

    2015-01-01

    Optical mapping is a technology that gathers long-range information on genome sequences similar to ordered restriction digest maps. Because it is not subject to cloning, amplification, hybridisation or sequencing bias, it is ideally suited to the improvement of fragmented genome assemblies that can no longer be improved by classical methods. In addition, its low cost and rapid turnaround make it equally useful during the scaffolding process of de novo assembly from high throughput sequencing reads. We describe how optical mapping has been used in practice to produce high quality vertebrate genome assemblies. In particular, we detail the efforts undertaken by the Genome Reference Consortium (GRC), which maintains the reference genomes for human, mouse, zebrafish and chicken, and uses different optical mapping platforms for genome curation.

  16. Pediatric Glioblastoma Therapies Based on Patient-Derived Stem Cell Resources

    DTIC Science & Technology

    2014-11-01

    genomic DNA and then subjected to Illumina high-throughput sequencing . In this analysis, shRNAs lost in the GSC population represent candidate gene...and genomic DNA and then subjected to Illumina high-throughput sequencing . In this analysis, shRNAs lost in the GSC population represent candidate...PRISM 7900 Sequence Detection System ( Genomics Resource, FHCRC). Relative transcript abundance was analyzed using the 2−ΔΔCt method. TRIzol (Invitrogen

  17. Global Mapping of Transcription Factor Binding Sites by Sequencing Chromatin Surrogates: a Perspective on Experimental Design, Data Analysis, and Open Problems.

    PubMed

    Wei, Yingying; Wu, George; Ji, Hongkai

    2013-05-01

    Mapping genome-wide binding sites of all transcription factors (TFs) in all biological contexts is a critical step toward understanding gene regulation. The state-of-the-art technologies for mapping transcription factor binding sites (TFBSs) couple chromatin immunoprecipitation (ChIP) with high-throughput sequencing (ChIP-seq) or tiling array hybridization (ChIP-chip). These technologies have limitations: they are low-throughput with respect to surveying many TFs. Recent advances in genome-wide chromatin profiling, including development of technologies such as DNase-seq, FAIRE-seq and ChIP-seq for histone modifications, make it possible to predict in vivo TFBSs by analyzing chromatin features at computationally determined DNA motif sites. This promising new approach may allow researchers to monitor the genome-wide binding sites of many TFs simultaneously. In this article, we discuss various experimental design and data analysis issues that arise when applying this approach. Through a systematic analysis of the data from the Encyclopedia Of DNA Elements (ENCODE) project, we compare the predictive power of individual and combinations of chromatin marks using supervised and unsupervised learning methods, and evaluate the value of integrating information from public ChIP and gene expression data. We also highlight the challenges and opportunities for developing novel analytical methods, such as resolving the one-motif-multiple-TF ambiguity and distinguishing functional and non-functional TF binding targets from the predicted binding sites. The online version of this article (doi:10.1007/s12561-012-9066-5) contains supplementary material, which is available to authorized users.

  18. Monitoring and Surveillance of Marine Invasive Species in Californian Waters by DNA Barcoding: Methodological and Analytical Solutions

    NASA Astrophysics Data System (ADS)

    Campbell, T. L.; Geller, J. B.; Heller, P.; Ruiz, G.; Chang, A.; McCann, L.; Ceballos, L.; Marraffini, M.; Ashton, G.; Larson, K.; Havard, S.; Meagher, K.; Wheelock, M.; Drake, C.; Rhett, G.

    2016-02-01

    The Ballast Water Management Act, the Marine Invasive Species Act, and the Coastal Ecosystem Protection Act require the California Department of Fish and Wildlife to monitor and evaluate the extent of biological invasions in the state's marine and estuarine waters. This has been performed statewide, using a variety of methodologies. Conventional sample collection and processing is laborious, slow and costly, and may require considerable taxonomic expertise requiring detailed time-consuming microscopic study of multiple specimens. These factors limit the volume of biomass that can be searched for introduced species. New technologies continue to reduce the cost and increase the throughput of genetic analyses, which become efficient alternatives to traditional morphological analysis for identification, monitoring and surveillance of marine invasive species. Using next-generation sequencing of mitochondrial Cytochrome c oxidase subunit I (COI) and nuclear large subunit ribosomal RNA (LSU), we analyzed over 15,000 individual marine invertebrates collected in Californian waters. We have created sequence databases of California native and non-native species to assist in molecular identification and surveillance in North American waters. Metagenetics, the next-generation sequencing of environmental samples with comparison to DNA sequence databases, is a faster and cost-effective alternative to individual sample analysis. We have sequenced from biomass collected from whole settlement plates and plankton in California harbors, and used our introduced species database to create species lists. We can combine these species lists for individual marinas with collected environmental data, such as temperature, salinity, and dissolved oxygen to understand the ecology of marine invasions. Here we discuss high throughput sampling, sequencing, and COASTLINE, our data analysis answer to challenges working with hundreds of millions of sequencing reads from tens of thousands of specimens.

  19. The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology.

    PubMed

    Feltus, Frank A; Breen, Joseph R; Deng, Juan; Izard, Ryan S; Konger, Christopher A; Ligon, Walter B; Preuss, Don; Wang, Kuang-Ching

    2015-01-01

    In the last decade, high-throughput DNA sequencing has become a disruptive technology and pushed the life sciences into a distributed ecosystem of sequence data producers and consumers. Given the power of genomics and declining sequencing costs, biology is an emerging "Big Data" discipline that will soon enter the exabyte data range when all subdisciplines are combined. These datasets must be transferred across commercial and research networks in creative ways since sending data without thought can have serious consequences on data processing time frames. Thus, it is imperative that biologists, bioinformaticians, and information technology engineers recalibrate data processing paradigms to fit this emerging reality. This review attempts to provide a snapshot of Big Data transfer across networks, which is often overlooked by many biologists. Specifically, we discuss four key areas: 1) data transfer networks, protocols, and applications; 2) data transfer security including encryption, access, firewalls, and the Science DMZ; 3) data flow control with software-defined networking; and 4) data storage, staging, archiving and access. A primary intention of this article is to orient the biologist in key aspects of the data transfer process in order to frame their genomics-oriented needs to enterprise IT professionals.

  20. Genomic approaches for the elucidation of genes and gene networks underlying cardiovascular traits.

    PubMed

    Adriaens, M E; Bezzina, C R

    2018-06-22

    Genome-wide association studies have shed light on the association between natural genetic variation and cardiovascular traits. However, linking a cardiovascular trait associated locus to a candidate gene or set of candidate genes for prioritization for follow-up mechanistic studies is all but straightforward. Genomic technologies based on next-generation sequencing technology nowadays offer multiple opportunities to dissect gene regulatory networks underlying genetic cardiovascular trait associations, thereby aiding in the identification of candidate genes at unprecedented scale. RNA sequencing in particular becomes a powerful tool when combined with genotyping to identify loci that modulate transcript abundance, known as expression quantitative trait loci (eQTL), or loci modulating transcript splicing known as splicing quantitative trait loci (sQTL). Additionally, the allele-specific resolution of RNA-sequencing technology enables estimation of allelic imbalance, a state where the two alleles of a gene are expressed at a ratio differing from the expected 1:1 ratio. When multiple high-throughput approaches are combined with deep phenotyping in a single study, a comprehensive elucidation of the relationship between genotype and phenotype comes into view, an approach known as systems genetics. In this review, we cover key applications of systems genetics in the broad cardiovascular field.

  1. High-throughput sequencing methods to study neuronal RNA-protein interactions.

    PubMed

    Ule, Jernej

    2009-12-01

    UV-cross-linking and RNase protection, combined with high-throughput sequencing, have provided global maps of RNA sites bound by individual proteins or ribosomes. Using a stringent purification protocol, UV-CLIP (UV-cross-linking and immunoprecipitation) was able to identify intronic and exonic sites bound by splicing regulators in mouse brain tissue. Ribosome profiling has been used to quantify ribosome density on budding yeast mRNAs under different environmental conditions. Post-transcriptional regulation in neurons requires high spatial and temporal precision, as is evident from the role of localized translational control in synaptic plasticity. It remains to be seen if the high-throughput methods can be applied quantitatively to study the dynamics of RNP (ribonucleoprotein) remodelling in specific neuronal populations during the neurodegenerative process. It is certain, however, that applications of new biochemical techniques followed by high-throughput sequencing will continue to provide important insights into the mechanisms of neuronal post-transcriptional regulation.

  2. Research Techniques Made Simple: Single-Cell RNA Sequencing and its Applications in Dermatology.

    PubMed

    Wu, Xiaojun; Yang, Bin; Udo-Inyang, Imo; Ji, Suyun; Ozog, David; Zhou, Li; Mi, Qing-Sheng

    2018-05-01

    RNA sequencing is one of the most highly reliable and reproducible methods of assessing the cell transcriptome. As high-throughput RNA sequencing libraries at the single cell level have recently developed, single cell RNA sequencing has become more feasible and popular in biology research. Single cell RNA sequencing allows investigators to evaluate cell transcriptional profiles at the single cell level. It has become a very useful tool to perform investigations that could not be addressed by other methodologies, such as the assessment of cell-to-cell variation, the identification of rare populations, and the determination of heterogeneity within a cell population. So far, the single cell RNA sequencing technique has been widely applied to embryonic development, immune cell development, and human disease progress and treatment. Here, we describe the history of single cell technology development and its potential application in the field of dermatology. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  3. Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies

    PubMed Central

    Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland

    2013-01-01

    The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. It indicates application areas that call for a specific sequencing platform and disallow other platforms. This helps to identify the proper sequencing platform for whole genome studies with different application scopes. PMID:23776689

  4. Absolute quantification of microbial taxon abundances.

    PubMed

    Props, Ruben; Kerckhof, Frederiek-Maarten; Rubbens, Peter; De Vrieze, Jo; Hernandez Sanabria, Emma; Waegeman, Willem; Monsieurs, Pieter; Hammes, Frederik; Boon, Nico

    2017-02-01

    High-throughput amplicon sequencing has become a well-established approach for microbial community profiling. Correlating shifts in the relative abundances of bacterial taxa with environmental gradients is the goal of many microbiome surveys. As the abundances generated by this technology are semi-quantitative by definition, the observed dynamics may not accurately reflect those of the actual taxon densities. We combined the sequencing approach (16S rRNA gene) with robust single-cell enumeration technologies (flow cytometry) to quantify the absolute taxon abundances. A detailed longitudinal analysis of the absolute abundances resulted in distinct abundance profiles that were less ambiguous and expressed in units that can be directly compared across studies. We further provide evidence that the enrichment of taxa (increase in relative abundance) does not necessarily relate to the outgrowth of taxa (increase in absolute abundance). Our results highlight that both relative and absolute abundances should be considered for a comprehensive biological interpretation of microbiome surveys.

  5. Genome Engineering and Agriculture: Opportunities and Challenges.

    PubMed

    Baltes, Nicholas J; Gil-Humanes, Javier; Voytas, Daniel F

    2017-01-01

    In recent years, plant biotechnology has witnessed unprecedented technological change. Advances in high-throughput sequencing technologies have provided insight into the location and structure of functional elements within plant DNA. At the same time, improvements in genome engineering tools have enabled unprecedented control over genetic material. These technologies, combined with a growing understanding of plant systems biology, will irrevocably alter the way we create new crop varieties. As the first wave of genome-edited products emerge, we are just getting a glimpse of the immense opportunities the technology provides. We are also seeing its challenges and limitations. It is clear that genome editing will play an increased role in crop improvement and will help us to achieve food security in the coming decades; however, certain challenges and limitations must be overcome to realize the technology's full potential. © 2017 Elsevier Inc. All rights reserved.

  6. Generation of genetically modified mice using CRISPR/Cas9 and haploid embryonic stem cell systems

    PubMed Central

    JIN, Li-Fang; LI, Jin-Song

    2016-01-01

    With the development of high-throughput sequencing technology in the post-genomic era, researchers have concentrated their efforts on elucidating the relationships between genes and their corresponding functions. Recently, important progress has been achieved in the generation of genetically modified mice based on CRISPR/Cas9 and haploid embryonic stem cell (haESC) approaches, which provide new platforms for gene function analysis, human disease modeling, and gene therapy. Here, we review the CRISPR/Cas9 and haESC technology for the generation of genetically modified mice and discuss the key challenges in the application of these approaches. PMID:27469251

  7. Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads.

    PubMed

    Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi

    2018-03-09

    High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30-50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.

  8. Issues with RNA-seq analysis in non-model organisms: A salmonid example.

    PubMed

    Sundaram, Arvind; Tengs, Torstein; Grimholt, Unni

    2017-10-01

    High throughput sequencing (HTS) is useful for many purposes as exemplified by the other topics included in this special issue. The purpose of this paper is to look into the unique challenges of using this technology in non-model organisms where resources such as genomes, functional genome annotations or genome complexity provide obstacles not met in model organisms. To describe these challenges, we narrow our scope to RNA sequencing used to study differential gene expression in response to pathogen challenge. As a demonstration species we chose Atlantic salmon, which has a sequenced genome with poor annotation and an added complexity due to many duplicated genes. We find that our RNA-seq analysis pipeline deciphers between duplicates despite high sequence identity. However, annotation issues provide problems in linking differentially expressed genes to pathways. Also, comparing results between approaches and species are complicated due to lack of standardized annotation. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Mapping RNA-seq Reads with STAR

    PubMed Central

    Dobin, Alexander; Gingeras, Thomas R.

    2015-01-01

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, signal visualization, and so forth. In this unit we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is Open Source software that can be run on Unix, Linux or Mac OS X systems. PMID:26334920

  10. Mapping RNA-seq Reads with STAR.

    PubMed

    Dobin, Alexander; Gingeras, Thomas R

    2015-09-03

    Mapping of large sets of high-throughput sequencing reads to a reference genome is one of the foundational steps in RNA-seq data analysis. The STAR software package performs this task with high levels of accuracy and speed. In addition to detecting annotated and novel splice junctions, STAR is capable of discovering more complex RNA sequence arrangements, such as chimeric and circular RNA. STAR can align spliced sequences of any length with moderate error rates, providing scalability for emerging sequencing technologies. STAR generates output files that can be used for many downstream analyses such as transcript/gene expression quantification, differential gene expression, novel isoform reconstruction, and signal visualization. In this unit, we describe computational protocols that produce various output files, use different RNA-seq datatypes, and utilize different mapping strategies. STAR is open source software that can be run on Unix, Linux, or Mac OS X systems. Copyright © 2015 John Wiley & Sons, Inc.

  11. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology.

    PubMed

    Tanase, Koji; Nishitani, Chikako; Hirakawa, Hideki; Isobe, Sachiko; Tabata, Satoshi; Ohmiya, Akemi; Onozaki, Takashi

    2012-07-02

    Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. We constructed a normalized cDNA library and a 3'-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant.

  12. Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology

    PubMed Central

    2012-01-01

    Background Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant. PMID:22747974

  13. Computational solutions to large-scale data management and analysis

    PubMed Central

    Schadt, Eric E.; Linderman, Michael D.; Sorenson, Jon; Lee, Lawrence; Nolan, Garry P.

    2011-01-01

    Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist — such as cloud and heterogeneous computing — to successfully tackle our big data problems. PMID:20717155

  14. Molecular characterization of a novel rhabdovirus infecting blackcurrant identified by high-throughput sequencing.

    PubMed

    Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R

    2018-05-01

    A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.

  15. GeneSCF: a real-time based functional enrichment tool with support for multiple organisms.

    PubMed

    Subhash, Santhilal; Kanduri, Chandrasekhar

    2016-09-13

    High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to adapt to new entries from the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis by using updated information directly from the source databases such as KEGG, Reactome or Gene Ontology etc. In this study, we focused on designing a command-line tool called GeneSCF (Gene Set Clustering based on Functional annotations), that can predict the functionally relevant biological information for a set of genes in a real-time updated manner. It is designed to handle information from more than 4000 organisms from freely available prominent functional databases like KEGG, Reactome and Gene Ontology. We successfully employed our tool on two of published datasets to predict the biologically relevant functional information. The core features of this tool were tested on Linux machines without the need for installation of more dependencies. GeneSCF is more reliable compared to other enrichment tools because of its ability to use reference functional databases in real-time to perform enrichment analysis. It is an easy-to-integrate tool with other pipelines available for downstream analysis of high-throughput data. More importantly, GeneSCF can run multiple gene lists simultaneously on different organisms thereby saving time for the users. Since the tool is designed to be ready-to-use, there is no need for any complex compilation and installation procedures.

  16. Emerging patterns of somatic mutations in cancer

    PubMed Central

    Watson, Ian R.; Takahashi, Koichi; Futreal, P. Andrew; Chin, Lynda

    2014-01-01

    The advance in technological tools for massively parallel, high-throughput sequencing of DNA has enabled the comprehensive characterization of somatic mutations in large number of tumor samples. Here, we review recent cancer genomic studies that have assembled emerging views of the landscapes of somatic mutations through deep sequencing analyses of the coding exomes and whole genomes in various cancer types. We discuss the comparative genomics of different cancers, including mutation rates, spectrums, and roles of environmental insults that influence these processes. We highlight the developing statistical approaches used to identify significantly mutated genes, and discuss the emerging biological and clinical insights from such analyses as well as the challenges ahead translating these genomic data into clinical impacts. PMID:24022702

  17. Acceleration of short and long DNA read mapping without loss of accuracy using suffix array.

    PubMed

    Tárraga, Joaquín; Arnau, Vicente; Martínez, Héctor; Moreno, Raul; Cazorla, Diego; Salavert-Torres, José; Blanquer-Espert, Ignacio; Dopazo, Joaquín; Medina, Ignacio

    2014-12-01

    HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20× for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies. https://github.com/opencb/hpg-aligner. © The Author 2014. Published by Oxford University Press.

  18. Diversity Arrays Technology (DArT) Marker Platforms for Diversity Analysis and Linkage Mapping in a Complex Crop, the Octoploid Cultivated Strawberry (Fragaria × ananassa)

    PubMed Central

    Sánchez-Sevilla, José F.; Horvath, Aniko; Botella, Miguel A.; Gaston, Amèlia; Folta, Kevin; Kilian, Andrzej; Denoyes, Beatrice; Amaya, Iraida

    2015-01-01

    Cultivated strawberry (Fragaria × ananassa) is a genetically complex allo-octoploid crop with 28 pairs of chromosomes (2n = 8x = 56) for which a genome sequence is not yet available. The diploid Fragaria vesca is considered the donor species of one of the octoploid sub-genomes and its available genome sequence can be used as a reference for genomic studies. A wide number of strawberry cultivars are stored in ex situ germplasm collections world-wide but a number of previous studies have addressed the genetic diversity present within a limited number of these collections. Here, we report the development and application of two platforms based on the implementation of Diversity Array Technology (DArT) markers for high-throughput genotyping in strawberry. The first DArT microarray was used to evaluate the genetic diversity of 62 strawberry cultivars that represent a wide range of variation based on phenotype, geographical and temporal origin and pedigrees. A total of 603 DArT markers were used to evaluate the diversity and structure of the population and their cluster analyses revealed that these markers were highly efficient in classifying the accessions in groups based on historical, geographical and pedigree-based cues. The second DArTseq platform took benefit of the complexity reduction method optimized for strawberry and the development of next generation sequencing technologies. The strawberry DArTseq was used to generate a total of 9,386 SNP markers in the previously developed ‘232’ × ‘1392’ mapping population, of which, 4,242 high quality markers were further selected to saturate this map after several filtering steps. The high-throughput platforms here developed for genotyping strawberry will facilitate genome-wide characterizations of large accessions sets and complement other available options. PMID:26675207

  19. Characterization and complete genome sequence of a panicovirus from Bermuda grass by high-throughput sequencing.

    PubMed

    Tahir, Muhammad N; Lockhart, Ben; Grinstead, Samuel; Mollov, Dimitre

    2017-04-01

    Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high-throughput sequencing (HTS). The nearly full genome sequence of a panicovirus was identified from one HTS scaffold. Sanger sequencing was used to confirm the HTS results and complete the genome sequence of 4404 nt. This virus was provisionally named Bermuda grass latent virus (BGLV). Its predicted open reading frames follow the typical arrangement of the genus Panicovirus. Based on sequence comparisons and phylogenetic analyses BGLV differs from other viruses and therefore taxonomically it is a new member of the genus Panicovirus, family Tombusviridae.

  20. Sources of PCR-induced distortions in high-throughput sequencing data sets

    PubMed Central

    Kebschull, Justus M.; Zador, Anthony M.

    2015-01-01

    PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991

  1. Applying Genomic and Genetic Tools to Understand and Mitigate Damage from Exposure to Toxins

    DTIC Science & Technology

    2013-10-01

    sequences to the human genome . Genome Biol 10, R25 (2009). 26 Award number: W81XWH-09-1-0715 Title: Applying Genomic and Genetic Tools to Understand...utilizing the high-throughput technology of mRNA-seq. BODY The goal of our research program (W81XWH-09-1-0715) was to utilize genetic and genomic ...also acquired the achetf222a * * * * * 5 Award number: W81XWH-09-1-0715 Title: Applying Genomic and Genetic Tools to Understand and Mitigate

  2. Oncogenomics and the development of new cancer therapies.

    PubMed

    Strausberg, Robert L; Simpson, Andrew J G; Old, Lloyd J; Riggins, Gregory J

    2004-05-27

    Scientists have sequenced the human genome and identified most of its genes. Now it is time to use these genomic data, and the high-throughput technology developed to generate them, to tackle major health problems such as cancer. To accelerate our understanding of this disease and to produce targeted therapies, further basic mutational and functional genomic information is required. A systematic and coordinated approach, with the results freely available, should speed up progress. This will best be accomplished through an international academic and pharmaceutical oncogenomics initiative.

  3. Privacy Challenges of Genomic Big Data.

    PubMed

    Shen, Hong; Ma, Jian

    2017-01-01

    With the rapid advancement of high-throughput DNA sequencing technologies, genomics has become a big data discipline where large-scale genetic information of human individuals can be obtained efficiently with low cost. However, such massive amount of personal genomic data creates tremendous challenge for privacy, especially given the emergence of direct-to-consumer (DTC) industry that provides genetic testing services. Here we review the recent development in genomic big data and its implications on privacy. We also discuss the current dilemmas and future challenges of genomic privacy.

  4. Designing deep sequencing experiments: detecting structural variation and estimating transcript abundance.

    PubMed

    Bashir, Ali; Bansal, Vikas; Bafna, Vineet

    2010-06-18

    Massively parallel DNA sequencing technologies have enabled the sequencing of several individual human genomes. These technologies are also being used in novel ways for mRNA expression profiling, genome-wide discovery of transcription-factor binding sites, small RNA discovery, etc. The multitude of sequencing platforms, each with their unique characteristics, pose a number of design challenges, regarding the technology to be used and the depth of sequencing required for a particular sequencing application. Here we describe a number of analytical and empirical results to address design questions for two applications: detection of structural variations from paired-end sequencing and estimating mRNA transcript abundance. For structural variation, our results provide explicit trade-offs between the detection and resolution of rearrangement breakpoints, and the optimal mix of paired-read insert lengths. Specifically, we prove that optimal detection and resolution of breakpoints is achieved using a mix of exactly two insert library lengths. Furthermore, we derive explicit formulae to determine these insert length combinations, enabling a 15% improvement in breakpoint detection at the same experimental cost. On empirical short read data, these predictions show good concordance with Illumina 200 bp and 2 Kbp insert length libraries. For transcriptome sequencing, we determine the sequencing depth needed to detect rare transcripts from a small pilot study. With only 1 Million reads, we derive corrections that enable almost perfect prediction of the underlying expression probability distribution, and use this to predict the sequencing depth required to detect low expressed genes with greater than 95% probability. Together, our results form a generic framework for many design considerations related to high-throughput sequencing. We provide software tools http://bix.ucsd.edu/projects/NGS-DesignTools to derive platform independent guidelines for designing sequencing experiments (amount of sequencing, choice of insert length, mix of libraries) for novel applications of next generation sequencing.

  5. Metagenomic analysis and functional characterization of the biogas microbiome using high throughput shotgun sequencing and a novel binning strategy.

    PubMed

    Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis G; De Francisci, Davide; Valle, Giorgio; Angelidaki, Irini

    2016-01-01

    Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which different members have distinct roles in the establishment of a collective organization. Deciphering the complex microbial community engaged in this process is interesting both for unraveling the network of bacterial interactions and for applicability potential to the derived knowledge. In this study, we dissect the bioma involved in anaerobic digestion by means of high throughput Illumina sequencing (~51 gigabases of sequence data), disclosing nearly one million genes and extracting 106 microbial genomes by a novel strategy combining two binning processes. Microbial phylogeny and putative taxonomy performed using >400 proteins revealed that the biogas community is a trove of new species. A new approach based on functional properties as per network representation was developed to assign roles to the microbial species. The organization of the anaerobic digestion microbiome is resembled by a funnel concept, in which the microbial consortium presents a progressive functional specialization while reaching the final step of the process (i.e., methanogenesis). Key microbial genomes encoding enzymes involved in specific metabolic pathways, such as carbohydrates utilization, fatty acids degradation, amino acids fermentation, and syntrophic acetate oxidation, were identified. Additionally, the analysis identified a new uncultured archaeon that was putatively related to Methanomassiliicoccales but surprisingly having a methylotrophic methanogenic pathway. This study is a pioneer research on the phylogenetic and functional characterization of the microbial community populating biogas reactors. By applying for the first time high-throughput sequencing and a novel binning strategy, the identified genes were anchored to single genomes providing a clear understanding of their metabolic pathways and highlighting their involvement in anaerobic digestion. The overall research established a reference catalog of biogas microbial genomes that will greatly simplify future genomic studies.

  6. Application of Quaternion in improving the quality of global sequence alignment scores for an ambiguous sequence target in Streptococcus pneumoniae DNA

    NASA Astrophysics Data System (ADS)

    Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.

    2017-07-01

    DNA sequence can be defined as a succession of letters, representing the order of nucleotides within DNA, using a permutation of four DNA base codes including adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and currently become highly developed, advanced and highly throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing could produce any ambiguous and not clear enough sequencing results that make them quite difficult to be determined whether these codes are A, T, G, or C. To solve these problems, in this study we can introduce other representation of DNA codes namely Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probability of A, T, G, C bases that could appear in Q and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct the improved scoring matrix for global sequence alignment processes, by applying a dot product method. Moreover, this scoring matrix produces better and higher quality of the match and mismatch score between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave, to analyze our target sequence which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from the Genebank, meanwhile the target DNA sequence are received from our collaborator database. As the results we found the Quaternion representations improve the quality of the sequence alignment score and we can conclude that DNA sequence target has maximum similarity with Streptococcus pneumoniae.

  7. Evaluation of Sequencing Approaches for High-Throughput Transcriptomics - (BOSC)

    EPA Science Inventory

    Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. The generation of high-throughput global gene expression...

  8. Validation of the high-throughput marker technology DArT using the model plant Arabidopsis thaliana.

    PubMed

    Wittenberg, Alexander H J; van der Lee, Theo; Cayla, Cyril; Kilian, Andrzej; Visser, Richard G F; Schouten, Henk J

    2005-08-01

    Diversity Arrays Technology (DArT) is a microarray-based DNA marker technique for genome-wide discovery and genotyping of genetic variation. DArT allows simultaneous scoring of hundreds of restriction site based polymorphisms between genotypes and does not require DNA sequence information or site-specific oligonucleotides. This paper demonstrates the potential of DArT for genetic mapping by validating the quality and molecular basis of the markers, using the model plant Arabidopsis thaliana. Restriction fragments from a genomic representation of the ecotype Landsberg erecta (Ler) were amplified by PCR, individualized by cloning and spotted onto glass slides. The arrays were then hybridized with labeled genomic representations of the ecotypes Columbia (Col) and Ler and of individuals from an F(2) population obtained from a Col x Ler cross. The scoring of markers with specialized software was highly reproducible and 107 markers could unambiguously be ordered on a genetic linkage map. The marker order on the genetic linkage map coincided with the order on the DNA sequence map. Sequencing of the Ler markers and alignment with the available Col genome sequence confirmed that the polymorphism in DArT markers is largely a result of restriction site polymorphisms.

  9. Competitive Genomic Screens of Barcoded Yeast Libraries

    PubMed Central

    Urbanus, Malene; Proctor, Michael; Heisler, Lawrence E.; Giaever, Guri; Nislow, Corey

    2011-01-01

    By virtue of advances in next generation sequencing technologies, we have access to new genome sequences almost daily. The tempo of these advances is accelerating, promising greater depth and breadth. In light of these extraordinary advances, the need for fast, parallel methods to define gene function becomes ever more important. Collections of genome-wide deletion mutants in yeasts and E. coli have served as workhorses for functional characterization of gene function, but this approach is not scalable, current gene-deletion approaches require each of the thousands of genes that comprise a genome to be deleted and verified. Only after this work is complete can we pursue high-throughput phenotyping. Over the past decade, our laboratory has refined a portfolio of competitive, miniaturized, high-throughput genome-wide assays that can be performed in parallel. This parallelization is possible because of the inclusion of DNA 'tags', or 'barcodes,' into each mutant, with the barcode serving as a proxy for the mutation and one can measure the barcode abundance to assess mutant fitness. In this study, we seek to fill the gap between DNA sequence and barcoded mutant collections. To accomplish this we introduce a combined transposon disruption-barcoding approach that opens up parallel barcode assays to newly sequenced, but poorly characterized microbes. To illustrate this approach we present a new Candida albicans barcoded disruption collection and describe how both microarray-based and next generation sequencing-based platforms can be used to collect 10,000 - 1,000,000 gene-gene and drug-gene interactions in a single experiment. PMID:21860376

  10. Diversity arrays technology: a generic genome profiling technology on open platforms.

    PubMed

    Kilian, Andrzej; Wenzl, Peter; Huttner, Eric; Carling, Jason; Xia, Ling; Blois, Hélène; Caig, Vanessa; Heller-Uszynska, Katarzyna; Jaccoud, Damian; Hopper, Colleen; Aschenbrenner-Kilian, Malgorzata; Evers, Margaret; Peng, Kaiman; Cayla, Cyril; Hok, Puthick; Uszynski, Grzegorz

    2012-01-01

    In the last 20 years, we have observed an exponential growth of the DNA sequence data and simular increase in the volume of DNA polymorphism data generated by numerous molecular marker technologies. Most of the investment, and therefore progress, concentrated on human genome and genomes of selected model species. Diversity Arrays Technology (DArT), developed over a decade ago, was among the first "democratizing" genotyping technologies, as its performance was primarily driven by the level of DNA sequence variation in the species rather than by the level of financial investment. DArT also proved more robust to genome size and ploidy-level differences among approximately 60 organisms for which DArT was developed to date compared to other high-throughput genotyping technologies. The success of DArT in a number of organisms, including a wide range of "orphan crops," can be attributed to the simplicity of underlying concepts: DArT combines genome complexity reduction methods enriching for genic regions with a highly parallel assay readout on a number of "open-access" microarray platforms. The quantitative nature of the assay enabled a number of applications in which allelic frequencies can be estimated from DArT arrays. A typical DArT assay tests for polymorphism tens of thousands of genomic loci with the final number of markers reported (hundreds to thousands) reflecting the level of DNA sequence variation in the tested loci. Detailed DArT methods, protocols, and a range of their application examples as well as DArT's evolution path are presented.

  11. Genome-wide prediction of cis-regulatory regions using supervised deep learning methods.

    PubMed

    Li, Yifeng; Shi, Wenqiang; Wasserman, Wyeth W

    2018-05-31

    In the human genome, 98% of DNA sequences are non-protein-coding regions that were previously disregarded as junk DNA. In fact, non-coding regions host a variety of cis-regulatory regions which precisely control the expression of genes. Thus, Identifying active cis-regulatory regions in the human genome is critical for understanding gene regulation and assessing the impact of genetic variation on phenotype. The developments of high-throughput sequencing and machine learning technologies make it possible to predict cis-regulatory regions genome wide. Based on rich data resources such as the Encyclopedia of DNA Elements (ENCODE) and the Functional Annotation of the Mammalian Genome (FANTOM) projects, we introduce DECRES based on supervised deep learning approaches for the identification of enhancer and promoter regions in the human genome. Due to their ability to discover patterns in large and complex data, the introduction of deep learning methods enables a significant advance in our knowledge of the genomic locations of cis-regulatory regions. Using models for well-characterized cell lines, we identify key experimental features that contribute to the predictive performance. Applying DECRES, we delineate locations of 300,000 candidate enhancers genome wide (6.8% of the genome, of which 40,000 are supported by bidirectional transcription data), and 26,000 candidate promoters (0.6% of the genome). The predicted annotations of cis-regulatory regions will provide broad utility for genome interpretation from functional genomics to clinical applications. The DECRES model demonstrates potentials of deep learning technologies when combined with high-throughput sequencing data, and inspires the development of other advanced neural network models for further improvement of genome annotations.

  12. Validation of high throughput sequencing and microbial forensics applications

    PubMed Central

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security. PMID:25101166

  13. Validation of high throughput sequencing and microbial forensics applications.

    PubMed

    Budowle, Bruce; Connell, Nancy D; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D; Minot, Samuel

    2014-01-01

    High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security.

  14. Outbreak Investigation Using High-Throughput Genome Sequencing within a Diagnostic Microbiology Laboratory

    PubMed Central

    Sherry, Norelle L.; Porter, Jessica L.; Seemann, Torsten; Watkins, Andrew; Stinear, Timothy P.

    2013-01-01

    Next-generation sequencing (NGS) of bacterial genomes has recently become more accessible and is now available to the routine diagnostic microbiology laboratory. However, questions remain regarding its feasibility, particularly with respect to data analysis in nonspecialist centers. To test the applicability of NGS to outbreak investigations, Ion Torrent sequencing was used to investigate a putative multidrug-resistant Escherichia coli outbreak in the neonatal unit of the Mercy Hospital for Women, Melbourne, Australia. Four suspected outbreak strains and a comparator strain were sequenced. Genome-wide single nucleotide polymorphism (SNP) analysis demonstrated that the four neonatal intensive care unit (NICU) strains were identical and easily differentiated from the comparator strain. Genome sequence data also determined that the NICU strains belonged to multilocus sequence type 131 and carried the blaCTX-M-15 extended-spectrum beta-lactamase. Comparison of the outbreak strains to all publicly available complete E. coli genome sequences showed that they clustered with neonatal meningitis and uropathogenic isolates. The turnaround time from a positive culture to the completion of sequencing (prior to data analysis) was 5 days, and the cost was approximately $300 per strain (for the reagents only). The main obstacles to a mainstream adoption of NGS technologies in diagnostic microbiology laboratories are currently cost (although this is decreasing), a paucity of user-friendly and clinically focused bioinformatics platforms, and a lack of genomics expertise outside the research environment. Despite these hurdles, NGS technologies provide unparalleled high-resolution genotyping in a short time frame and are likely to be widely implemented in the field of diagnostic microbiology in the next few years, particularly for epidemiological investigations (replacing current typing methods) and the characterization of resistance determinants. Clinical microbiologists need to familiarize themselves with these technologies and their applications. PMID:23408689

  15. Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations.

    PubMed

    Chin, Ephrem L H; da Silva, Cristina; Hegde, Madhuri

    2013-02-19

    Detecting mutations in disease genes by full gene sequence analysis is common in clinical diagnostic laboratories. Sanger dideoxy terminator sequencing allows for rapid development and implementation of sequencing assays in the clinical laboratory, but it has limited throughput, and due to cost constraints, only allows analysis of one or at most a few genes in a patient. Next-generation sequencing (NGS), on the other hand, has evolved rapidly, although to date it has mainly been used for large-scale genome sequencing projects and is beginning to be used in the clinical diagnostic testing. One advantage of NGS is that many genes can be analyzed easily at the same time, allowing for mutation detection when there are many possible causative genes for a specific phenotype. In addition, regions of a gene typically not tested for mutations, like deep intronic and promoter mutations, can also be detected. Here we use 20 previously characterized Sanger-sequenced positive controls in disease-causing genes to demonstrate the utility of NGS in a clinical setting using standard PCR based amplification to assess the analytical sensitivity and specificity of the technology for detecting all previously characterized changes (mutations and benign SNPs). The positive controls chosen for validation range from simple substitution mutations to complex deletion and insertion mutations occurring in autosomal dominant and recessive disorders. The NGS data was 100% concordant with the Sanger sequencing data identifying all 119 previously identified changes in the 20 samples. We have demonstrated that NGS technology is ready to be deployed in clinical laboratories. However, NGS and associated technologies are evolving, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success.

  16. High-throughput sequencing of the chloroplast and mitochondrion of Chlamydomonas reinhardtii to generate improved de novo assemblies, analyze expression patterns and transcript speciation, and evaluate diversity among laboratory strains and wild isolates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gallaher, Sean D.; Fitz-Gibbon, Sorel T.; Strenkert, Daniela

    Chlamydomonas reinhardtii is a unicellular chlorophyte alga that is widely studied as a reference organism for understanding photosynthesis, sensory and motile cilia, and for development of an algal-based platform for producing biofuels and bio-products. Its highly repetitive, ~205-kbp circular chloroplast genome and ~15.8-kbp linear mitochondrial genome were sequenced prior to the advent of high-throughput sequencing technologies. Here, high coverage shotgun sequencing was used to assemble both organellar genomes de novo. These new genomes correct dozens of errors in the prior genome sequences and annotations. Gen-ome sequencing coverage indicates that each cell contains on average 83 copies of the chloroplast genomemore » and 130 copies of the mitochondrial genome. Using protocols and analyses optimized for organellar tran-scripts, RNA-Seq was used to quantify their relative abundances across 12 different growth conditions. Forty-six percent of total cellular mRNA is attributable to high expression from a few dozen chloroplast genes. RNA-Seq data were used to guide gene annotation, to demonstrate polycistronic gene expression, and to quantify splicing of psaA and psbA introns. In contrast to a conclusion from a recent study, we found that chloroplast transcripts are not edited. Unexpectedly, cytosine-rich polynucleotide tails were observed at the 3’-end of all mitochondrial transcripts. A comparative genomics analysis of eight laboratory strains and 11 wild isolates of C. reinhardtii identified 2658 variants in the organellargenomes, which is 1/10th as much genetic diversity as is found in the nucleus.« less

  17. Transcriptomics of In Vitro Immune-Stimulated Hemocytes from the Manila Clam Ruditapes philippinarum Using High-Throughput Sequencing

    PubMed Central

    Moreira, Rebeca; Balseiro, Pablo; Planas, Josep V.; Fuste, Berta; Beltran, Sergi; Novoa, Beatriz; Figueras, Antonio

    2012-01-01

    Background The Manila clam (Ruditapes philippinarum) is a worldwide cultured bivalve species with important commercial value. Diseases affecting this species can result in large economic losses. Because knowledge of the molecular mechanisms of the immune response in bivalves, especially clams, is scarce and fragmentary, we sequenced RNA from immune-stimulated R. philippinarum hemocytes by 454-pyrosequencing to identify genes involved in their immune defense against infectious diseases. Methodology and Principal Findings High-throughput deep sequencing of R. philippinarum using 454 pyrosequencing technology yielded 974,976 high-quality reads with an average read length of 250 bp. The reads were assembled into 51,265 contigs and the 44.7% of the translated nucleotide sequences into protein were annotated successfully. The 35 most frequently found contigs included a large number of immune-related genes, and a more detailed analysis showed the presence of putative members of several immune pathways and processes like the apoptosis, the toll like signaling pathway and the complement cascade. We have found sequences from molecules never described in bivalves before, especially in the complement pathway where almost all the components are present. Conclusions This study represents the first transcriptome analysis using 454-pyrosequencing conducted on R. philippinarum focused on its immune system. Our results will provide a rich source of data to discover and identify new genes, which will serve as a basis for microarray construction and the study of gene expression as well as for the identification of genetic markers. The discovery of new immune sequences was very productive and resulted in a large variety of contigs that may play a role in the defense mechanisms of Ruditapes philippinarum. PMID:22536348

  18. Computational approaches to define a human milk metaglycome

    PubMed Central

    Agravat, Sanjay B.; Song, Xuezheng; Rojsajjakul, Teerapat; Cummings, Richard D.; Smith, David F.

    2016-01-01

    Motivation: The goal of deciphering the human glycome has been hindered by the lack of high-throughput sequencing methods for glycans. Although mass spectrometry (MS) is a key technology in glycan sequencing, MS alone provides limited information about the identification of monosaccharide constituents, their anomericity and their linkages. These features of individual, purified glycans can be partly identified using well-defined glycan-binding proteins, such as lectins and antibodies that recognize specific determinants within glycan structures. Results: We present a novel computational approach to automate the sequencing of glycans using metadata-assisted glycan sequencing, which combines MS analyses with glycan structural information from glycan microarray technology. Success in this approach was aided by the generation of a ‘virtual glycome’ to represent all potential glycan structures that might exist within a metaglycomes based on a set of biosynthetic assumptions using known structural information. We exploited this approach to deduce the structures of soluble glycans within the human milk glycome by matching predicted structures based on experimental data against the virtual glycome. This represents the first meta-glycome to be defined using this method and we provide a publically available web-based application to aid in sequencing milk glycans. Availability and implementation: http://glycomeseq.emory.edu Contact: sagravat@bidmc.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26803164

  19. Strategies for optimizing BioNano and Dovetail explored through a second reference quality assembly for the legume model, Medicago truncatula.

    PubMed

    Moll, Karen M; Zhou, Peng; Ramaraj, Thiruvarangan; Fajardo, Diego; Devitt, Nicholas P; Sadowsky, Michael J; Stupar, Robert M; Tiffin, Peter; Miller, Jason R; Young, Nevin D; Silverstein, Kevin A T; Mudge, Joann

    2017-08-04

    Third generation sequencing technologies, with sequencing reads in the tens- of kilo-bases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes in a cost-effective and timely manner. Here, we present high quality genome assemblies of the model legume plant, Medicago truncatula (R108) using PacBio, Dovetail Chicago (hereafter, Dovetail) and BioNano technologies. To test these technologies for plant genome assembly, we generated five assemblies using all possible combinations and ordering of these three technologies in the R108 assembly. While the BioNano and Dovetail joins overlapped, they also showed complementary gains in continuity and join numbers. Both technologies spanned repetitive regions that PacBio alone was unable to bridge. Combining technologies, particularly Dovetail followed by BioNano, resulted in notable improvements compared to Dovetail or BioNano alone. A combination of PacBio, Dovetail, and BioNano was used to generate a high quality draft assembly of R108, a M. truncatula accession widely used in studies of functional genomics. As a test for the usefulness of the resulting genome sequence, the new R108 assembly was used to pinpoint breakpoints and characterize flanking sequence of a previously identified translocation between chromosomes 4 and 8, identifying more than 22.7 Mb of novel sequence not present in the earlier A17 reference assembly. Adding Dovetail followed by BioNano data yielded complementary improvements in continuity over the original PacBio assembly. This strategy proved efficient and cost-effective for developing a quality draft assembly compared to traditional reference assemblies.

  20. Preparation of Low-Input and Ligation-Free ChIP-seq Libraries Using Template-Switching Technology.

    PubMed

    Bolduc, Nathalie; Lehman, Alisa P; Farmer, Andrew

    2016-10-10

    Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) has become the gold standard for mapping of transcription factors and histone modifications throughout the genome. However, for ChIP experiments involving few cells or targeting low-abundance transcription factors, the small amount of DNA recovered makes ligation of adapters very challenging. In this unit, we describe a ChIP-seq workflow that can be applied to small cell numbers, including a robust single-tube and ligation-free method for preparation of sequencing libraries from sub-nanogram amounts of ChIP DNA. An example ChIP protocol is first presented, resulting in selective enrichment of DNA-binding proteins and cross-linked DNA fragments immobilized on beads via an antibody bridge. This is followed by a protocol for fast and easy cross-linking reversal and DNA recovery. Finally, we describe a fast, ligation-free library preparation protocol, featuring DNA SMART technology, resulting in samples ready for Illumina sequencing. © 2016 by John Wiley & Sons, Inc. Copyright © 2016 John Wiley & Sons, Inc.

  1. Application of next-generation sequencing in clinical oncology to advance personalized treatment of cancer

    PubMed Central

    Guan, Yan-Fang; Li, Gai-Rui; Wang, Rong-Jiao; Yi, Yu-Ting; Yang, Ling; Jiang, Dan; Zhang, Xiao-Ping; Peng, Yin

    2012-01-01

    With the development and improvement of new sequencing technology, next-generation sequencing (NGS) has been applied increasingly in cancer genomics research over the past decade. More recently, NGS has been adopted in clinical oncology to advance personalized treatment of cancer. NGS is used to identify novel and rare cancer mutations, detect familial cancer mutation carriers, and provide molecular rationale for appropriate targeted therapy. Compared to traditional sequencing, NGS holds many advantages, such as the ability to fully sequence all types of mutations for a large number of genes (hundreds to thousands) in a single test at a relatively low cost. However, significant challenges, particularly with respect to the requirement for simpler assays, more flexible throughput, shorter turnaround time, and most importantly, easier data analysis and interpretation, will have to be overcome to translate NGS to the bedside of cancer patients. Overall, continuous dedication to apply NGS in clinical oncology practice will enable us to be one step closer to personalized medicine. PMID:22980418

  2. Reverse Genetics and High Throughput Sequencing Methodologies for Plant Functional Genomics

    PubMed Central

    Ben-Amar, Anis; Daldoul, Samia; Reustle, Götz M.; Krczal, Gabriele; Mliki, Ahmed

    2016-01-01

    In the post-genomic era, increasingly sophisticated genetic tools are being developed with the long-term goal of understanding how the coordinated activity of genes gives rise to a complex organism. With the advent of the next generation sequencing associated with effective computational approaches, wide variety of plant species have been fully sequenced giving a wealth of data sequence information on structure and organization of plant genomes. Since thousands of gene sequences are already known, recently developed functional genomics approaches provide powerful tools to analyze plant gene functions through various gene manipulation technologies. Integration of different omics platforms along with gene annotation and computational analysis may elucidate a complete view in a system biology level. Extensive investigations on reverse genetics methodologies were deployed for assigning biological function to a specific gene or gene product. We provide here an updated overview of these high throughout strategies highlighting recent advances in the knowledge of functional genomics in plants. PMID:28217003

  3. Amplicon Sequencing of the slpH Locus Permits Culture-Independent Strain Typing of Lactobacillus helveticus in Dairy Products

    PubMed Central

    Moser, Aline; Wüthrich, Daniel; Bruggmann, Rémy; Eugster-Meier, Elisabeth; Meile, Leo; Irmler, Stefan

    2017-01-01

    The advent of massive parallel sequencing technologies has opened up possibilities for the study of the bacterial diversity of ecosystems without the need for enrichment or single strain isolation. By exploiting 78 genome data-sets from Lactobacillus helveticus strains, we found that the slpH locus that encodes a putative surface layer protein displays sufficient genetic heterogeneity to be a suitable target for strain typing. Based on high-throughput slpH gene sequencing and the detection of single-base DNA sequence variations, we established a culture-independent method to assess the biodiversity of the L. helveticus strains present in fermented dairy food. When we applied the method to study the L. helveticus strain composition in 15 natural whey cultures (NWCs) that were collected at different Gruyère, a protected designation of origin (PDO) production facilities, we detected a total of 10 sequence types (STs). In addition, we monitored the development of a three-strain mix in raclette cheese for 17 weeks. PMID:28775722

  4. DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

    PubMed

    Yang, Jian-Hua; Qu, Liang-Hu

    2012-01-01

    Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explores the expression patterns of miRNAs and other ncRNAs, and discovers novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.

  5. Quantitative Assessment of RNA-Protein Interactions with High Throughput Sequencing - RNA Affinity Profiling (HiTS-RAP)

    PubMed Central

    Ozer, Abdullah; Tome, Jacob M.; Friedman, Robin C.; Gheba, Dan; Schroth, Gary P.; Lis, John T.

    2016-01-01

    Because RNA-protein interactions play a central role in a wide-array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the High Throughput Sequencing-RNA Affinity Profiling (HiTS-RAP) assay, which couples sequencing on an Illumina GAIIx with the quantitative assessment of one or several proteins’ interactions with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of EGFP and NELF-E proteins with their corresponding canonical and mutant RNA aptamers. Here, we provide a detailed protocol for HiTS-RAP, which can be completed in about a month (8 days hands-on time) including the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, high-throughput sequencing and protein binding with GAIIx, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, RNA-MaP and RBNS. A successful HiTS-RAP experiment provides the sequence and binding curves for approximately 200 million RNAs in a single experiment. PMID:26182240

  6. Large-Scale Biomonitoring of Remote and Threatened Ecosystems via High-Throughput Sequencing

    PubMed Central

    Gibson, Joel F.; Shokralla, Shadi; Curry, Colin; Baird, Donald J.; Monk, Wendy A.; King, Ian; Hajibabaei, Mehrdad

    2015-01-01

    Biodiversity metrics are critical for assessment and monitoring of ecosystems threatened by anthropogenic stressors. Existing sorting and identification methods are too expensive and labour-intensive to be scaled up to meet management needs. Alternately, a high-throughput DNA sequencing approach could be used to determine biodiversity metrics from bulk environmental samples collected as part of a large-scale biomonitoring program. Here we show that both morphological and DNA sequence-based analyses are suitable for recovery of individual taxonomic richness, estimation of proportional abundance, and calculation of biodiversity metrics using a set of 24 benthic samples collected in the Peace-Athabasca Delta region of Canada. The high-throughput sequencing approach was able to recover all metrics with a higher degree of taxonomic resolution than morphological analysis. The reduced cost and increased capacity of DNA sequence-based approaches will finally allow environmental monitoring programs to operate at the geographical and temporal scale required by industrial and regulatory end-users. PMID:26488407

  7. Towards Clinical Molecular Diagnosis of Inherited Cardiac Conditions: A Comparison of Bench-Top Genome DNA Sequencers

    PubMed Central

    Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.

    2013-01-01

    Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798

  8. Exploring Pandora's Box: Potential and Pitfalls of Low Coverage Genome Surveys for Evolutionary Biology

    PubMed Central

    Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.

    2012-01-01

    High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309

  9. Opportunities and challenges for the integration of massively parallel genomic sequencing into clinical practice: lessons from the ClinSeq project.

    PubMed

    Biesecker, Leslie G

    2012-04-01

    The debate surrounding the return of results from high-throughput genomic interrogation encompasses many important issues including ethics, law, economics, and social policy. As well, the debate is also informed by the molecular, genetic, and clinical foundations of the emerging field of clinical genomics, which is based on this new technology. This article outlines the main biomedical considerations of sequencing technologies and demonstrates some of the early clinical experiences with the technology to enable the debate to stay focused on real-world practicalities. These experiences are based on early data from the ClinSeq project, which is a project to pilot the use of massively parallel sequencing in a clinical research context with a major aim to develop modes of returning results to individual subjects. The study has enrolled >900 subjects and generated exome sequence data on 572 subjects. These data are beginning to be interpreted and returned to the subjects, which provides examples of the potential usefulness and pitfalls of clinical genomics. There are numerous genetic results that can be readily derived from a genome including rare, high-penetrance traits, and carrier states. However, much work needs to be done to develop the tools and resources for genomic interpretation. The main lesson learned is that a genome sequence may be better considered as a health-care resource, rather than a test, one that can be interpreted and used over the lifetime of the patient.

  10. ZOOM Lite: next-generation sequencing data mapping and visualization software

    PubMed Central

    Zhang, Zefeng; Lin, Hao; Ma, Bin

    2010-01-01

    High-throughput next-generation sequencing technologies pose increasing demands on the efficiency, accuracy and usability of data analysis software. In this article, we present ZOOM Lite, a software for efficient reads mapping and result visualization. With a kernel capable of mapping tens of millions of Illumina or AB SOLiD sequencing reads efficiently and accurately, and an intuitive graphical user interface, ZOOM Lite integrates reads mapping and result visualization into a easy to use pipeline on desktop PC. The software handles both single-end and paired-end reads, and can output both the unique mapping result or the top N mapping results for each read. Additionally, the software takes a variety of input file formats and outputs to several commonly used result formats. The software is freely available at http://bioinfor.com/zoom/lite/. PMID:20530531

  11. Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data.

    PubMed

    Hu, Bo; Ji, Yuan; Xu, Yaomin; Ting, Angela H

    2013-05-01

    Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach.

  12. Chipster: user-friendly analysis software for microarray and other high-throughput data.

    PubMed

    Kallio, M Aleksi; Tuimala, Jarno T; Hupponen, Taavi; Klemelä, Petri; Gentile, Massimiliano; Scheinin, Ilari; Koski, Mikko; Käki, Janne; Korpelainen, Eija I

    2011-10-14

    The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.

  13. Chipster: user-friendly analysis software for microarray and other high-throughput data

    PubMed Central

    2011-01-01

    Background The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. Results Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Conclusions Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available. PMID:21999641

  14. Bio-Intelligence: A Research Program Facilitating the Development of New Paradigms for Tomorrow's Patient Care

    NASA Astrophysics Data System (ADS)

    Phan, Sieu; Famili, Fazel; Liu, Ziying; Peña-Castillo, Lourdes

    The advancement of omics technologies in concert with the enabling information technology development has accelerated biological research to a new realm in a blazing speed and sophistication. The limited single gene assay to the high throughput microarray assay and the laborious manual count of base-pairs to the robotic assisted machinery in genome sequencing are two examples to name. Yet even more sophisticated, the recent development in literature mining and artificial intelligence has allowed researchers to construct complex gene networks unraveling many formidable biological puzzles. To harness these emerging technologies to their full potential to medical applications, the Bio-intelligence program at the Institute for Information Technology, National Research Council Canada, aims to develop and exploit artificial intelligence and bioinformatics technologies to facilitate the development of intelligent decision support tools and systems to improve patient care - for early detection, accurate diagnosis/prognosis of disease, and better personalized therapeutic management.

  15. Alignment-Free Design of Highly Discriminatory Diagnostic Primer Sets for Escherichia coli O104:H4 Outbreak Strains

    PubMed Central

    Bielaszewska, Martina; Karch, Helge; Toth, Ian K.

    2012-01-01

    Background An Escherichia coli O104:H4 outbreak in Germany in summer 2011 caused 53 deaths, over 4000 individual infections across Europe, and considerable economic, social and political impact. This outbreak was the first in a position to exploit rapid, benchtop high-throughput sequencing (HTS) technologies and crowdsourced data analysis early in its investigation, establishing a new paradigm for rapid response to disease threats. We describe a novel strategy for design of diagnostic PCR primers that exploited this rapid draft bacterial genome sequencing to distinguish between E. coli O104:H4 outbreak isolates and other pathogenic E. coli isolates, including the historical hæmolytic uræmic syndrome (HUSEC) E. coli HUSEC041 O104:H4 strain, which possesses the same serotype as the outbreak isolates. Methodology/Principal Findings Primers were designed using a novel alignment-free strategy against eleven draft whole genome assemblies of E. coli O104:H4 German outbreak isolates from the E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium website, and a negative sequence set containing 69 E. coli chromosome and plasmid sequences from public databases. Validation in vitro against 21 ‘positive’ E. coli O104:H4 outbreak and 32 ‘negative’ non-outbreak EHEC isolates indicated that individual primer sets exhibited 100% sensitivity for outbreak isolates, with false positive rates of between 9% and 22%. A minimal combination of two primers discriminated between outbreak and non-outbreak E. coli isolates with 100% sensitivity and 100% specificity. Conclusions/Significance Draft genomes of isolates of disease outbreak bacteria enable high throughput primer design and enhanced diagnostic performance in comparison to traditional molecular assays. Future outbreak investigations will be able to harness HTS rapidly to generate draft genome sequences and diagnostic primer sets, greatly facilitating epidemiology and clinical diagnostics. We expect that high throughput primer design strategies will enable faster, more precise responses to future disease outbreaks of bacterial origin, and help to mitigate their societal impact. PMID:22496820

  16. Comprehensive circular RNA expression profile in radiation-treated HeLa cells and analysis of radioresistance-related circRNAs.

    PubMed

    Yu, Duo; Li, Yunfeng; Ming, Zhihui; Wang, Hongyong; Dong, Zhuo; Qiu, Ling; Wang, Tiejun

    2018-01-01

    Cervical cancer is one of the most common cancers in women worldwide. Malignant tumors develop resistance mechanisms and are less sensitive to or do not respond to irradiation. With the development of high-throughput sequencing technologies, circular RNA (circRNA) has been identified in an increasing number of diseases, especially cancers. It has been reported that circRNA can compete with microRNAs (miRNAs) to change the stability or translation of target RNAs, thus regulating gene expression at the transcriptional level. However, the role of circRNAs in cervical cancer and the radioresistance mechanisms of HeLa cells are unknown. The objective of this study is to investigate the role of circRNAs in radioresistance in HeLa cells. High-throughput sequencing and bioinformatics analysis of irradiated and sham-irradiated HeLa cells. The reliability of high-throughput RNA sequencing was validated using quantitative real-time polymerase chain reaction. The most significant circRNA functions and pathways were selected by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. A circRNA-miRNA-target gene interaction network was used to find circRNAs associated with radioresistance. Moreover, a protein-protein interaction network was constructed to identify radioresistance-related hub proteins. High-throughput sequencing allowed the identification of 16,893 circRNAs involved in the response of HeLa cells to radiation. Compared with the control group, there were 153 differentially expressed circRNAs, of which 76 were up-regulated and 77 were down-regulated. GO covered three domains: biological process (BP), cellular component (CC) and molecular function (MF). The terms assigned to the BP domain were peptidyl-tyrosine dephosphorylation and regulation of cell migration. The identified CC terms were cell-cell adherens junction, nucleoplasm and cytosol, and the identified MF terms were protein binding and protein tyrosine phosphatase activity. The top five KEGG pathways were MAPK signaling pathway, endocytosis, axon guidance, neurotrophin signaling pathway, and SNARE interactions in vesicular transport. The protein-protein interaction analysis indicated that 19 proteins might be hub proteins. CircRNAs may play a major role in the response to radiation. These findings may improve our understanding of the role of circRNAs in radioresistance in HeLa cells and allow the development of novel therapeutic approaches.

  17. Characterization of the indigenous microflora in raw and pasteurized buffalo milk during storage at refrigeration temperature by high-throughput sequencing

    USDA-ARS?s Scientific Manuscript database

    The effect of refrigeration on bacterial communities within raw and pasteurized buffalo milk was studied using high-throughput sequencing. High quality samples of raw buffalo milk were obtained from five dairy farms in the Guangxi province of China. A sample of each milk was pasteurized, and both r...

  18. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach

    Treesearch

    D. Lee Taylor; Michael G. Booth; Jack W. McFarland; Ian C. Herriott; Niall J. Lennon; Chad Nusbaum; Thomas G. Marr

    2008-01-01

    High throughput sequencing methods are widely used in analyses of microbial diversity but are generally applied to small numbers of samples, which precludes charaterization of patterns of microbial diversity across space and time. We have designed a primer-tagging approach that allows pooling and subsequent sorting of numerous samples, which is directed to...

  19. Personal Genomic Information Management and Personalized Medicine: Challenges, Current Solutions, and Roles of HIM Professionals

    PubMed Central

    Alzu'bi, Amal; Zhou, Leming; Watzlaf, Valerie

    2014-01-01

    In recent years, the term personalized medicine has received more and more attention in the field of healthcare. The increasing use of this term is closely related to the astonishing advancement in DNA sequencing technologies and other high-throughput biotechnologies. A large amount of personal genomic data can be generated by these technologies in a short time. Consequently, the needs for managing, analyzing, and interpreting these personal genomic data to facilitate personalized care are escalated. In this article, we discuss the challenges for implementing genomics-based personalized medicine in healthcare, current solutions to these challenges, and the roles of health information management (HIM) professionals in genomics-based personalized medicine. PMID:24808804

  20. Old knowledge and new technologies allow rapid development of model organisms

    PubMed Central

    Cook, Charles E.; Chenevert, Janet; Larsson, Tomas A.; Arendt, Detlev; Houliston, Evelyn; Lénárt, Péter

    2016-01-01

    Until recently the set of “model” species used commonly for cell biology was limited to a small number of well-understood organisms, and developing a new model was prohibitively expensive or time-consuming. With the current rapid advances in technology, in particular low-cost high-throughput sequencing, it is now possible to develop molecular resources fairly rapidly. Wider sampling of biological diversity can only accelerate progress in addressing cellular mechanisms and shed light on how they are adapted to varied physiological contexts. Here we illustrate how historical knowledge and new technologies can reveal the potential of nonconventional organisms, and we suggest guidelines for selecting new experimental models. We also present examples of nonstandard marine metazoan model species that have made important contributions to our understanding of biological processes. PMID:26976934

  1. Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.

    PubMed

    Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R

    2017-07-01

    The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.

  2. Identification of species with DNA-based technology: current progress and challenges.

    PubMed

    Pereira, Filipe; Carneiro, João; Amorim, António

    2008-01-01

    One of the grand challenges of modern biology is to develop accurate and reliable technologies for a rapid screening of DNA sequence variation. This topic of research is of prime importance for the detection and identification of species in numerous fields of investigation, such as taxonomy, epidemiology, forensics, archaeology or ecology. Molecular identification is also central for the diagnosis, treatment and control of infections caused by different pathogens. In recent years, a variety of DNA-based approaches have been developed for the identification of individuals in a myriad of taxonomic groups. Here, we provide an overview of most commonly used assays, with emphasis on those based on DNA hybridizations, restriction enzymes, random PCR amplifications, species-specific PCR primers and DNA sequencing. A critical evaluation of all methods is presented focusing on their discriminatory power, reproducibility and user-friendliness. Having in mind that the current trend is to develop small-scale devices with a high-throughput capacity, we briefly review recent technological achievements for DNA analysis that offer great potentials for the identification of species.

  3. High-Throughput Sequence Analysis of Turbot (Scophthalmus maximus) Transcriptome Using 454-Pyrosequencing for the Discovery of Antiviral Immune Genes

    PubMed Central

    Pereiro, Patricia; Balseiro, Pablo; Romero, Alejandro; Dios, Sonia; Forn-Cuni, Gabriel; Fuste, Berta; Planas, Josep V.; Beltran, Sergi; Novoa, Beatriz; Figueras, Antonio

    2012-01-01

    Background Turbot (Scophthalmus maximus L.) is an important aquacultural resource both in Europe and Asia. However, there is little information on gene sequences available in public databases. Currently, one of the main problems affecting the culture of this flatfish is mortality due to several pathogens, especially viral diseases which are not treatable. In order to identify new genes involved in immune defense, we conducted 454-pyrosequencing of the turbot transcriptome after different immune stimulations. Methodology/Principal Findings Turbot were injected with viral stimuli to increase the expression level of immune-related genes. High-throughput deep sequencing using 454-pyrosequencing technology yielded 915,256 high-quality reads. These sequences were assembled into 55,404 contigs that were subjected to annotation steps. Intriguingly, 55.16% of the deduced protein was not significantly similar to any sequences in the databases used for the annotation and only 0.85% of the BLASTx top-hits matched S. maximus protein sequences. This relatively low level of annotation is possibly due to the limited information for this specie and other flatfish in the database. These results suggest the identification of a large number of new genes in turbot and in fish in general. A more detailed analysis showed the presence of putative members of several innate and specific immune pathways. Conclusions/Significance To our knowledge, this study is the first transcriptome analysis using 454-pyrosequencing for turbot. Previously, there were only 12,471 EST and less of 1,500 nucleotide sequences for S. maximus in NCBI database. Our results provide a rich source of data (55,404 contigs and 181,845 singletons) for discovering and identifying new genes, which will serve as a basis for microarray construction, gene expression characterization and for identification of genetic markers to be used in several applications. Immune stimulation in turbot was very effective, obtaining an enormous variety of sequences belonging to genes involved in the defense mechanisms. PMID:22629298

  4. A General Method for Discovering Inhibitors of Protein–DNA Interactions Using Photonic Crystal Biosensors

    PubMed Central

    Chan, Leo L.; Pineda, Maria; Heeres, James T.; Hergenrother, Paul J.; Cunningham, Brian T.

    2009-01-01

    Protein–DNA interactions are essential for fundamental cellular processes such as transcription, DNA damage repair, and apoptosis. As such, small molecule disruptors of these interactions could be powerful tools for investigation of these biological processes, and such compounds would have great potential as therapeutics. Unfortunately, there are few methods available for the rapid identification of compounds that disrupt protein–DNA interactions. Here we show that photonic crystal (PC) technology can be utilized to detect protein–DNA interactions, and can be used in a high-throughput screening mode to identify compounds that prevent protein–DNA binding. The PC technology is used to detect binding between protein–DNA interactions that are DNA-sequence-dependent (the bacterial toxin–antitoxin system MazEF) and those that are DNA-sequence-independent (the human apoptosis inducing factor (AIF)). The PC technology was further utilized in a screen for inhibitors of the AIF–DNA interaction, and through this screen aurin tricarboxylic acid was identified as the first in vitro inhibitor of AIF. The generality and simplicity of the photonic crystal method should enable this technology to find broad utility for identification of compounds that inhibit protein–DNA binding. PMID:18582039

  5. The Widening Gulf between Genomics Data Generation and Consumption: A Practical Guide to Big Data Transfer Technology

    PubMed Central

    Feltus, Frank A.; Breen, Joseph R.; Deng, Juan; Izard, Ryan S.; Konger, Christopher A.; Ligon, Walter B.; Preuss, Don; Wang, Kuang-Ching

    2015-01-01

    In the last decade, high-throughput DNA sequencing has become a disruptive technology and pushed the life sciences into a distributed ecosystem of sequence data producers and consumers. Given the power of genomics and declining sequencing costs, biology is an emerging “Big Data” discipline that will soon enter the exabyte data range when all subdisciplines are combined. These datasets must be transferred across commercial and research networks in creative ways since sending data without thought can have serious consequences on data processing time frames. Thus, it is imperative that biologists, bioinformaticians, and information technology engineers recalibrate data processing paradigms to fit this emerging reality. This review attempts to provide a snapshot of Big Data transfer across networks, which is often overlooked by many biologists. Specifically, we discuss four key areas: 1) data transfer networks, protocols, and applications; 2) data transfer security including encryption, access, firewalls, and the Science DMZ; 3) data flow control with software-defined networking; and 4) data storage, staging, archiving and access. A primary intention of this article is to orient the biologist in key aspects of the data transfer process in order to frame their genomics-oriented needs to enterprise IT professionals. PMID:26568680

  6. FRESCO: Referential compression of highly similar sequences.

    PubMed

    Wandelt, Sebastian; Leser, Ulf

    2013-01-01

    In many applications, sets of similar texts or sequences are of high importance. Prominent examples are revision histories of documents or genomic sequences. Modern high-throughput sequencing technologies are able to generate DNA sequences at an ever-increasing rate. In parallel to the decreasing experimental time and cost necessary to produce DNA sequences, computational requirements for analysis and storage of the sequences are steeply increasing. Compression is a key technology to deal with this challenge. Recently, referential compression schemes, storing only the differences between a to-be-compressed input and a known reference sequence, gained a lot of interest in this field. In this paper, we propose a general open-source framework to compress large amounts of biological sequence data called Framework for REferential Sequence COmpression (FRESCO). Our basic compression algorithm is shown to be one to two orders of magnitudes faster than comparable related work, while achieving similar compression ratios. We also propose several techniques to further increase compression ratios, while still retaining the advantage in speed: 1) selecting a good reference sequence; and 2) rewriting a reference sequence to allow for better compression. In addition,we propose a new way of further boosting the compression ratios by applying referential compression to already referentially compressed files (second-order compression). This technique allows for compression ratios way beyond state of the art, for instance,4,000:1 and higher for human genomes. We evaluate our algorithms on a large data set from three different species (more than 1,000 genomes, more than 3 TB) and on a collection of versions of Wikipedia pages. Our results show that real-time compression of highly similar sequences at high compression ratios is possible on modern hardware.

  7. Magnetic bead purification of labeled DNA fragments forhigh-throughput capillary electrophoresis sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Elkin, Christopher; Kapur, Hitesh; Smith, Troy

    2001-09-15

    We have developed an automated purification method for terminator sequencing products based on a magnetic bead technology. This 384-well protocol generates labeled DNA fragments that are essentially free of contaminates for less than $0.005 per reaction. In comparison to laborious ethanol precipitation protocols, this method increases the phred20 read length by forty bases with various DNA templates such as PCR fragments, Plasmids, Cosmids and RCA products. Our method eliminates centrifugation and is compatible with both the MegaBACE 1000 and ABIPrism 3700 capillary instruments. As of September 2001, this method has produced over 1.6 million samples with 93 percent averaging 620more » phred20 bases as part of Joint Genome Institutes Production Process.« less

  8. Development of an ELA-DRA gene typing method based on pyrosequencing technology.

    PubMed

    Díaz, S; Echeverría, M G; It, V; Posik, D M; Rogberg-Muñoz, A; Pena, N L; Peral-García, P; Vega-Pla, J L; Giovambattista, G

    2008-11-01

    The polymorphism of equine lymphocyte antigen (ELA) class II DRA gene had been detected by polymerase chain reaction-single-strand conformational polymorphism (PCR-SSCP) and reference strand-mediated conformation analysis. These methodologies allowed to identify 11 ELA-DRA exon 2 sequences, three of which are widely distributed among domestic horse breeds. Herein, we describe the development of a pyrosequencing-based method applicable to ELA-DRA typing, by screening samples from eight different horse breeds previously typed by PCR-SSCP. This sequence-based method would be useful in high-throughput genotyping of major histocompatibility complex genes in horses and other animal species, making this system interesting as a rapid screening method for animal genotyping of immune-related genes.

  9. High-throughput analysis of the protein sequence-stability landscape using a quantitative "yeast surface two-hybrid" system and fragment reconstitution

    PubMed Central

    Dutta, Sanjib; Koide, Akiko; Koide, Shohei

    2008-01-01

    Stability evaluation of many mutants can lead to a better understanding of the sequence determinants of a structural motif and of factors governing protein stability and protein evolution. The traditional biophysical analysis of protein stability is low throughput, limiting our ability to widely explore the sequence space in a quantitative manner. In this study, we have developed a high-throughput library screening method for quantifying stability changes, which is based on protein fragment reconstitution and yeast surface display. Our method exploits the thermodynamic linkage between protein stability and fragment reconstitution and the ability of the yeast surface display technique to quantitatively evaluate protein-protein interactions. The method was applied to a fibronectin type III (FN3) domain. Characterization of fragment reconstitution was facilitated by the co-expression of two FN3 fragments, thus establishing a "yeast surface two-hybrid" method. Importantly, our method does not rely on competition between clones and thus eliminates a common limitation of high-throughput selection methods in which the most stable variants are predominantly recovered. Thus, it allows for the isolation of sequences that exhibits a desired level of stability. We identified over one hundred unique sequences for a β-bulge motif, which was significantly more informative than natural sequences of the FN3 family in revealing the sequence determinants for the β-bulge. Our method provides a powerful means to rapidly assess stability of many variants, to systematically assess contribution of different factors to protein stability and to enhance protein stability. PMID:18674545

  10. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing

    PubMed Central

    2014-01-01

    Background RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extend the program’s functionality. Conclusions eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312

  11. eRNA: a graphic user interface-based tool optimized for large data analysis from high-throughput RNA sequencing.

    PubMed

    Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang

    2014-03-05

    RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extend the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.

  12. ContEst16S: an algorithm that identifies contaminated prokaryotic genomes using 16S RNA gene sequences.

    PubMed

    Lee, Imchang; Chalita, Mauricio; Ha, Sung-Min; Na, Seong-In; Yoon, Seok-Hwan; Chun, Jongsik

    2017-06-01

    Thanks to the recent advancement of DNA sequencing technology, the cost and time of prokaryotic genome sequencing have been dramatically decreased. It has repeatedly been reported that genome sequencing using high-throughput next-generation sequencing is prone to contaminations due to its high depth of sequencing coverage. Although a few bioinformatics tools are available to detect potential contaminations, these have inherited limitations as they only use protein-coding genes. Here we introduce a new algorithm, called ContEst16S, to detect potential contaminations using 16S rRNA genes from genome assemblies. We screened 69 745 prokaryotic genomes from the NCBI Assembly Database using ContEst16S and found that 594 were contaminated by bacteria, human and plants. Of the predicted contaminated genomes, 8 % were not predicted by the existing protein-coding gene-based tool, implying that both methods can be complementary in the detection of contaminations. A web-based service of the algorithm is available at www.ezbiocloud.net/tools/contest16s.

  13. Whole exome sequencing: a state-of-the-art approach for defining (and exploring!) genetic landscapes in pediatric nephrology.

    PubMed

    Gulati, Ashima; Somlo, Stefan

    2018-05-01

    The genesis of whole exome sequencing as a powerful tool for detailing the protein coding sequence of the human genome was conceptualized based on the availability of next-generation sequencing technology and knowledge of the human reference genome. The field of pediatric nephrology enriched with molecularly unsolved phenotypes is allowing the clinical and research application of whole exome sequencing to enable novel gene discovery and provide amendment of phenotypic misclassification. Recent studies in the field have informed us that newer high-throughput sequencing techniques are likely to be of high yield when applied in conjunction with conventional genomic approaches such as linkage analysis and other strategies used to focus subsequent analysis. They have also emphasized the need for the validation of novel genetic findings in large collaborative cohorts and the production of robust corroborative biological data. The well-structured application of comprehensive genomic testing in clinical and research arenas will hopefully continue to advance patient care and precision medicine, but does call for attention to be paid to its integrated challenges.

  14. Towards decoding the conifer giga-genome.

    PubMed

    Mackay, John; Dean, Jeffrey F D; Plomion, Christophe; Peterson, Daniel G; Cánovas, Francisco M; Pavy, Nathalie; Ingvarsson, Pär K; Savolainen, Outi; Guevara, M Ángeles; Fluch, Silvia; Vinceti, Barbara; Abarca, Dolores; Díaz-Sala, Carmen; Cervera, María-Teresa

    2012-12-01

    Several new initiatives have been launched recently to sequence conifer genomes including pines, spruces and Douglas-fir. Owing to the very large genome sizes ranging from 18 to 35 gigabases, sequencing even a single conifer genome had been considered unattainable until the recent throughput increases and cost reductions afforded by next generation sequencers. The purpose of this review is to describe the context for these new initiatives. A knowledge foundation has been acquired in several conifers of commercial and ecological interest through large-scale cDNA analyses, construction of genetic maps and gene mapping studies aiming to link phenotype and genotype. Exploratory sequencing in pines and spruces have pointed out some of the unique properties of these giga-genomes and suggested strategies that may be needed to extract value from their sequencing. The hope is that recent and pending developments in sequencing technology will contribute to rapidly filling the knowledge vacuum surrounding their structure, contents and evolution. Researchers are also making plans to use comparative analyses that will help to turn the data into a valuable resource for enhancing and protecting the world's conifer forests.

  15. Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.

    PubMed

    Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth

    2016-06-01

    Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.

  16. Mapping of disease-associated variants in admixed populations

    PubMed Central

    2011-01-01

    Recent developments in high-throughput genotyping and whole-genome sequencing will enhance the identification of disease loci in admixed populations. We discuss how a more refined estimation of ancestry benefits both admixture mapping and association mapping, making disease loci identification in admixed populations more powerful. High-throughput genotyping and sequencing will enable refined estimation of ancestry, thus enhancing disease loci identification in admixed populations PMID:21635713

  17. A new fungal large subunit ribosomal RNA primer for high throughput sequencing surveys

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mueller, Rebecca C.; Gallegos-Graves, La Verne; Kuske, Cheryl R.

    The inclusion of phylogenetic metrics in community ecology has provided insights into important ecological processes, particularly when combined with high-throughput sequencing methods; however, these approaches have not been widely used in studies of fungal communities relative to other microbial groups. Two obstacles have been considered: (1) the internal transcribed spacer (ITS) region has limited utility for constructing phylogenies and (2) most PCR primers that target the large subunit (LSU) ribosomal unit generate amplicons that exceed current limits of high-throughput sequencing platforms. We designed and tested a PCR primer (LR22R) to target approximately 300–400 bp region of the D2 hypervariable regionmore » of the fungal LSU for use with the Illumina MiSeq platform. Both in silico and empirical analyses showed that the LR22R–LR3 pair captured a broad range of fungal taxonomic groups with a small fraction of non-fungal groups. Phylogenetic placement of publically available LSU D2 sequences showed broad agreement with taxonomic classification. Comparisons of the LSU D2 and the ITS2 ribosomal regions from environmental samples and known communities showed similar discriminatory abilities of the two primer sets. Altogether, these findings show that the LR22R–LR3 primer pair has utility for phylogenetic analyses of fungal communities using high-throughput sequencing methods.« less

  18. A new fungal large subunit ribosomal RNA primer for high throughput sequencing surveys

    DOE PAGES

    Mueller, Rebecca C.; Gallegos-Graves, La Verne; Kuske, Cheryl R.

    2015-12-09

    The inclusion of phylogenetic metrics in community ecology has provided insights into important ecological processes, particularly when combined with high-throughput sequencing methods; however, these approaches have not been widely used in studies of fungal communities relative to other microbial groups. Two obstacles have been considered: (1) the internal transcribed spacer (ITS) region has limited utility for constructing phylogenies and (2) most PCR primers that target the large subunit (LSU) ribosomal unit generate amplicons that exceed current limits of high-throughput sequencing platforms. We designed and tested a PCR primer (LR22R) to target approximately 300–400 bp region of the D2 hypervariable regionmore » of the fungal LSU for use with the Illumina MiSeq platform. Both in silico and empirical analyses showed that the LR22R–LR3 pair captured a broad range of fungal taxonomic groups with a small fraction of non-fungal groups. Phylogenetic placement of publically available LSU D2 sequences showed broad agreement with taxonomic classification. Comparisons of the LSU D2 and the ITS2 ribosomal regions from environmental samples and known communities showed similar discriminatory abilities of the two primer sets. Altogether, these findings show that the LR22R–LR3 primer pair has utility for phylogenetic analyses of fungal communities using high-throughput sequencing methods.« less

  19. The (in)complete organelle genome: exploring the use and nonuse of available technologies for characterizing mitochondrial and plastid chromosomes.

    PubMed

    Sanitá Lima, Matheus; Woods, Laura C; Cartwright, Matthew W; Smith, David Roy

    2016-11-01

    Not long ago, scientists paid dearly in time, money and skill for every nucleotide that they sequenced. Today, DNA sequencing technologies epitomize the slogan 'faster, easier, cheaper and more', and in many ways, sequencing an entire genome has become routine, even for the smallest laboratory groups. This is especially true for mitochondrial and plastid genomes. Given their relatively small sizes and high copy numbers per cell, organelle DNAs are currently among the most highly sequenced kind of chromosome. But accurately characterizing an organelle genome and the information it encodes can require much more than DNA sequencing and bioinformatics analyses. Organelle genomes can be surprisingly complex and can exhibit convoluted and unconventional modes of gene expression. Unravelling this complexity can demand a wide assortment of experiments, from pulsed-field gel electrophoresis to Southern and Northern blots to RNA analyses. Here, we show that it is exactly these types of 'complementary' analyses that are often lacking from contemporary organelle genome papers, particularly short 'genome announcement' articles. Consequently, crucial and interesting features of organelle chromosomes are going undescribed, which could ultimately lead to a poor understanding and even a misrepresentation of these genomes and the genes they express. High-throughput sequencing and bioinformatics have made it easy to sequence and assemble entire chromosomes, but they should not be used as a substitute for or at the expense of other types of genomic characterization methods. © 2016 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  20. Profiling of drought-responsive microRNA and mRNA in tomato using high-throughput sequencing.

    PubMed

    Liu, Minmin; Yu, Huiyang; Zhao, Gangjun; Huang, Qiufeng; Lu, Yongen; Ouyang, Bo

    2017-06-26

    Abiotic stresses cause severe loss of crop production. Among them, drought is one of the most frequent environmental stresses, which limits crop growth, development and productivity. Plant drought tolerance is fine-tuned by a complex gene regulatory network. Understanding the molecular regulation of this polygenic trait is crucial for the eventual success to improve plant yield and quality. Recent studies have demonstrated that microRNAs play critical roles in plant drought tolerance. However, little is known about the microRNA in drought response of the model plant tomato. Here, we described the profiling of drought-responsive microRNA and mRNA in tomato using high-throughput next-generation sequencing. Drought stress was applied on the seedlings of M82, a drought-sensitive cultivated tomato genotype, and IL9-1, a drought-tolerant introgression line derived from the stress-resistant wild species Solanum pennellii LA0716 and M82. Under drought, IL9-1 performed superior than M82 regarding survival rate, H 2 O 2 elimination and leaf turgor maintenance. A total of four small RNA and eight mRNA libraries were constructed and sequenced using Illumina sequencing technology. 105 conserved and 179 novel microRNAs were identified, among them, 54 and 98 were differentially expressed upon drought stress, respectively. The majority of the differentially-expressed conserved microRNAs was up-regulated in IL9-1 whereas down-regulated in M82. Under drought stress, 2714 and 1161 genes were found to be differentially expressed in M82 and IL9-1, respectively, and many of their homologues are involved in plant stress, such as genes encoding transcription factor and protein kinase. Various pathways involved in abiotic stress were revealed by Gene Ontology and pathway analysis. The mRNA sequencing results indicated that most of the target genes were regulated by their corresponding microRNAs, which suggested that microRNAs may play essential roles in the drought tolerance of tomato. In this study, numerous microRNAs and mRNAs involved in the drought response of tomato were identified using high-throughput sequencing, which will provide new insights into the complex regulatory network of plant adaption to drought stress. This work will also help to exploit new players functioning in plant drought-stress tolerance.

  1. Error correction and statistical analyses for intra-host comparisons of feline immunodeficiency virus diversity from high-throughput sequencing data.

    PubMed

    Liu, Yang; Chiaromonte, Francesca; Ross, Howard; Malhotra, Raunaq; Elleder, Daniel; Poss, Mary

    2015-06-30

    Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3' half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology. Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies - and quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dual and single infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. All together, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy. Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.

  2. SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

    PubMed

    Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

    2010-12-01

    High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.

  3. High-throughput microsatellite genotyping in ecology: improved accuracy, efficiency, standardization and success with low-quantity and degraded DNA.

    PubMed

    De Barba, M; Miquel, C; Lobréaux, S; Quenette, P Y; Swenson, J E; Taberlet, P

    2017-05-01

    Microsatellite markers have played a major role in ecological, evolutionary and conservation research during the past 20 years. However, technical constrains related to the use of capillary electrophoresis and a recent technological revolution that has impacted other marker types have brought to question the continued use of microsatellites for certain applications. We present a study for improving microsatellite genotyping in ecology using high-throughput sequencing (HTS). This approach entails selection of short markers suitable for HTS, sequencing PCR-amplified microsatellites on an Illumina platform and bioinformatic treatment of the sequence data to obtain multilocus genotypes. It takes advantage of the fact that HTS gives direct access to microsatellite sequences, allowing unambiguous allele identification and enabling automation of the genotyping process through bioinformatics. In addition, the massive parallel sequencing abilities expand the information content of single experimental runs far beyond capillary electrophoresis. We illustrated the method by genotyping brown bear samples amplified with a multiplex PCR of 13 new microsatellite markers and a sex marker. HTS of microsatellites provided accurate individual identification and parentage assignment and resulted in a significant improvement of genotyping success (84%) of faecal degraded DNA and costs reduction compared to capillary electrophoresis. The HTS approach holds vast potential for improving success, accuracy, efficiency and standardization of microsatellite genotyping in ecological and conservation applications, especially those that rely on profiling of low-quantity/quality DNA and on the construction of genetic databases. We discuss and give perspectives for the implementation of the method in the light of the challenges encountered in wildlife studies. © 2016 John Wiley & Sons Ltd.

  4. Application of high-throughput sequencing to whole rabies viral genome characterisation and its use for phylogenetic re-evaluation of a raccoon strain incursion into the province of Ontario.

    PubMed

    Nadin-Davis, Susan A; Colville, Adam; Trewby, Hannah; Biek, Roman; Real, Leslie

    2017-03-15

    Raccoon rabies remains a serious public health problem throughout much of the eastern seaboard of North America due to the urban nature of the reservoir host and the many challenges inherent in multi-jurisdictional efforts to administer co-ordinated and comprehensive wildlife rabies control programmes. Better understanding of the mechanisms of spread of rabies virus can play a significant role in guiding such control efforts. To facilitate a detailed molecular epidemiological study of raccoon rabies virus movements across eastern North America, we developed a methodology to efficiently determine whole genome sequences of hundreds of viral samples. The workflow combines the generation of a limited number of overlapping amplicons covering the complete viral genome and use of high throughput sequencing technology. The value of this approach is demonstrated through a retrospective phylogenetic analysis of an outbreak of raccoon rabies which occurred in the province of Ontario between 1999 and 2005. As demonstrated by the number of single nucleotide polymorphisms detected, whole genome sequence data were far more effective than single gene sequences in discriminating between samples and this facilitated the generation of more robust and informative phylogenies that yielded insights into the spatio-temporal pattern of viral spread. With minor modification this approach could be applied to other rabies virus variants thereby facilitating greatly improved phylogenetic inference and thus better understanding of the spread of this serious zoonotic disease. Such information will inform the most appropriate strategies for rabies control in wildlife reservoirs. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  5. LightAssembler: fast and memory-efficient assembly algorithm for high-throughput sequencing reads.

    PubMed

    El-Metwally, Sara; Zakaria, Magdi; Hamza, Taher

    2016-11-01

    The deluge of current sequenced data has exceeded Moore's Law, more than doubling every 2 years since the next-generation sequencing (NGS) technologies were invented. Accordingly, we will able to generate more and more data with high speed at fixed cost, but lack the computational resources to store, process and analyze it. With error prone high throughput NGS reads and genomic repeats, the assembly graph contains massive amount of redundant nodes and branching edges. Most assembly pipelines require this large graph to reside in memory to start their workflows, which is intractable for mammalian genomes. Resource-efficient genome assemblers combine both the power of advanced computing techniques and innovative data structures to encode the assembly graph efficiently in a computer memory. LightAssembler is a lightweight assembly algorithm designed to be executed on a desktop machine. It uses a pair of cache oblivious Bloom filters, one holding a uniform sample of [Formula: see text]-spaced sequenced [Formula: see text]-mers and the other holding [Formula: see text]-mers classified as likely correct, using a simple statistical test. LightAssembler contains a light implementation of the graph traversal and simplification modules that achieves comparable assembly accuracy and contiguity to other competing tools. Our method reduces the memory usage by [Formula: see text] compared to the resource-efficient assemblers using benchmark datasets from GAGE and Assemblathon projects. While LightAssembler can be considered as a gap-based sequence assembler, different gap sizes result in an almost constant assembly size and genome coverage. https://github.com/SaraEl-Metwally/LightAssembler CONTACT: sarah_almetwally4@mans.edu.egSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. Evolutionary biology through the lens of budding yeast comparative genomics.

    PubMed

    Marsit, Souhir; Leducq, Jean-Baptiste; Durand, Éléonore; Marchant, Axelle; Filteau, Marie; Landry, Christian R

    2017-10-01

    The budding yeast Saccharomyces cerevisiae is a highly advanced model system for studying genetics, cell biology and systems biology. Over the past decade, the application of high-throughput sequencing technologies to this species has contributed to this yeast also becoming an important model for evolutionary genomics. Indeed, comparative genomic analyses of laboratory, wild and domesticated yeast populations are providing unprecedented detail about many of the processes that govern evolution, including long-term processes, such as reproductive isolation and speciation, and short-term processes, such as adaptation to natural and domestication-related environments.

  7. NGSANE: a lightweight production informatics framework for high-throughput data analysis.

    PubMed

    Buske, Fabian A; French, Hugh J; Smith, Martin A; Clark, Susan J; Bauer, Denis C

    2014-05-15

    The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot swappable modular components as opposed to the more rigid program call wrapping by higher level languages, as implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for set up and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. Ngsane is implemented in bash and publicly available under BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane. Denis.Bauer@csiro.au Supplementary data are available at Bioinformatics online.

  8. Using Informatics-, Bioinformatics- and Genomics-Based Approaches for the Molecular Surveillance and Detection of Biothreat Agents

    NASA Astrophysics Data System (ADS)

    Seto, Donald

    The convergence and wealth of informatics, bioinformatics and genomics methods and associated resources allow a comprehensive and rapid approach for the surveillance and detection of bacterial and viral organisms. Coupled with the continuing race for the fastest, most cost-efficient and highest-quality DNA sequencing technology, that is, "next generation sequencing", the detection of biological threat agents by `cheaper and faster' means is possible. With the application of improved bioinformatic tools for the understanding of these genomes and for parsing unique pathogen genome signatures, along with `state-of-the-art' informatics which include faster computational methods, equipment and databases, it is feasible to apply new algorithms to biothreat agent detection. Two such methods are high-throughput DNA sequencing-based and resequencing microarray-based identification. These are illustrated and validated by two examples involving human adenoviruses, both from real-world test beds.

  9. Methods, Tools and Current Perspectives in Proteogenomics *

    PubMed Central

    Ruggles, Kelly V.; Krug, Karsten; Wang, Xiaojing; Clauser, Karl R.; Wang, Jing; Payne, Samuel H.; Fenyö, David; Zhang, Bing; Mani, D. R.

    2017-01-01

    With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies have identified novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches and in this article, we systematically classify published methods and tools into four major categories, (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications. PMID:28456751

  10. Prediction of novel pre-microRNAs with high accuracy through boosting and SVM.

    PubMed

    Zhang, Yuanwei; Yang, Yifan; Zhang, Huan; Jiang, Xiaohua; Xu, Bo; Xue, Yu; Cao, Yunxia; Zhai, Qian; Zhai, Yong; Xu, Mingqing; Cooke, Howard J; Shi, Qinghua

    2011-05-15

    High-throughput deep-sequencing technology has generated an unprecedented number of expressed short sequence reads, presenting not only an opportunity but also a challenge for prediction of novel microRNAs. To verify the existence of candidate microRNAs, we have to show that these short sequences can be processed from candidate pre-microRNAs. However, it is laborious and time consuming to verify these using existing experimental techniques. Therefore, here, we describe a new method, miRD, which is constructed using two feature selection strategies based on support vector machines (SVMs) and boosting method. It is a high-efficiency tool for novel pre-microRNA prediction with accuracy up to 94.0% among different species. miRD is implemented in PHP/PERL+MySQL+R and can be freely accessed at http://mcg.ustc.edu.cn/rpg/mird/mird.php.

  11. Screening for SNPs with Allele-Specific Methylation based on Next-Generation Sequencing Data

    PubMed Central

    Hu, Bo; Xu, Yaomin

    2013-01-01

    Allele-specific methylation (ASM) has long been studied but mainly documented in the context of genomic imprinting and X chromosome inactivation. Taking advantage of the next-generation sequencing technology, we conduct a high-throughput sequencing experiment with four prostate cell lines to survey the whole genome and identify single nucleotide polymorphisms (SNPs) with ASM. A Bayesian approach is proposed to model the counts of short reads for each SNP conditional on its genotypes of multiple subjects, leading to a posterior probability of ASM. We flag SNPs with high posterior probabilities of ASM by accounting for multiple comparisons based on posterior false discovery rates. Applying the Bayesian approach to the in-house prostate cell line data, we identify 269 SNPs as candidates of ASM. A simulation study is carried out to demonstrate the quantitative performance of the proposed approach. PMID:23710259

  12. High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.

    PubMed

    Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus; Morling, Niels

    2016-01-01

    Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates automation of DNA sequencing.

  13. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis

    PubMed Central

    2012-01-01

    Background The multiplexing becomes the major limitation of the next-generation sequencing (NGS) in application to low complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneously analysis of up to several dozen samples. Results Here we introduce pair-barcode sequencing (PBS), an economic and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them are assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns are captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrates that PBS approach is valid. Conclusions By employing PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel so that high-throughput sequencing economically meets the requirements of samples which are low sequencing throughput demand. PMID:22276739

  14. Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.

    PubMed

    Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong

    2012-01-25

    The multiplexing becomes the major limitation of the next-generation sequencing (NGS) in application to low complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneously analysis of up to several dozen samples. Here we introduce pair-barcode sequencing (PBS), an economic and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them are assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns are captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrates that PBS approach is valid. By employing PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel so that high-throughput sequencing economically meets the requirements of samples which are low sequencing throughput demand.

  15. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers.

    PubMed

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  16. High Diversity of Myocyanophage in Various Aquatic Environments Revealed by High-Throughput Sequencing of Major Capsid Protein Gene With a New Set of Primers

    PubMed Central

    Hou, Weiguo; Wang, Shang; Briggs, Brandon R.; Li, Gaoyuan; Xie, Wei; Dong, Hailiang

    2018-01-01

    Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.

  17. Synthetic biology to access and expand nature’s chemical diversity

    PubMed Central

    Smanski, Michael J.; Zhou, Hui; Claesen, Jan; Shen, Ben; Fischbach, Michael; Voigt, Christopher A.

    2016-01-01

    Bacterial genomes encode the biosynthetic potential to produce hundreds of thousands of complex molecules with diverse applications, from medicine to agriculture and materials. Economically accessing the potential encoded within sequenced genomes promises to reinvigorate waning drug discovery pipelines and provide novel routes to intricate chemicals. This is a tremendous undertaking, as the pathways often comprise dozens of genes spanning as much as 100+ kiliobases of DNA, are controlled by complex regulatory networks, and the most interesting molecules are made by non-model organisms. Advances in synthetic biology address these issues, including DNA construction technologies, genetic parts for precision expression control, synthetic regulatory circuits, computer aided design, and multiplexed genome engineering. Collectively, these technologies are moving towards an era when chemicals can be accessed en mass based on sequence information alone. This will enable the harnessing of metagenomic data and massive strain banks for high-throughput molecular discovery and, ultimately, the ability to forward design pathways to complex chemicals not found in nature. PMID:26876034

  18. New technology and resources for cryptococcal research

    PubMed Central

    Zhang, Nannan; Park, Yoon-Dong; Williamson, Peter R.

    2014-01-01

    Rapid advances in molecular biology and genome sequencing have enabled the generation of new technology and resources for cryptococcal research. RNAi-mediated specific gene knock down has become routine and more efficient by utilizing modified shRNA plasmids and convergent promoter RNAi constructs. This system was recently applied in a high-throughput screen to identify genes involved in host-pathogen interactions. Gene deletion efficiencies have also been improved by increasing rates of homologous recombination through a number of approaches, including a combination of double-joint PCR with split-marker transformation, the use of dominant selectable markers and the introduction of Cre-Loxp systems into Cryptococcus. Moreover, visualization of cryptococcal proteins has become more facile using fusions with codon-optimized fluorescent tags, such as green or red fluorescent proteins or, mCherry. Using recent genome-wide analytical tools, new transcriptional factors and regulatory proteins have been identified in novel virulence-related signaling pathways by employing microarray analysis, RNA-sequencing and proteomic analysis. PMID:25460849

  19. Epigenetic regulation of gene expression in cancer: techniques, resources and analysis

    PubMed Central

    Kagohara, Luciane T; Stein-O’Brien, Genevieve L; Kelley, Dylan; Flam, Emily; Wick, Heather C; Danilova, Ludmila V; Easwaran, Hariharan; Favorov, Alexander V; Qian, Jiang; Gaykalova, Daria A; Fertig, Elana J

    2018-01-01

    Abstract Cancer is a complex disease, driven by aberrant activity in numerous signaling pathways in even individual malignant cells. Epigenetic changes are critical mediators of these functional changes that drive and maintain the malignant phenotype. Changes in DNA methylation, histone acetylation and methylation, noncoding RNAs, posttranslational modifications are all epigenetic drivers in cancer, independent of changes in the DNA sequence. These epigenetic alterations were once thought to be crucial only for the malignant phenotype maintenance. Now, epigenetic alterations are also recognized as critical for disrupting essential pathways that protect the cells from uncontrolled growth, longer survival and establishment in distant sites from the original tissue. In this review, we focus on DNA methylation and chromatin structure in cancer. The precise functional role of these alterations is an area of active research using emerging high-throughput approaches and bioinformatics analysis tools. Therefore, this review also describes these high-throughput measurement technologies, public domain databases for high-throughput epigenetic data in tumors and model systems and bioinformatics algorithms for their analysis. Advances in bioinformatics data that combine these epigenetic data with genomics data are essential to infer the function of specific epigenetic alterations in cancer. These integrative algorithms are also a focus of this review. Future studies using these emerging technologies will elucidate how alterations in the cancer epigenome cooperate with genetic aberrations during tumor initiation and progression. This deeper understanding is essential to future studies with epigenetics biomarkers and precision medicine using emerging epigenetic therapies. PMID:28968850

  20. Increasing the reach of forensic genetics with massively parallel sequencing.

    PubMed

    Budowle, Bruce; Schmedes, Sarah E; Wendt, Frank R

    2017-09-01

    The field of forensic genetics has made great strides in the analysis of biological evidence related to criminal and civil matters. More so, the discipline has set a standard of performance and quality in the forensic sciences. The advent of massively parallel sequencing will allow the field to expand its capabilities substantially. This review describes the salient features of massively parallel sequencing and how it can impact forensic genetics. The features of this technology offer increased number and types of genetic markers that can be analyzed, higher throughput of samples, and the capability of targeting different organisms, all by one unifying methodology. While there are many applications, three are described where massively parallel sequencing will have immediate impact: molecular autopsy, microbial forensics and differentiation of monozygotic twins. The intent of this review is to expose the forensic science community to the potential enhancements that have or are soon to arrive and demonstrate the continued expansion the field of forensic genetics and its service in the investigation of legal matters.

  1. HSA: a heuristic splice alignment tool.

    PubMed

    Bu, Jingde; Chi, Xuebin; Jin, Zhong

    2013-01-01

    RNA-Seq methodology is a revolutionary transcriptomics sequencing technology, which is the representative of Next generation Sequencing (NGS). With the high throughput sequencing of RNA-Seq, we can acquire much more information like differential expression and novel splice variants from deep sequence analysis and data mining. But the short read length brings a great challenge to alignment, especially when the reads span two or more exons. A two steps heuristic splice alignment tool is generated in this investigation. First, map raw reads to reference with unspliced aligner--BWA; second, split initial unmapped reads into three equal short reads (seeds), align each seed to the reference, filter hits, search possible split position of read and extend hits to a complete match. Compare with other splice alignment tools like SOAPsplice and Tophat2, HSA has a better performance in call rate and efficiency, but its results do not as accurate as the other software to some extent. HSA is an effective spliced aligner of RNA-Seq reads mapping, which is available at https://github.com/vlcc/HSA.

  2. Computational complexity of algorithms for sequence comparison, short-read assembly and genome alignment.

    PubMed

    Baichoo, Shakuntala; Ouzounis, Christos A

    A multitude of algorithms for sequence comparison, short-read assembly and whole-genome alignment have been developed in the general context of molecular biology, to support technology development for high-throughput sequencing, numerous applications in genome biology and fundamental research on comparative genomics. The computational complexity of these algorithms has been previously reported in original research papers, yet this often neglected property has not been reviewed previously in a systematic manner and for a wider audience. We provide a review of space and time complexity of key sequence analysis algorithms and highlight their properties in a comprehensive manner, in order to identify potential opportunities for further research in algorithm or data structure optimization. The complexity aspect is poised to become pivotal as we will be facing challenges related to the continuous increase of genomic data on unprecedented scales and complexity in the foreseeable future, when robust biological simulation at the cell level and above becomes a reality. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Exploring single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes in the jellyfish (Rhopilema esculentum) by transcriptome sequencing.

    PubMed

    Li, Yunfeng; Zhou, Zunchun; Tian, Meilin; Tian, Yi; Dong, Ying; Li, Shilei; Liu, Weidong; He, Chongbo

    2017-08-01

    In this study, single nucleotide polymorphism (SNP), microsatellite (SSR) and differentially expressed genes (DEGs) in the oral parts, gonads, and umbrella parts of the jellyfish Rhopilema esculentum were analyzed by RNA-Seq technology. A total of 76.4 million raw reads and 72.1 million clean reads were generated from deep sequencing. Approximately 119,874 tentative unigenes and 149,239 transcripts were obtained. A total of 1,034,708 SNP markers were detected in the three tissues. For microsatellite mining, 5088 SSRs were identified from the unigene sequences. The most frequent repeat motifs were mononucleotide repeats, which accounted for 61.93%. Transcriptome comparison of the three tissues yielded a total of 8841 DEGs, of which 3560 were up-regulated and 5281 were down-regulated. This study represents the greatest sequencing effort carried out for a jellyfish and provides the first high-throughput transcriptomic resource for jellyfish. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Mapping the miRNA interactome by crosslinking ligation and sequencing of hybrids (CLASH)

    PubMed Central

    Helwak, Aleksandra; Tollervey, David

    2014-01-01

    RNA-RNA interactions play critical roles in many cellular processes but studying them is difficult and laborious. Here, we describe an experimental procedure, termed crosslinking ligation and sequencing of hybrids (CLASH), which allows high-throughput identification of sites of RNA-RNA interaction. During CLASH, a tagged bait protein is UV crosslinked in vivo to stabilise RNA interactions and purified under denaturing conditions. RNAs associated with the bait protein are partially truncated, and the ends of RNA-duplexes are ligated together. Following linker addition, cDNA library preparation and high-throughput sequencing, the ligated duplexes give rise to chimeric cDNAs, which unambiguously identify RNA-RNA interaction sites independent of bioinformatic predictions. This protocol is optimized for studying miRNA targets bound by Argonaute proteins, but should be easily adapted for other RNA-binding proteins and classes of RNA. The protocol requires around 5 days to complete, excluding the time required for high-throughput sequencing and bioinformatic analyses. PMID:24577361

  5. A high-throughput multiplex method adapted for GMO detection.

    PubMed

    Chaouachi, Maher; Chupeau, Gaëlle; Berard, Aurélie; McKhann, Heather; Romaniuk, Marcel; Giancola, Sandra; Laval, Valérie; Bertheau, Yves; Brunel, Dominique

    2008-12-24

    A high-throughput multiplex assay for the detection of genetically modified organisms (GMO) was developed on the basis of the existing SNPlex method designed for SNP genotyping. This SNPlex assay allows the simultaneous detection of up to 48 short DNA sequences (approximately 70 bp; "signature sequences") from taxa endogenous reference genes, from GMO constructions, screening targets, construct-specific, and event-specific targets, and finally from donor organisms. This assay avoids certain shortcomings of multiplex PCR-based methods already in widespread use for GMO detection. The assay demonstrated high specificity and sensitivity. The results suggest that this assay is reliable, flexible, and cost- and time-effective for high-throughput GMO detection.

  6. The missing indels: an estimate of indel variation in a human genome and analysis of factors that impede detection

    PubMed Central

    Jiang, Yue; Turinsky, Andrei L.; Brudno, Michael

    2015-01-01

    With the development of High-Throughput Sequencing (HTS) thousands of human genomes have now been sequenced. Whenever different studies analyze the same genome they usually agree on the amount of single-nucleotide polymorphisms, but differ dramatically on the number of insertion and deletion variants (indels). Furthermore, there is evidence that indels are often severely under-reported. In this manuscript we derive the total number of indel variants in a human genome by combining data from different sequencing technologies, while assessing the indel detection accuracy. Our estimate of approximately 1 million indels in a Yoruban genome is much higher than the results reported in several recent HTS studies. We identify two key sources of difficulties in indel detection: the insufficient coverage, read length or alignment quality; and the presence of repeats, including short interspersed elements and homopolymers/dimers. We quantify the effect of these factors on indel detection. The quality of sequencing data plays a major role in improving indel detection by HTS methods. However, many indels exist in long homopolymers and repeats, where their detection is severely impeded. The true number of indel events is likely even higher than our current estimates, and new techniques and technologies will be required to detect them. PMID:26130710

  7. enoLOGOS: a versatile web tool for energy normalized sequence logos

    PubMed Central

    Workman, Christopher T.; Yin, Yutong; Corcoran, David L.; Ideker, Trey; Stormo, Gary D.; Benos, Panayiotis V.

    2005-01-01

    enoLOGOS is a web-based tool that generates sequence logos from various input sources. Sequence logos have become a popular way to graphically represent DNA and amino acid sequence patterns from a set of aligned sequences. Each position of the alignment is represented by a column of stacked symbols with its total height reflecting the information content in this position. Currently, the available web servers are able to create logo images from a set of aligned sequences, but none of them generates weighted sequence logos directly from energy measurements or other sources. With the advent of high-throughput technologies for estimating the contact energy of different DNA sequences, tools that can create logos directly from binding affinity data are useful to researchers. enoLOGOS generates sequence logos from a variety of input data, including energy measurements, probability matrices, alignment matrices, count matrices and aligned sequences. Furthermore, enoLOGOS can represent the mutual information of different positions of the consensus sequence, a unique feature of this tool. Another web interface for our software, C2H2-enoLOGOS, generates logos for the DNA-binding preferences of the C2H2 zinc-finger transcription factor family members. enoLOGOS and C2H2-enoLOGOS are accessible over the web at . PMID:15980495

  8. False positives complicate ancient pathogen identifications using high-throughput shotgun sequencing

    PubMed Central

    2014-01-01

    Background Identification of historic pathogens is challenging since false positives and negatives are a serious risk. Environmental non-pathogenic contaminants are ubiquitous. Furthermore, public genetic databases contain limited information regarding these species. High-throughput sequencing may help reliably detect and identify historic pathogens. Results We shotgun-sequenced 8 16th-century Mixtec individuals from the site of Teposcolula Yucundaa (Oaxaca, Mexico) who are reported to have died from the huey cocoliztli (‘Great Pestilence’ in Nahautl), an unknown disease that decimated native Mexican populations during the Spanish colonial period, in order to identify the pathogen. Comparison of these sequences with those deriving from the surrounding soil and from 4 precontact individuals from the site found a wide variety of contaminant organisms that confounded analyses. Without the comparative sequence data from the precontact individuals and soil, false positives for Yersinia pestis and rickettsiosis could have been reported. Conclusions False positives and negatives remain problematic in ancient DNA analyses despite the application of high-throughput sequencing. Our results suggest that several studies claiming the discovery of ancient pathogens may need further verification. Additionally, true single molecule sequencing’s short read lengths, inability to sequence through DNA lesions, and limited ancient-DNA-specific technical development hinder its application to palaeopathology. PMID:24568097

  9. Role of APOE Isoforms in the Pathogenesis of TBI induced Alzheimer’s Disease

    DTIC Science & Technology

    2016-10-01

    deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel sequencing...demonstrate that the lack of Abca1 increases amyloid plaques and decreased APOE protein levels in AD-model mice. In this proposal we will test the hypothesis...injury, inflammatory reaction, transcriptome, high throughput massive parallel sequencing, mRNA-seq., behavioral testing, memory impairment, recovery 3

  10. SwellGel: an affinity chromatography technology for high-capacity and high-throughput purification of recombinant-tagged proteins.

    PubMed

    Draveling, C; Ren, L; Haney, P; Zeisse, D; Qoronfleh, M W

    2001-07-01

    The revolution in genomics and proteomics is having a profound impact on drug discovery. Today's protein scientist demands a faster, easier, more reliable way to purify proteins. A high capacity, high-throughput new technology has been developed in Perbio Sciences for affinity protein purification. This technology utilizes selected chromatography media that are dehydrated to form uniform aggregates. The SwellGel aggregates will instantly rehydrate upon addition of the protein sample, allowing purification and direct performance of multiple assays in a variety of formats. SwellGel technology has greater stability and is easier to handle than standard wet chromatography resins. The microplate format of this technology provides high-capacity, high-throughput features, recovering milligram quantities of protein suitable for high-throughput screening or biophysical/structural studies. Data will be presented applying SwellGel technology to recombinant 6x His-tagged protein and glutathione-S-transferase (GST) fusion protein purification. Copyright 2001 Academic Press.

  11. College of American Pathologists' laboratory standards for next-generation sequencing clinical tests.

    PubMed

    Aziz, Nazneen; Zhao, Qin; Bry, Lynn; Driscoll, Denise K; Funke, Birgit; Gibson, Jane S; Grody, Wayne W; Hegde, Madhuri R; Hoeltge, Gerald A; Leonard, Debra G B; Merker, Jason D; Nagarajan, Rakesh; Palicki, Linda A; Robetorye, Ryan S; Schrijver, Iris; Weck, Karen E; Voelkerding, Karl V

    2015-04-01

    The higher throughput and lower per-base cost of next-generation sequencing (NGS) as compared to Sanger sequencing has led to its rapid adoption in clinical testing. The number of laboratories offering NGS-based tests has also grown considerably in the past few years, despite the fact that specific Clinical Laboratory Improvement Amendments of 1988/College of American Pathologists (CAP) laboratory standards had not yet been developed to regulate this technology. To develop a checklist for clinical testing using NGS technology that sets standards for the analytic wet bench process and for bioinformatics or "dry bench" analyses. As NGS-based clinical tests are new to diagnostic testing and are of much greater complexity than traditional Sanger sequencing-based tests, there is an urgent need to develop new regulatory standards for laboratories offering these tests. To develop the necessary regulatory framework for NGS and to facilitate appropriate adoption of this technology for clinical testing, CAP formed a committee in 2011, the NGS Work Group, to deliberate upon the contents to be included in the checklist. Results . -A total of 18 laboratory accreditation checklist requirements for the analytic wet bench process and bioinformatics analysis processes have been included within CAP's molecular pathology checklist (MOL). This report describes the important issues considered by the CAP committee during the development of the new checklist requirements, which address documentation, validation, quality assurance, confirmatory testing, exception logs, monitoring of upgrades, variant interpretation and reporting, incidental findings, data storage, version traceability, and data transfer confidentiality.

  12. Comparison of next generation sequencing technologies for transcriptome characterization

    PubMed Central

    2009-01-01

    Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272

  13. Combining high-throughput sequencing with fruit body surveys reveals contrasting life-history strategies in fungi

    PubMed Central

    Ovaskainen, Otso; Schigel, Dmitry; Ali-Kovero, Heini; Auvinen, Petri; Paulin, Lars; Nordén, Björn; Nordén, Jenni

    2013-01-01

    Before the recent revolution in molecular biology, field studies on fungal communities were mostly confined to fruit bodies, whereas mycelial interactions were studied in the laboratory. Here we combine high-throughput sequencing with a fruit body inventory to study simultaneously mycelial and fruit body occurrences in a community of fungi inhabiting dead wood of Norway spruce. We studied mycelial occurrence by extracting DNA from wood samples followed by 454-sequencing of the ITS1 and ITS2 regions and an automated procedure for species identification. In total, we detected 198 species as mycelia and 137 species as fruit bodies. The correlation between mycelial and fruit body occurrences was high for the majority of the species, suggesting that high-throughput sequencing can successfully characterize the dominating fungal communities, despite possible biases related to sampling, PCR, sequencing and molecular identification. We used the fruit body and molecular data to test hypothesized links between life history and population dynamic parameters. We show that the species that have on average a high mycelial abundance also have a high fruiting rate and produce large fruit bodies, leading to a positive feedback loop in their population dynamics. Earlier studies have shown that species with specialized resource requirements are rarely seen fruiting, for which reason they are often classified as red-listed. We show with the help of high-throughput sequencing that some of these species are more abundant as mycelium in wood than what could be expected from their occurrence as fruit bodies. PMID:23575372

  14. A novel ultra high-throughput 16S rRNA gene amplicon sequencing library preparation method for the Illumina HiSeq platform.

    PubMed

    de Muinck, Eric J; Trosvik, Pål; Gilfillan, Gregor D; Hov, Johannes R; Sundaram, Arvind Y M

    2017-07-06

    Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized. We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower c ost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorpo rating three barcodes to each sample, with the possibility to add a fourth-index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms. The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date. Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.

  15. Detection of active transposable elements in Arabidopsis thaliana using Oxford Nanopore Sequencing technology.

    PubMed

    Debladis, Emilie; Llauro, Christel; Carpentier, Marie-Christine; Mirouze, Marie; Panaud, Olivier

    2017-07-17

    Transposables elements (TEs) contribute to both structural and functional dynamics of most eukaryotic genomes. Because of their propensity to densely populate plant and animal genomes, the precise estimation of the impact of transposition on genomic diversity has been considered as one of the main challenges of today's genomics. The recent development of NGS (next generation sequencing) technologies has open new perspectives in population genomics by providing new methods for high throughput detection of Transposable Elements-associated Structural Variants (TEASV). However, these have relied on Illumina platform that generates short reads (up to 350 nucleotides). This limitation in size of sequence reads can cause high false discovery rate (FDR) and therefore limit the power of detection of TEASVs, especially in the case of large, complex genomes. The newest sequencing technologies, such as Oxford Nanopore Technologies (ONT) can generate kilobases-long reads thus representing a promising tool for TEASV detection in plant and animals. We present the results of a pilot experiment for TEASV detection on the model plant species Arabidopsis thaliana using ONT sequencing and show that it can be used efficiently to detect TE movements. We generated a ~0.8X genome coverage of a met1-derived epigenetic recombinant inbred line (epiRIL) using a MinIon device with R7 chemistry. We were able to detect nine new copies of the LTR-retrotransposon Evadé (EVD). We also evidenced the activity of the DNA transposon CACTA, CAC1. Even at a low sequence coverage (0.8X), ONT sequencing allowed us to reliably detect several TE insertions in Arabidopsis thaliana genome. The long read length allowed a precise and un-ambiguous mapping of the structural variations caused by the activity of TEs. This suggests that the trade-off between read length and genome coverage for TEASV detection may be in favor of the former. Should the technology be further improved both in terms of lower error rate and operation costs, it could be efficiently used in diversity studies at population level.

  16. MotifMark: Finding regulatory motifs in DNA sequences.

    PubMed

    Hassanzadeh, Hamid Reza; Kolhe, Pushkar; Isbell, Charles L; Wang, May D

    2017-07-01

    The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding sites and the specificity of target proteins in binding to these regions are two important steps in understanding the mechanisms of these biological activities. A number of high-throughput technologies have recently emerged that try to quantify the affinity between proteins and DNA motifs. Despite their success, these technologies have their own limitations and fall short in precise characterization of motifs, and as a result, require further downstream analysis to extract useful and interpretable information from a haystack of noisy and inaccurate data. Here we propose MotifMark, a new algorithm based on graph theory and machine learning, that can find binding sites on candidate probes and rank their specificity in regard to the underlying transcription factor. We developed a pipeline to analyze experimental data derived from compact universal protein binding microarrays and benchmarked it against two of the most accurate motif search methods. Our results indicate that MotifMark can be a viable alternative technique for prediction of motif from protein binding microarrays and possibly other related high-throughput techniques.

  17. Single-Cell Sequencing for Drug Discovery and Drug Development.

    PubMed

    Wu, Hongjin; Wang, Charles; Wu, Shixiu

    2017-01-01

    Next-generation sequencing (NGS), particularly single-cell sequencing, has revolutionized the scale and scope of genomic and biomedical research. Recent technological advances in NGS and singlecell studies have made the deep whole-genome (DNA-seq), whole epigenome and whole-transcriptome sequencing (RNA-seq) at single-cell level feasible. NGS at the single-cell level expands our view of genome, epigenome and transcriptome and allows the genome, epigenome and transcriptome of any organism to be explored without a priori assumptions and with unprecedented throughput. And it does so with single-nucleotide resolution. NGS is also a very powerful tool for drug discovery and drug development. In this review, we describe the current state of single-cell sequencing techniques, which can provide a new, more powerful and precise approach for analyzing effects of drugs on treated cells and tissues. Our review discusses single-cell whole genome/exome sequencing (scWGS/scWES), single-cell transcriptome sequencing (scRNA-seq), single-cell bisulfite sequencing (scBS), and multiple omics of single-cell sequencing. We also highlight the advantages and challenges of each of these approaches. Finally, we describe, elaborate and speculate the potential applications of single-cell sequencing for drug discovery and drug development. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.

  18. Using Poisson mixed-effects model to quantify transcript-level gene expression in RNA-Seq.

    PubMed

    Hu, Ming; Zhu, Yu; Taylor, Jeremy M G; Liu, Jun S; Qin, Zhaohui S

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra high-throughput next-generation sequencing technologies. Using deep sequencing, gene expression levels of all transcripts including novel ones can be quantified digitally. Although extremely promising, the massive amounts of data generated by RNA-Seq, substantial biases and uncertainty in short read alignment pose challenges for data analysis. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expressions, ineffective. In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level. POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html. yuzhu@purdue.edu; zhaohui.qin@emory.edu Supplementary data are available at Bioinformatics online.

  19. The Einstein Genome Gateway using WASP - a high throughput multi-layered life sciences portal for XSEDE.

    PubMed

    Golden, Aaron; McLellan, Andrew S; Dubin, Robert A; Jing, Qiang; O Broin, Pilib; Moskowitz, David; Zhang, Zhengdong; Suzuki, Masako; Hargitai, Joseph; Calder, R Brent; Greally, John M

    2012-01-01

    Massively-parallel sequencing (MPS) technologies and their diverse applications in genomics and epigenomics research have yielded enormous new insights into the physiology and pathophysiology of the human genome. The biggest hurdle remains the magnitude and diversity of the datasets generated, compromising our ability to manage, organize, process and ultimately analyse data. The Wiki-based Automated Sequence Processor (WASP), developed at the Albert Einstein College of Medicine (hereafter Einstein), uniquely manages to tightly couple the sequencing platform, the sequencing assay, sample metadata and the automated workflows deployed on a heterogeneous high performance computing cluster infrastructure that yield sequenced, quality-controlled and 'mapped' sequence data, all within the one operating environment accessible by a web-based GUI interface. WASP at Einstein processes 4-6 TB of data per week and since its production cycle commenced it has processed ~ 1 PB of data overall and has revolutionized user interactivity with these new genomic technologies, who remain blissfully unaware of the data storage, management and most importantly processing services they request. The abstraction of such computational complexity for the user in effect makes WASP an ideal middleware solution, and an appropriate basis for the development of a grid-enabled resource - the Einstein Genome Gateway - as part of the Extreme Science and Engineering Discovery Environment (XSEDE) program. In this paper we discuss the existing WASP system, its proposed middleware role, and its planned interaction with XSEDE to form the Einstein Genome Gateway.

  20. An improved genome assembly uncovers prolific tandem repeats in Atlantic cod.

    PubMed

    Tørresen, Ole K; Star, Bastiaan; Jentoft, Sissel; Reinar, William B; Grove, Harald; Miller, Jason R; Walenz, Brian P; Knight, James; Ekholm, Jenny M; Peluso, Paul; Edvardsen, Rolf B; Tooming-Klunderud, Ave; Skage, Morten; Lien, Sigbjørn; Jakobsen, Kjetill S; Nederbragt, Alexander J

    2017-01-18

    The first Atlantic cod (Gadus morhua) genome assembly published in 2011 was one of the early genome assemblies exclusively based on high-throughput 454 pyrosequencing. Since then, rapid advances in sequencing technologies have led to a multitude of assemblies generated for complex genomes, although many of these are of a fragmented nature with a significant fraction of bases in gaps. The development of long-read sequencing and improved software now enable the generation of more contiguous genome assemblies. By combining data from Illumina, 454 and the longer PacBio sequencing technologies, as well as integrating the results of multiple assembly programs, we have created a substantially improved version of the Atlantic cod genome assembly. The sequence contiguity of this assembly is increased fifty-fold and the proportion of gap-bases has been reduced fifteen-fold. Compared to other vertebrates, the assembly contains an unusual high density of tandem repeats (TRs). Indeed, retrospective analyses reveal that gaps in the first genome assembly were largely associated with these TRs. We show that 21% of the TRs across the assembly, 19% in the promoter regions and 12% in the coding sequences are heterozygous in the sequenced individual. The inclusion of PacBio reads combined with the use of multiple assembly programs drastically improved the Atlantic cod genome assembly by successfully resolving long TRs. The high frequency of heterozygous TRs within or in the vicinity of genes in the genome indicate a considerable standing genomic variation in Atlantic cod populations, which is likely of evolutionary importance.

  1. RNA-Seq Alignment to Individualized Genomes Improves Transcript Abundance Estimates in Multiparent Populations

    PubMed Central

    Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.

    2014-01-01

    Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449

  2. Genetic high throughput screening in Retinitis Pigmentosa based on high resolution melting (HRM) analysis.

    PubMed

    Anasagasti, Ander; Barandika, Olatz; Irigoyen, Cristina; Benitez, Bruno A; Cooper, Breanna; Cruchaga, Carlos; López de Munain, Adolfo; Ruiz-Ederra, Javier

    2013-11-01

    Retinitis Pigmentosa (RP) involves a group of genetically determined retinal diseases caused by a large number of mutations that result in rod photoreceptor cell death followed by gradual death of cone cells. Most cases of RP are monogenic, with more than 80 associated genes identified so far. The high number of genes and variants involved in RP, among other factors, is making the molecular characterization of RP a real challenge for many patients. Although HRM has been used for the analysis of isolated variants or single RP genes, as far as we are concerned, this is the first study that uses HRM analysis for a high-throughput screening of several RP genes. Our main goal was to test the suitability of HRM analysis as a genetic screening technique in RP, and to compare its performance with two of the most widely used NGS platforms, Illumina and PGM-Ion Torrent technologies. RP patients (n = 96) were clinically diagnosed at the Ophthalmology Department of Donostia University Hospital, Spain. We analyzed a total of 16 RP genes that meet the following inclusion criteria: 1) size: genes with transcripts of less than 4 kb; 2) number of exons: genes with up to 22 exons; and 3) prevalence: genes reported to account for, at least, 0.4% of total RP cases worldwide. For comparison purposes, RHO gene was also sequenced with Illumina (GAII; Illumina), Ion semiconductor technologies (PGM; Life Technologies) and Sanger sequencing (ABI 3130xl platform; Applied Biosystems). Detected variants were confirmed in all cases by Sanger sequencing and tested for co-segregation in the family of affected probands. We identified a total of 65 genetic variants, 15 of which (23%) were novel, in 49 out of 96 patients. Among them, 14 (4 novel) are probable disease-causing genetic variants in 7 RP genes, affecting 15 patients. Our HRM analysis-based study, proved to be a cost-effective and rapid method that provides an accurate identification of genetic RP variants. This approach is effective for medium sized (<4 kb transcript) RP genes, which constitute over 80% of the total of known RP genes.

  3. High-Throughput Sequencing for Detection of Subpopulations of Bacteria Not Previously Associated with Artisanal Cheeses

    PubMed Central

    Quigley, Lisa; O'Sullivan, Orla; Beresford, Tom P.; Ross, R. Paul; Fitzgerald, Gerald F.

    2012-01-01

    Here, high-throughput sequencing was employed to reveal the highly diverse bacterial populations present in 62 Irish artisanal cheeses and, in some cases, associated cheese rinds. Using this approach, we revealed the presence of several genera not previously associated with cheese, including Faecalibacterium, Prevotella, and Helcococcus and, for the first time, detected the presence of Arthrobacter and Brachybacterium in goats' milk cheese. Our analysis confirmed many previously observed patterns, such as the dominance of typical cheese bacteria, the fact that the microbiota of raw and pasteurized milk cheeses differ, and that the level of cheese maturation has a significant influence on Lactobacillus populations. It was also noted that cheeses containing adjunct ingredients had lower proportions of Lactococcus species. It is thus apparent that high-throughput sequencing-based investigations can provide valuable insights into the microbial populations of artisanal foods. PMID:22685131

  4. High-throughput sequencing for detection of subpopulations of bacteria not previously associated with artisanal cheeses.

    PubMed

    Quigley, Lisa; O'Sullivan, Orla; Beresford, Tom P; Ross, R Paul; Fitzgerald, Gerald F; Cotter, Paul D

    2012-08-01

    Here, high-throughput sequencing was employed to reveal the highly diverse bacterial populations present in 62 Irish artisanal cheeses and, in some cases, associated cheese rinds. Using this approach, we revealed the presence of several genera not previously associated with cheese, including Faecalibacterium, Prevotella, and Helcococcus and, for the first time, detected the presence of Arthrobacter and Brachybacterium in goats' milk cheese. Our analysis confirmed many previously observed patterns, such as the dominance of typical cheese bacteria, the fact that the microbiota of raw and pasteurized milk cheeses differ, and that the level of cheese maturation has a significant influence on Lactobacillus populations. It was also noted that cheeses containing adjunct ingredients had lower proportions of Lactococcus species. It is thus apparent that high-throughput sequencing-based investigations can provide valuable insights into the microbial populations of artisanal foods.

  5. Impacts of Seasonal Housing and Teat Preparation on Raw Milk Microbiota: a High-Throughput Sequencing Study.

    PubMed

    Doyle, Conor J; Gleeson, David; O'Toole, Paul W; Cotter, Paul D

    2017-01-15

    In pasture-based systems, changes in dairy herd habitat due to seasonality results in the exposure of animals to different environmental niches. These niches contain distinct microbial communities that may be transferred to raw milk, with potentially important food quality and safety implications for milk producers. It is postulated that the extent to which these microorganisms are transferred could be limited by the inclusion of a teat preparation step prior to milking. High-throughput sequencing on a variety of microbial niches on farms was used to study the patterns of microbial movement through the dairy production chain and, in the process, to investigate the impact of seasonal housing and the inclusion/exclusion of a teat preparation regime on the raw milk microbiota from the same herd over two sampling periods, i.e., indoor and outdoor. Beta diversity and network analyses showed that environmental and milk microbiotas separated depending on whether they were sourced from an indoor or outdoor environment. Within these respective habitats, similarities between the milk microbiota and that of teat swab samples and, to a lesser extent, fecal samples were apparent. Indeed, SourceTracker identified the teat surface as the most significant source of contamination, with herd feces being the next most prevalent source of contamination. In milk from cows grazing outdoors, teat prep significantly increased the numbers of total bacteria present. In summary, sequence-based microbiota analysis identified possible sources of raw milk contamination and highlighted the influence of environment and farm management practices on the raw milk microbiota. The composition of the raw milk microbiota is an important consideration from both a spoilage perspective and a food safety perspective and has implications for milk targeted for direct consumption and for downstream processing. Factors that influence contamination have been examined previously, primarily through the use of culture-based techniques. We describe here the extensive application of high-throughput DNA sequencing technologies to study the relationship between the milk production environment and the raw milk microbiota. The results show that the environment in which the herd was kept was the primary driver of the composition of the milk microbiota composition. Copyright © 2016 American Society for Microbiology.

  6. Impacts of Seasonal Housing and Teat Preparation on Raw Milk Microbiota: a High-Throughput Sequencing Study

    PubMed Central

    Doyle, Conor J.; Gleeson, David; O'Toole, Paul W.

    2016-01-01

    ABSTRACT In pasture-based systems, changes in dairy herd habitat due to seasonality results in the exposure of animals to different environmental niches. These niches contain distinct microbial communities that may be transferred to raw milk, with potentially important food quality and safety implications for milk producers. It is postulated that the extent to which these microorganisms are transferred could be limited by the inclusion of a teat preparation step prior to milking. High-throughput sequencing on a variety of microbial niches on farms was used to study the patterns of microbial movement through the dairy production chain and, in the process, to investigate the impact of seasonal housing and the inclusion/exclusion of a teat preparation regime on the raw milk microbiota from the same herd over two sampling periods, i.e., indoor and outdoor. Beta diversity and network analyses showed that environmental and milk microbiotas separated depending on whether they were sourced from an indoor or outdoor environment. Within these respective habitats, similarities between the milk microbiota and that of teat swab samples and, to a lesser extent, fecal samples were apparent. Indeed, SourceTracker identified the teat surface as the most significant source of contamination, with herd feces being the next most prevalent source of contamination. In milk from cows grazing outdoors, teat prep significantly increased the numbers of total bacteria present. In summary, sequence-based microbiota analysis identified possible sources of raw milk contamination and highlighted the influence of environment and farm management practices on the raw milk microbiota. IMPORTANCE The composition of the raw milk microbiota is an important consideration from both a spoilage perspective and a food safety perspective and has implications for milk targeted for direct consumption and for downstream processing. Factors that influence contamination have been examined previously, primarily through the use of culture-based techniques. We describe here the extensive application of high-throughput DNA sequencing technologies to study the relationship between the milk production environment and the raw milk microbiota. The results show that the environment in which the herd was kept was the primary driver of the composition of the milk microbiota composition. PMID:27815277

  7. Microbiome dynamic modulation through functional diets based on pre- and probiotics (mannan-oligosaccharides and Saccharomyces cerevisiae) in juvenile rainbow trout (Oncorhynchus mykiss).

    PubMed

    Gonçalves, A T; Gallardo-Escárate, C

    2017-05-01

    This study used high-throughput sequencing to evaluate the intestinal microbiome dynamics in rainbow trout (Oncorhynchus mykiss) fed commercial diets supplemented with either pre- or probiotics (0·6% mannan-oligosaccharides and 0·5% Saccharomyces cerevisiae respectively) or the mixture of both. A total of 57 fish whole intestinal mucosa and contents bacterial communities were characterized by high-throughput sequencing and analysis of the V3-V4 region of the 16S rRNA gene, as well as the relationship between plasma biochemical health indicators and microbiome diversity. This was performed at 7, 14 and 30 days after start feeding functional diets, and microbiome diversity increased when fish fed functional diets after 7 days and it was positively correlated with plasma cholesterol levels. Dominant phyla were, in descending order, Proteobacteria, Firmicutes, Actinobacteria, Acidobacteria, Bacteroidetes and Fusobacteria. However, functional diets reduced the abundance of Gammaproteobacteria to favour abundances of organisms from Firmicutes and Fusobacteria, two phyla with members that confer beneficial effects. A dynamic shift of the microbiome composition was observed with changes after 7 days of feeding and the modulation by functional diets tend to cluster the corresponding groups apart from CTRL group. The core microbiome showed an overall stability with functional diets, except genus such as Escherichia-Shigella that suffered severe reductions on their abundances when feeding any of the functional diets. Functional diets based on pre- or probiotics dynamically modulate intestinal microbiota of juvenile trout engaging taxonomical abundance shifts that might impact fish physiological performance. This study shows for the first time the microbiome modulation dynamics by functional diets based on mannan-oligosaccharides and S. cerevisiae and their synergy using culture independent high-throughput sequencing technology, revealing the complexity behind the dietary modulation with functional feeds in aquatic organisms. © 2017 The Society for Applied Microbiology.

  8. Illumina GA IIx& HiSeq 2000 Production Sequenccing and QC Analysis Pipelines at the DOE Joint Genome Institute

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Daum, Christopher; Zane, Matthew; Han, James

    2011-01-31

    The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sesequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increases ample throughput, and improving the overall quality of the sequence generated. A sequence QC analysismore » pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.« less

  9. Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.

    PubMed

    Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L

    2017-11-13

    A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides) as well as automated data analysis protocols to filter away incorrect assignments, noise, and synthetic side-products. For increasing the confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining greater than 85% sequence identification rates in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.

  10. HTP-OligoDesigner: An Online Primer Design Tool for High-Throughput Gene Cloning and Site-Directed Mutagenesis.

    PubMed

    Camilo, Cesar M; Lima, Gustavo M A; Maluf, Fernando V; Guido, Rafael V C; Polikarpov, Igor

    2016-01-01

    Following burgeoning genomic and transcriptomic sequencing data, biochemical and molecular biology groups worldwide are implementing high-throughput cloning and mutagenesis facilities in order to obtain a large number of soluble proteins for structural and functional characterization. Since manual primer design can be a time-consuming and error-generating step, particularly when working with hundreds of targets, the automation of primer design process becomes highly desirable. HTP-OligoDesigner was created to provide the scientific community with a simple and intuitive online primer design tool for both laboratory-scale and high-throughput projects of sequence-independent gene cloning and site-directed mutagenesis and a Tm calculator for quick queries.

  11. Comparison of the gut microbiota composition between wild and captive sika deer (Cervus nippon hortulorum) from feces by high-throughput sequencing.

    PubMed

    Guan, Yu; Yang, Haitao; Han, Siyu; Feng, Limin; Wang, Tianming; Ge, Jianping

    2017-11-23

    The gut microbiota is characterized as a complex ecosystem that has effects on health and diseases of host with the interactions of many other factors together. Sika deer is the national level for the protection of wild animals in China. The available sequencing data of gut microbiota from feces of wild sika deer, especially for Cervus nippon hortulorum in Northeast China, are limited. Here, we characterized the gastrointestinal bacterial communities of wild (7 samples) and captive (12 samples) sika deer from feces, and compared their gut microbiota by analyzing the V3-V4 region of 16S rRNA gene using high-throughput sequencing technology on the Illumina Hiseq platform. Firmicutes (77.624%), Bacteroidetes (18.288%) and Tenericutes (1.342%) were the most predominant phyla in wild sika deer. While in captive sika deer, Firmicutes (50.710%) was the dominant phylum, followed by Bacteroidetes (31.996%) and Proteobacteria (4.806%). A total of 9 major phyla, 22 families and 30 genera among gastrointestinal bacterial communities showed significant differences between wild and captive sika deer. The specific function and mechanism of Tenericutes in wild sika deer need further study. Our results indicated that captive sika deer in farm had higher fecal bacterial diversity than the wild. Abundance and quantity of diet source for sika deer played crucial role in shaping the composition and structure of gut microbiota.

  12. High-Throughput Sequencing of RNA Silencing-Associated Small RNAs in Olive (Olea europaea L.)

    PubMed Central

    Donaire, Livia; Pedrola, Laia; de la Rosa, Raúl; Llave, César

    2011-01-01

    Small RNAs (sRNAs) of 20 to 25 nucleotides (nt) in length maintain genome integrity and control gene expression in a multitude of developmental and physiological processes. Despite RNA silencing has been primarily studied in model plants, the advent of high-throughput sequencing technologies has enabled profiling of the sRNA component of more than 40 plant species. Here, we used deep sequencing and molecular methods to report the first inventory of sRNAs in olive (Olea europaea L.). sRNA libraries prepared from juvenile and adult shoots revealed that the 24-nt class dominates the sRNA transcriptome and atypically accumulates to levels never seen in other plant species, suggesting an active role of heterochromatin silencing in the maintenance and integrity of its large genome. A total of 18 known miRNA families were identified in the libraries. Also, 5 other sRNAs derived from potential hairpin-like precursors remain as plausible miRNA candidates. RNA blots confirmed miRNA expression and suggested tissue- and/or developmental-specific expression patterns. Target mRNAs of conserved miRNAs were computationally predicted among the olive cDNA collection and experimentally validated through endonucleolytic cleavage assays. Finally, we use expression data to uncover genetic components of the miR156, miR172 and miR390/TAS3-derived trans-acting small interfering RNA (tasiRNA) regulatory nodes, suggesting that these interactive networks controlling developmental transitions are fully operational in olive. PMID:22140484

  13. Resequencing of the common marmoset genome improves genome assemblies and gene-coding sequence analysis.

    PubMed

    Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi

    2015-11-20

    The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.

  14. Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

    PubMed

    Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

    2010-07-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.

  15. Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics1[W][OA

    PubMed Central

    Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

    2010-01-01

    Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087

  16. [Genetic analysis of two children patients affected with CHARGE syndrome].

    PubMed

    Li, Guoqiang; Li, Niu; Xu, Yufei; Li, Juan; Ding, Yu; Shen, Yiping; Wang, Xiumin; Wang, Jian

    2018-04-10

    To analyze two Chinese pediatric patients with multiple malformations and growth and development delay. Both patients were subjected to targeted gene sequencing, and the results were analyzed with Ingenuity Variant Analysis software. Suspected pathogenic variations were verified by Sanger sequencing. High-throughput sequencing showed that both patients have carried heterozygous variants of the CHD7 gene. Patient 1 carried a nonsense mutation in exon 36 (c.7957C>T, p.Arg2653*), while patient 2 carried a nonsense mutation of exon 2 (c.718C>T, p.Gln240*). Sanger sequencing confirmed the above mutations in both patients, while their parents were of wild-type for the corresponding sites, indicating that the two mutations have happened de novo. Two patients were diagnosed with CHARGE syndrome by high-throughput sequencing.

  17. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment1

    PubMed Central

    Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.

    2016-01-01

    Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175

  18. Searching for resistance genes to Bursaphelenchus xylophilus using high throughput screening.

    PubMed

    Santos, Carla S; Pinheiro, Miguel; Silva, Ana I; Egas, Conceição; Vasconcelos, Marta W

    2012-11-07

    Pine wilt disease (PWD), caused by the pinewood nematode (PWN; Bursaphelenchus xylophilus), damages and kills pine trees and is causing serious economic damage worldwide. Although the ecological mechanism of infestation is well described, the plant's molecular response to the pathogen is not well known. This is due mainly to the lack of genomic information and the complexity of the disease. High throughput sequencing is now an efficient approach for detecting the expression of genes in non-model organisms, thus providing valuable information in spite of the lack of the genome sequence. In an attempt to unravel genes potentially involved in the pine defense against the pathogen, we hereby report the high throughput comparative sequence analysis of infested and non-infested stems of Pinus pinaster (very susceptible to PWN) and Pinus pinea (less susceptible to PWN). Four cDNA libraries from infested and non-infested stems of P. pinaster and P. pinea were sequenced in a full 454 GS FLX run, producing a total of 2,083,698 reads. The putative amino acid sequences encoded by the assembled transcripts were annotated according to Gene Ontology, to assign Pinus contigs into Biological Processes, Cellular Components and Molecular Functions categories. Most of the annotated transcripts corresponded to Picea genes-25.4-39.7%, whereas a smaller percentage, matched Pinus genes, 1.8-12.8%, probably a consequence of more public genomic information available for Picea than for Pinus. The comparative transcriptome analysis showed that when P. pinaster was infested with PWN, the genes malate dehydrogenase, ABA, water deficit stress related genes and PAR1 were highly expressed, while in PWN-infested P. pinea, the highly expressed genes were ricin B-related lectin, and genes belonging to the SNARE and high mobility group families. Quantitative PCR experiments confirmed the differential gene expression between the two pine species. Defense-related genes triggered by nematode infestation were detected in both P. pinaster and P. pinea transcriptomes utilizing 454 pyrosequencing technology. P. pinaster showed higher abundance of genes related to transcriptional regulation, terpenoid secondary metabolism (including some with nematicidal activity) and pathogen attack. P. pinea showed higher abundance of genes related to oxidative stress and higher levels of expression in general of stress responsive genes. This study provides essential information about the molecular defense mechanisms utilized by P. pinaster and P. pinea against PWN infestation and contributes to a better understanding of PWD.

  19. Ethical considerations of research policy for personal genome analysis: the approach of the Genome Science Project in Japan.

    PubMed

    Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto

    2014-12-01

    As evidenced by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than heretofore. These developments present a challenge to the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on the experience with a new large-scale project-the Genome Science Project-this article presents a novel approach to conducting a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: making the draft of the template following an analysis of national and international policies; refining the draft template in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.

  20. YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs

    PubMed Central

    Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore

    2017-01-01

    Abstract Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659

  1. ChIP-chip versus ChIP-seq: Lessons for experimental design and data analysis

    PubMed Central

    2011-01-01

    Background Chromatin immunoprecipitation (ChIP) followed by microarray hybridization (ChIP-chip) or high-throughput sequencing (ChIP-seq) allows genome-wide discovery of protein-DNA interactions such as transcription factor bindings and histone modifications. Previous reports only compared a small number of profiles, and little has been done to compare histone modification profiles generated by the two technologies or to assess the impact of input DNA libraries in ChIP-seq analysis. Here, we performed a systematic analysis of a modENCODE dataset consisting of 31 pairs of ChIP-chip/ChIP-seq profiles of the coactivator CBP, RNA polymerase II (RNA PolII), and six histone modifications across four developmental stages of Drosophila melanogaster. Results Both technologies produce highly reproducible profiles within each platform, ChIP-seq generally produces profiles with a better signal-to-noise ratio, and allows detection of more peaks and narrower peaks. The set of peaks identified by the two technologies can be significantly different, but the extent to which they differ varies depending on the factor and the analysis algorithm. Importantly, we found that there is a significant variation among multiple sequencing profiles of input DNA libraries and that this variation most likely arises from both differences in experimental condition and sequencing depth. We further show that using an inappropriate input DNA profile can impact the average signal profiles around genomic features and peak calling results, highlighting the importance of having high quality input DNA data for normalization in ChIP-seq analysis. Conclusions Our findings highlight the biases present in each of the platforms, show the variability that can arise from both technology and analysis methods, and emphasize the importance of obtaining high quality and deeply sequenced input DNA libraries for ChIP-seq analysis. PMID:21356108

  2. Rapid detection of the CYP2A6*12 hybrid allele by Pyrosequencing technology.

    PubMed

    Koontz, Deborah A; Huckins, Jacqueline J; Spencer, Antonina; Gallagher, Margaret L

    2009-08-24

    Identification of CYP2A6 alleles associated with reduced enzyme activity is important in the study of inter-individual differences in drug metabolism. CYP2A6*12 is a hybrid allele that results from unequal crossover between CYP2A6 and CYP2A7 genes. The 5' regulatory region and exons 1-2 are derived from CYP2A7, and exons 3-9 are derived from CYP2A6. Conventional methods for detection of CYP2A6*12 consist of two-step PCR protocols that are laborious and unsuitable for high-throughput genotyping. We developed a rapid and accurate method to detect the CYP2A6*12 allele by Pyrosequencing technology. A single set of PCR primers was designed to specifically amplify both the CYP2A6*1 wild-type allele and the CYP2A6*12 hybrid allele. An internal Pyrosequencing primer was used to generate allele-specific sequence information, which detected homozygous wild-type, heterozygous hybrid, and homozygous hybrid alleles. We first validated the assay on 104 DNA samples that were also genotyped by conventional two-step PCR and by cycle sequencing. CYP2A6*12 allele frequencies were then determined using the Pyrosequencing assay on 181 multi-ethnic DNA samples from subjects of African American, European Caucasian, Pacific Rim, and Hispanic descent. Finally, we streamlined the Pyrosequencing assay by integrating liquid handling robotics into the workflow. Pyrosequencing results demonstrated 100% concordance with conventional two-step PCR and cycle sequencing methods. Allele frequency data showed slightly higher prevalence of the CYP2A6*12 allele in European Caucasians and Hispanics. This Pyrosequencing assay proved to be a simple, rapid, and accurate alternative to conventional methods, which can be easily adapted to the needs of higher-throughput studies.

  3. High Throughput Computing Impact on Meta Genomics (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

    ScienceCinema

    Gore, Brooklin

    2018-02-01

    This presentation includes a brief background on High Throughput Computing, correlating gene transcription factors, optical mapping, genotype to phenotype mapping via QTL analysis, and current work on next gen sequencing.

  4. Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing.

    PubMed

    Giraud, Mathieu; Salson, Mikaël; Duez, Marc; Villenet, Céline; Quief, Sabine; Caillault, Aurélie; Grardel, Nathalie; Roumier, Christophe; Preudhomme, Claude; Figeac, Martin

    2014-05-28

    V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also on data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also to the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.

  5. Use of the melting curve assay as a means for high-throughput quantification of Illumina sequencing libraries.

    PubMed

    Shinozuka, Hiroshi; Forster, John W

    2016-01-01

    Background. Multiplexed sequencing is commonly performed on massively parallel short-read sequencing platforms such as Illumina, and the efficiency of library normalisation can affect the quality of the output dataset. Although several library normalisation approaches have been established, none are ideal for highly multiplexed sequencing due to issues of cost and/or processing time. Methods. An inexpensive and high-throughput library quantification method has been developed, based on an adaptation of the melting curve assay. Sequencing libraries were subjected to the assay using the Bio-Rad Laboratories CFX Connect(TM) Real-Time PCR Detection System. The library quantity was calculated through summation of reduction of relative fluorescence units between 86 and 95 °C. Results.PCR-enriched sequencing libraries are suitable for this quantification without pre-purification of DNA. Short DNA molecules, which ideally should be eliminated from the library for subsequent processing, were differentiated from the target DNA in a mixture on the basis of differences in melting temperature. Quantification results for long sequences targeted using the melting curve assay were correlated with those from existing methods (R (2) > 0.77), and that observed from MiSeq sequencing (R (2) = 0.82). Discussion.The results of multiplexed sequencing suggested that the normalisation performance of the described method is equivalent to that of another recently reported high-throughput bead-based method, BeNUS. However, costs for the melting curve assay are considerably lower and processing times shorter than those of other existing methods, suggesting greater suitability for highly multiplexed sequencing applications.

  6. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing.

    PubMed

    Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-02-14

    The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5'tag-analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate<0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.

  7. Impact of sequencing depth on the characterization of the microbiome and resistome.

    PubMed

    Zaheer, Rahat; Noyes, Noelle; Ortega Polo, Rodrigo; Cook, Shaun R; Marinier, Eric; Van Domselaar, Gary; Belk, Keith E; Morley, Paul S; McAllister, Tim A

    2018-04-12

    Developments in high-throughput next generation sequencing (NGS) technology have rapidly advanced the understanding of overall microbial ecology as well as occurrence and diversity of specific genes within diverse environments. In the present study, we compared the ability of varying sequencing depths to generate meaningful information about the taxonomic structure and prevalence of antimicrobial resistance genes (ARGs) in the bovine fecal microbial community. Metagenomic sequencing was conducted on eight composite fecal samples originating from four beef cattle feedlots. Metagenomic DNA was sequenced to various depths, D1, D0.5 and D0.25, with average sample read counts of 117, 59 and 26 million, respectively. A comparative analysis of the relative abundance of reads aligning to different phyla and antimicrobial classes indicated that the relative proportions of read assignments remained fairly constant regardless of depth. However, the number of reads being assigned to ARGs as well as to microbial taxa increased significantly with increasing depth. We found a depth of D0.5 was suitable to describe the microbiome and resistome of cattle fecal samples. This study helps define a balance between cost and required sequencing depth to acquire meaningful results.

  8. Genome-Wide Mutagenesis in Borrelia burgdorferi.

    PubMed

    Lin, Tao; Gao, Lihui

    2018-01-01

    Signature-tagged mutagenesis (STM) is a functional genomics approach to identify bacterial virulence determinants and virulence factors by simultaneously screening multiple mutants in a single host animal, and has been utilized extensively for the study of bacterial pathogenesis, host-pathogen interactions, and spirochete and tick biology. The signature-tagged transposon mutagenesis has been developed to investigate virulence determinants and pathogenesis of Borrelia burgdorferi. Mutants in genes important in virulence are identified by negative selection in which the mutants fail to colonize or disseminate in the animal host and tick vector. STM procedure combined with Luminex Flex ® Map™ technology and next-generation sequencing (e.g., Tn-seq) are the powerful high-throughput tools for the determination of Borrelia burgdorferi virulence determinants. The assessment of multiple tissue sites and two DNA resources at two different time points using Luminex Flex ® Map™ technology provides a robust data set. B. burgdorferi transposon mutant screening indicates that a high proportion of genes are the novel virulence determinants that are required for mouse and tick infection. In this protocol, an effective signature-tagged Himar1-based transposon suicide vector was developed and used to generate a sequence-defined library of nearly 4800 mutants in the infectious B. burgdorferi B31 clone. In STM, signature-tagged suicide vectors are constructed by inserting unique DNA sequences (tags) into the transposable elements. The signature-tagged transposon mutants are generated when transposon suicide vectors are transformed into an infectious B. burgdorferi clone, and the transposable element is transposed into the 5'-TA-3' sequence in the B. burgdorferi genome with the signature tag. The transposon library is created and consists of many sub-libraries, each sub-library has several hundreds of mutants with same tags. A group of mice or ticks are infected with a mixed population of mutants with different tags, after recovered from different tissues of infected mice and ticks, mutants from output pool and input pool are detected using high-throughput, semi-quantitative Luminex ® FLEXMAP™ or next-generation sequencing (Tn-seq) technologies. Thus far, we have created a high-density, sequence-defined transposon library of over 6600 STM mutants for the efficient genome-wide investigation of genes and gene products required for wild-type pathogenesis, host-pathogen interactions, in vitro growth, in vivo survival, physiology, morphology, chemotaxis, motility, structure, metabolism, gene regulation, plasmid maintenance and replication, etc. The insertion sites of 4480 transposon mutants have been determined. About 800 predicted protein-encoding genes in the genome were disrupted in the STM transposon library. The infectivity and some functions of 800 mutants in 500 genes have been determined. Analysis of these transposon mutants has yielded valuable information regarding the genes and gene products important in the pathogenesis and biology of B. burgdorferi and its tick vectors.

  9. A Modified Protocol for High-Quality RNA Extraction from Oleoresin-Producing Adult Pines.

    PubMed

    de Lima, Júlio César; Füller, Thanise Nogueira; de Costa, Fernanda; Rodrigues-Corrêa, Kelly C S; Fett-Neto, Arthur G

    2016-01-01

    RNA extraction resulting in good yields and quality is a fundamental step for the analyses of transcriptomes through high-throughput sequencing technologies, microarray, and also northern blots, RT-PCR, and RTqPCR. Even though many specific protocols designed for plants with high content of secondary metabolites have been developed, these are often expensive, time consuming, and not suitable for a wide range of tissues. Here we present a modification of the method previously described using the commercially available Concert™ Plant RNA Reagent (Invitrogen) buffer for field-grown adult pine trees with high oleoresin content.

  10. A new arenavirus in a cluster of fatal transplant-associated diseases.

    PubMed

    Palacios, Gustavo; Druce, Julian; Du, Lei; Tran, Thomas; Birch, Chris; Briese, Thomas; Conlan, Sean; Quan, Phenix-Lan; Hui, Jeffrey; Marshall, John; Simons, Jan Fredrik; Egholm, Michael; Paddock, Christopher D; Shieh, Wun-Ju; Goldsmith, Cynthia S; Zaki, Sherif R; Catton, Mike; Lipkin, W Ian

    2008-03-06

    Three patients who received visceral-organ transplants from a single donor on the same day died of a febrile illness 4 to 6 weeks after transplantation. Culture, polymerase-chain-reaction (PCR) and serologic assays, and oligonucleotide microarray analysis for a wide range of infectious agents were not informative. We evaluated RNA obtained from the liver and kidney transplant recipients. Unbiased high-throughput sequencing was used to identify microbial sequences not found by means of other methods. The specificity of sequences for a new candidate pathogen was confirmed by means of culture and by means of PCR, immunohistochemical, and serologic analyses. High-throughput sequencing yielded 103,632 sequences, of which 14 represented an Old World arenavirus. Additional sequence analysis showed that this new arenavirus was related to lymphocytic choriomeningitis viruses. Specific PCR assays based on a unique sequence confirmed the presence of the virus in the kidneys, liver, blood, and cerebrospinal fluid of the recipients. Immunohistochemical analysis revealed arenavirus antigen in the liver and kidney transplants in the recipients. IgM and IgG antiviral antibodies were detected in the serum of the donor. Seroconversion was evident in serum specimens obtained from one recipient at two time points. Unbiased high-throughput sequencing is a powerful tool for the discovery of pathogens. The use of this method during an outbreak of disease facilitated the identification of a new arenavirus transmitted through solid-organ transplantation. Copyright 2008 Massachusetts Medical Society.

  11. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

    PubMed

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-07-15

    In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1 accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.

  12. Exploring fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing

    NASA Astrophysics Data System (ADS)

    Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua

    2016-10-01

    The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approach and clone library analysis, the present result suggested that Illumina sequencing had been dramatically accelerating the discovery of fungal community of deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in the deep-sea environments. In addition, more than 12 taxa accounted for 21.5% sequences were found to be rarely reported as deep-sea fungi, suggesting the deep-sea sediments from Okinawa Trough harbored a plethora of different fungal communities compared with other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.

  13. High-throughput sequencing reveals the core gut microbiome of Bar-headed goose (Anser indicus) in different wintering areas in Tibet.

    PubMed

    Wang, Wen; Cao, Jian; Yang, Fang; Wang, Xuelian; Zheng, Sisi; Sharshov, Kirill; Li, Laixing

    2016-04-01

    Elucidating the spatial dynamic and core gut microbiome related to wild bar-headed goose is of crucial importance for probiotics development that may meet the demands of bar-headed goose artificial breeding industries and accelerate the domestication of this species. However, the core microbial communities in the wild bar-headed geese remain totally unknown. Here, for the first time, we present a comprehensive survey of bar-headed geese gut microbial communities by Illumina high-throughput sequencing technology using nine individuals from three distinct wintering locations in Tibet. A total of 236,676 sequences were analyzed, and 607 OTUs were identified. We show that the gut microbial communities of bar-headed geese have representatives of 14 phyla and are dominated by Firmicutes, Proteobacteria, Actinobacteria, and Bacteroidetes. The additive abundance of these four most dominant phyla was above 96% across all the samples. At the genus level, the sequences represented 150 genera. A set of 19 genera were present in all samples and considered as core gut microbiome. The top seven most abundant core genera were distributed in that four dominant phyla. Among them, four genera (Lactococcus, Bacillus, Solibacillus, and Streptococcus) belonged to Firmicutes, while for other three phyla, each containing one genus, such as Proteobacteria (genus Pseudomonas), Actinobacteria (genus Arthrobacter), and Bacteroidetes (genus Bacteroides). This broad survey represents the most in-depth assessment, to date, of the gut microbes that associated with bar-headed geese. These data create a baseline for future bar-headed goose microbiology research, and make an original contribution to probiotics development for bar-headed goose artificial breeding industries. © 2015 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.

  14. Unraveling Core Functional Microbiota in Traditional Solid-State Fermentation by High-Throughput Amplicons and Metatranscriptomics Sequencing.

    PubMed

    Song, Zhewei; Du, Hai; Zhang, Yan; Xu, Yan

    2017-01-01

    Fermentation microbiota is specific microorganisms that generate different types of metabolites in many productions. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicons (16S rRNA gene amplicon sequencing and internal transcribed space amplicon sequencing) and metatranscriptomics sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces , and Zygosaccharomyces ) and lactic acid bacteria (genus Lactobacillus ) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide insight into the effects of the core functional microbiota in soy sauce aroma type liquor production and the characteristics of the fermentation microbiota under different environmental conditions.

  15. High-Throughput Sequencing of microRNAs in Peripheral Blood Mononuclear Cells: Identification of Potential Weight Loss Biomarkers

    PubMed Central

    Milagro, Fermín I.; Miranda, Jonatan; Portillo, María P.; Fernandez-Quintela, Alfredo; Campión, Javier; Martínez, J. Alfredo

    2013-01-01

    Introduction MicroRNAs (miRNAs) are being increasingly studied in relation to energy metabolism and body composition homeostasis. Indeed, the quantitative analysis of miRNAs expression in different adiposity conditions may contribute to understand the intimate mechanisms participating in body weight control and to find new biomarkers with diagnostic or prognostic value in obesity management. Objective The aim of this study was the search for miRNAs in blood cells whose expression could be used as prognostic biomarkers of weight loss. Methods Ten Caucasian obese women were selected among the participants in a weight-loss trial that consisted in following an energy-restricted treatment. Weight loss was considered unsuccessful when <5% of initial body weight (non-responders) and successful when >5% (responders). At baseline, total miRNA isolated from peripheral blood mononuclear cells (PBMC) was sequenced with SOLiD v4. The miRNA sequencing data were validated by RT-PCR. Results Differential baseline expression of several miRNAs was found between responders and non-responders. Two miRNAs were up-regulated in the non-responder group (mir-935 and mir-4772) and three others were down-regulated (mir-223, mir-224 and mir-376b). Both mir-935 and mir-4772 showed relevant associations with the magnitude of weight loss, although the expression of other transcripts (mir-874, mir-199b, mir-766, mir-589 and mir-148b) also correlated with weight loss. Conclusions This research addresses the use of high-throughput sequencing technologies in the search for miRNA expression biomarkers in obesity, by determining the miRNA transcriptome of PBMC. Basal expression of different miRNAs, particularly mir-935 and mir-4772, could be prognostic biomarkers and may forecast the response to a hypocaloric diet. PMID:23335998

  16. ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.

    PubMed

    Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie

    2014-02-01

    Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.

  17. Sequencing and characterizing the genome of Estrella lausannensis as an undergraduate project: training students and biological insights.

    PubMed

    Bertelli, Claire; Aeby, Sébastien; Chassot, Bérénice; Clulow, James; Hilfiker, Olivier; Rappo, Samuel; Ritzmann, Sébastien; Schumacher, Paolo; Terrettaz, Céline; Benaglio, Paola; Falquet, Laurent; Farinelli, Laurent; Gharib, Walid H; Goesmann, Alexander; Harshman, Keith; Linke, Burkhard; Miyazaki, Ryo; Rivolta, Carlo; Robinson-Rechavi, Marc; van der Meer, Jan Roelof; Greub, Gilbert

    2015-01-01

    With the widespread availability of high-throughput sequencing technologies, sequencing projects have become pervasive in the molecular life sciences. The huge bulk of data generated daily must be analyzed further by biologists with skills in bioinformatics and by "embedded bioinformaticians," i.e., bioinformaticians integrated in wet lab research groups. Thus, students interested in molecular life sciences must be trained in the main steps of genomics: sequencing, assembly, annotation and analysis. To reach that goal, a practical course has been set up for master students at the University of Lausanne: the "Sequence a genome" class. At the beginning of the academic year, a few bacterial species whose genome is unknown are provided to the students, who sequence and assemble the genome(s) and perform manual annotation. Here, we report the progress of the first class from September 2010 to June 2011 and the results obtained by seven master students who specifically assembled and annotated the genome of Estrella lausannensis, an obligate intracellular bacterium related to Chlamydia. The draft genome of Estrella is composed of 29 scaffolds encompassing 2,819,825 bp that encode for 2233 putative proteins. Estrella also possesses a 9136 bp plasmid that encodes for 14 genes, among which we found an integrase and a toxin/antitoxin module. Like all other members of the Chlamydiales order, Estrella possesses a highly conserved type III secretion system, considered as a key virulence factor. The annotation of the Estrella genome also allowed the characterization of the metabolic abilities of this strictly intracellular bacterium. Altogether, the students provided the scientific community with the Estrella genome sequence and a preliminary understanding of the biology of this recently-discovered bacterial genus, while learning to use cutting-edge technologies for sequencing and to perform bioinformatics analyses.

  18. HLA genotyping by next-generation sequencing of complementary DNA.

    PubMed

    Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya

    2017-11-28

    Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.

  19. REDItools: high-throughput RNA editing detection made easy.

    PubMed

    Picardi, Ernesto; Pesole, Graziano

    2013-07-15

    The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools a suite of python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data. REDItools are in python programming language and freely available at http://code.google.com/p/reditools/. ernesto.picardi@uniba.it or graziano.pesole@uniba.it Supplementary data are available at Bioinformatics online.

  20. High throughput protein production screening

    DOEpatents

    Beernink, Peter T [Walnut Creek, CA; Coleman, Matthew A [Oakland, CA; Segelke, Brent W [San Ramon, CA

    2009-09-08

    Methods, compositions, and kits for the cell-free production and analysis of proteins are provided. The invention allows for the production of proteins from prokaryotic sequences or eukaryotic sequences, including human cDNAs using PCR and IVT methods and detecting the proteins through fluorescence or immunoblot techniques. This invention can be used to identify optimized PCR and WT conditions, codon usages and mutations. The methods are readily automated and can be used for high throughput analysis of protein expression levels, interactions, and functional states.

Top