Athavale, Ajay
2018-01-04
Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing
Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph
2011-01-01
Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797
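The locus-specific quantification described in this abstract can be illustrated with a minimal sketch. It is not BiQ Analyzer HT's implementation; it assumes forward-strand reads that are gap-free and already aligned to the amplicon reference, and the function names `cpg_positions` and `methylation_levels` are hypothetical. After bisulfite conversion an unmethylated cytosine reads as T while a methylated cytosine still reads as C, so the methylation level at a CpG is the fraction of informative reads showing C there.

```python
from collections import defaultdict


def cpg_positions(reference):
    """Return 0-based indices of the C in every CpG dinucleotide."""
    return [i for i in range(len(reference) - 1) if reference[i:i + 2] == "CG"]


def methylation_levels(reference, reads):
    """Fraction of reads showing C (methylated) at each CpG of the reference.

    Assumption: every read is forward-strand, gap-free and has the same
    length as `reference`. Bases other than C or T at a CpG are ignored.
    """
    positions = cpg_positions(reference)
    methylated = defaultdict(int)
    informative = defaultdict(int)
    for read in reads:
        for pos in positions:
            base = read[pos]
            if base == "C":          # protected from conversion: methylated
                methylated[pos] += 1
                informative[pos] += 1
            elif base == "T":        # converted: unmethylated
                informative[pos] += 1
    return {pos: methylated[pos] / informative[pos] for pos in informative}
```

With the toy reference `ACGTTCGA` and four reads, the function returns one methylation fraction per CpG position; real data would also need strand handling and a conversion-rate check, which are left out here.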
[Current applications of high-throughput DNA sequencing technology in antibody drug research].
Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong
2012-03-01
Since the publication in 2005 of a high-throughput DNA sequencing technology based on PCR carried out in oil emulsions, high-throughput DNA sequencing platforms have evolved into a robust technology for sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation for discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we review current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach to the discovery and development of antibody drugs.
High Throughput Sequence Analysis for Disease Resistance in Maize
USDA-ARS's Scientific Manuscript database
Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...
USDA-ARS's Scientific Manuscript database
Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...
Molecular characterization of a novel Luteovirus from peach identified by high-throughput sequencing
USDA-ARS's Scientific Manuscript database
Contigs with sequence homologies to Cherry-associated luteovirus were identified by high-throughput sequencing analysis of two peach accessions undergoing quarantine testing. The complete genomic sequences of the two isolates of this virus are 5,819 and 5,814 nucleotides. Their genome organization i...
Advances in high throughput DNA sequence data compression.
Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz
2016-06-01
Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
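The two categories named in this abstract, reference-free and referential, can be contrasted with a toy sketch. It is not taken from any of the reviewed tools: the reference-free half packs an ACGT-only string at 2 bits per base, and the referential half stores only substitutions against an equal-length reference (indels and ambiguity codes are deliberately ignored). All function names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq):
    """Reference-free: pack an ACGT-only string into bytes, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return len(seq), bytes(out)


def unpack_2bit(length, data):
    """Invert pack_2bit; `length` trims the padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


def encode_vs_reference(reference, seq):
    """Referential: keep only (position, base) where seq differs from reference."""
    return [(i, b) for i, (r, b) in enumerate(zip(reference, seq)) if r != b]


def decode_vs_reference(reference, diffs):
    """Rebuild the sequence by applying the stored substitutions."""
    bases = list(reference)
    for i, b in diffs:
        bases[i] = b
    return "".join(bases)
```

The referential form is far smaller when the sequence is close to the reference, but it is useless without that reference; the 2-bit form needs nothing else to decode, which is the trade-off the two categories reflect.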
USDA-ARS's Scientific Manuscript database
Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole...
Pediatric Glioblastoma Therapies Based on Patient-Derived Stem Cell Resources
2014-11-01
genomic DNA and then subjected to Illumina high-throughput sequencing. In this analysis, shRNAs lost in the GSC population represent candidate gene...PRISM 7900 Sequence Detection System (Genomics Resource, FHCRC). Relative transcript abundance was analyzed using the 2^(−ΔΔCt) method. TRIzol (Invitrogen
Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing
Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi
2016-01-01
Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039
High-throughput sequencing: a failure mode analysis.
Yang, George S; Stott, Jeffery M; Smailus, Duane; Barber, Sarah A; Balasundaram, Miruna; Marra, Marco A; Holt, Robert A
2005-01-04
Basic manufacturing principles are becoming increasingly important in high-throughput sequencing facilities where there is a constant drive to increase quality, increase efficiency, and decrease operating costs. While high-throughput centres report failure rates typically on the order of 10%, the causes of sporadic sequencing failures are seldom analyzed in detail and have not, in the past, been formally reported. Here we report the results of a failure mode analysis of our production sequencing facility based on detailed evaluation of 9,216 ESTs generated from two cDNA libraries. Two categories of failures are described: process-related failures (failures due to equipment or sample handling) and template-related failures (failures that are revealed by close inspection of electropherograms and are likely due to properties of the template DNA sequence itself). Preventative action based on a detailed understanding of failure modes is likely to improve the performance of other production sequencing pipelines.
The promise and challenge of high-throughput sequencing of the antibody repertoire
Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R
2014-01-01
Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474
Analysis of Illumina Microbial Assemblies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clum, Alicia; Foster, Brian; Froula, Jeff
2010-05-28
Since the emergence of second-generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaking. Next-generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as well as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.
Overcoming bias and systematic errors in next generation sequencing data.
Taub, Margaret A; Corrada Bravo, Hector; Irizarry, Rafael A
2010-12-10
Considerable time and effort has been spent in developing analysis and quality assessment methods to allow the use of microarrays in a clinical setting. As is the case for microarrays and other high-throughput technologies, data from new high-throughput sequencing technologies are subject to technological and biological biases and systematic errors that can impact downstream analyses. Only when these issues can be readily identified and reliably adjusted for will clinical applications of these new technologies be feasible. Although much work remains to be done in this area, we describe consistently observed biases that should be taken into account when analyzing high-throughput sequencing data. In this article, we review current knowledge about these biases, discuss their impact on analysis results, and propose solutions.
Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi
2016-03-02
Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10^4-member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Loeffler 4.0: Diagnostic Metagenomics.
Höper, Dirk; Wylezich, Claudia; Beer, Martin
2017-01-01
A new world of possibilities for "virus discovery" opened up when high-throughput sequencing became available in the last decade. While metagenomic analysis was scientifically established before the start of the era of high-throughput sequencing, the availability of the first second-generation sequencers was the kick-off for diagnosticians to use sequencing for the detection of novel pathogens. Today, diagnostic metagenomics is becoming the standard procedure for the detection and genetic characterization of new viruses or novel virus variants. Here, we provide an overview of technical considerations of high-throughput sequencing-based diagnostic metagenomics together with selected examples of "virus discovery" for animal diseases or zoonoses and metagenomics for food safety or basic veterinary research. © 2017 Elsevier Inc. All rights reserved.
Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui
2015-01-01
Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410
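The summary statistics mentioned in this abstract (total sequences, unique sequences, CDR3 length distribution) reduce to simple counting once CDR3 sequences have been extracted. The sketch below is only an illustration of that counting step; the function name `repertoire_summary` is hypothetical, the length distribution is taken over unique sequences (one possible convention), and V/J assignment is not covered.

```python
from collections import Counter


def repertoire_summary(cdr3_seqs):
    """Summarise a list of CDR3 sequences (one entry per sequenced read).

    Returns the total read count, the number of unique CDR3 sequences,
    and the length distribution computed over unique sequences.
    """
    clonotypes = Counter(cdr3_seqs)                  # sequence -> read count
    lengths = Counter(len(seq) for seq in clonotypes)
    return {
        "total": sum(clonotypes.values()),
        "unique": len(clonotypes),
        "length_distribution": dict(sorted(lengths.items())),
    }
```

Weighting the length distribution by read count instead of by unique sequence is a one-line change and may be preferable when clonal expansion is the quantity of interest.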
Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis
2012-01-01
Background Multiplexing has become the major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach permits simultaneous analysis of only up to several dozen samples. Results Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated, and about 64% of them were assigned to both barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation obtained using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. Conclusions By employing the PBS approach in NGS, large-scale multiplexed pooled samples can be practically analyzed in parallel, so that high-throughput sequencing economically meets the requirements of samples with low sequencing-throughput demand. PMID:22276739
Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.
Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong
2012-01-25
Multiplexing has become the major limitation of next-generation sequencing (NGS) when applied to low-complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach permits simultaneous analysis of only up to several dozen samples. Here we introduce pair-barcode sequencing (PBS), an economical and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using a SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated, and about 64% of them were assigned to both barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns were captured from the two clinical groups. The strong correlation obtained using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrate that the PBS approach is valid. By employing the PBS approach in NGS, large-scale multiplexed pooled samples can be practically analyzed in parallel, so that high-throughput sequencing economically meets the requirements of samples with low sequencing-throughput demand.
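The combinatorial idea in this abstract, where 4 forward and 8 reverse barcodes give 4 × 8 = 32 distinguishable libraries, can be sketched as follows. This is an illustration only: the barcode sequences are invented, the layout (forward barcode at the start of the read, reverse barcode at the end, exact matching) is a simplifying assumption, and the function names are hypothetical. The actual SOLiD workflow in the paper is not reproduced here.

```python
from itertools import product


def build_pair_table(forward, reverse):
    """Map every (forward, reverse) barcode pair to a sample label."""
    pairs = product(forward, reverse)
    return {pair: f"sample_{i + 1:02d}" for i, pair in enumerate(pairs)}


def assign(read, table, bc_len):
    """Return (sample, insert) if both barcodes are recognised, else None.

    Assumption: the read is laid out as forward barcode + insert + reverse
    barcode, and barcodes must match exactly.
    """
    key = (read[:bc_len], read[-bc_len:])
    sample = table.get(key)
    if sample is None:
        return None
    return sample, read[bc_len:-bc_len]
```

A read is kept only when both barcodes are recognised, which mirrors the abstract's remark that a fraction of reads (about 64% in the pilot runs) were assigned to both barcodes.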
AmpliVar: mutation detection in high-throughput sequence from amplicon-based libraries.
Hsu, Arthur L; Kondrashova, Olga; Lunke, Sebastian; Love, Clare J; Meldrum, Cliff; Marquis-Nicholson, Renate; Corboy, Greg; Pham, Kym; Wakefield, Matthew; Waring, Paul M; Taylor, Graham R
2015-04-01
Conventional means of identifying variants in high-throughput sequencing align each read against a reference sequence, and then call variants at each position. Here, we demonstrate an orthogonal means of identifying sequence variation by grouping the reads as amplicons prior to any alignment. We used AmpliVar to make key-value hashes of sequence reads and group reads as individual amplicons using a table of flanking sequences. Low-abundance reads were removed according to a selectable threshold, and reads above this threshold were aligned as groups, rather than as individual reads, permitting the use of sensitive alignment tools. We show that this approach is more sensitive, more specific, and more computationally efficient than comparable methods for the analysis of amplicon-based high-throughput sequencing data. The method can be extended to enable alignment-free confirmation of variants seen in hybridization capture target-enrichment data. © 2015 WILEY PERIODICALS, INC.
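The grouping step described in this abstract can be sketched in a few lines. This is not AmpliVar's code; it assumes exact matching of flanking sequences at the read ends, and the function name `group_reads`, the `flanks` table layout and the default threshold are hypothetical choices made for the illustration.

```python
from collections import Counter, defaultdict


def group_reads(reads, flanks, min_count=2):
    """Group identical reads by amplicon before any alignment.

    reads     : iterable of read sequences
    flanks    : dict mapping amplicon name -> (left_flank, right_flank)
    min_count : reads seen fewer times than this are discarded
    """
    counts = Counter(reads)  # key-value hash: read sequence -> abundance
    groups = defaultdict(dict)
    for seq, n in counts.items():
        if n < min_count:
            continue  # low-abundance read, likely sequencing noise
        for name, (left, right) in flanks.items():
            if seq.startswith(left) and seq.endswith(right):
                groups[name][seq] = n
                break
    return dict(groups)
```

Because identical reads collapse into a single key, only the distinct sequences that pass the threshold need to be aligned, one group per amplicon, which is where the efficiency gain described in the abstract comes from.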
2014-01-01
Background RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module “miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program’s functionality. Conclusions eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312
Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang
2014-03-05
RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module "miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extends the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.
David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion
2014-01-01
The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.
HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis
David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion
2014-01-01
The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057
High-throughput sequence alignment using Graphics Processing Units
Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh
2007-01-01
Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
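The core operation in this abstract, looking up many query sequences against one indexed reference, can be illustrated on the CPU with a much simpler index. The sketch below deliberately swaps the suffix tree for a naively built suffix array and runs serially in Python, so it shows the lookup idea only, not MUMmerGPU's data structure, its CUDA kernel, or MUMmer's maximal-match semantics. The function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order.

    Fine for a toy example; real tools use far more efficient construction.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, query):
    """Return sorted start positions of every exact occurrence of `query`."""
    m = len(query)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= query
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-prefix is > query
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

Each call to `find_all` reads the index without modifying it, so queries are independent of one another; that independence is what makes the workload suitable for the parallel processing the abstract describes.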
"First generation" automated DNA sequencing technology.
Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M
2011-10-01
Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
Short-read, high-throughput sequencing technology for STR genotyping
Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.
2013-01-01
DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315
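Calling an STR allele from sequence reads amounts to determining how many repeat units each read spans and then tallying reads per allele. The sketch below uses a simpler flank-and-count approach with a regular expression rather than the reference-alignment method the abstract describes; the flanking sequences, repeat unit, `min_fraction` threshold and function names are all hypothetical, and only simple uninterrupted repeats are handled.

```python
import re
from collections import Counter


def str_allele_counts(reads, left_flank, right_flank, unit):
    """Count reads per allele, where an allele is the number of repeat units.

    Only reads that contain both flanks with an uninterrupted run of `unit`
    between them are counted.
    """
    pattern = re.compile(
        re.escape(left_flank)
        + "((?:" + re.escape(unit) + ")+)"
        + re.escape(right_flank)
    )
    counts = Counter()
    for read in reads:
        match = pattern.search(read)
        if match:
            counts[len(match.group(1)) // len(unit)] += 1
    return counts


def call_genotype(counts, min_fraction=0.2):
    """Report alleles supported by at least `min_fraction` of counted reads."""
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(a for a, n in counts.items() if n / total >= min_fraction)
```

The per-allele read counts are the kind of quantitative measurement the abstract mentions for resolving mixed samples; the fixed fraction threshold used here to suppress stutter is a placeholder, not a validated forensic criterion.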
Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R
2018-05-01
A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5'. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.
High throughput protein production screening
Beernink, Peter T [Walnut Creek, CA; Coleman, Matthew A [Oakland, CA; Segelke, Brent W [San Ramon, CA
2009-09-08
Methods, compositions, and kits for the cell-free production and analysis of proteins are provided. The invention allows for the production of proteins from prokaryotic sequences or eukaryotic sequences, including human cDNAs, using PCR and IVT methods, and for detecting the proteins through fluorescence or immunoblot techniques. This invention can be used to identify optimized PCR and IVT conditions, codon usages and mutations. The methods are readily automated and can be used for high throughput analysis of protein expression levels, interactions, and functional states.
Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella
2012-01-01
Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue® mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents. PMID:22735701
Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella
2012-08-01
Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents.
Tome, Jacob M; Ozer, Abdullah; Pagano, John M; Gheba, Dan; Schroth, Gary P; Lis, John T
2014-06-01
RNA-protein interactions play critical roles in gene regulation, but methods to quantitatively analyze these interactions at a large scale are lacking. We have developed a high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay by adapting a high-throughput DNA sequencer to quantify the binding of fluorescently labeled protein to millions of RNAs anchored to sequenced cDNA templates. Using HiTS-RAP, we measured the affinity of mutagenized libraries of GFP-binding and NELF-E-binding aptamers to their respective targets and identified critical regions of interaction. Mutations additively affected the affinity of the NELF-E-binding aptamer, whose interaction depended mainly on a single-stranded RNA motif, but not that of the GFP aptamer, whose interaction depended primarily on secondary structure.
Fujimori, Shigeo; Hirai, Naoya; Ohashi, Hiroyuki; Masuoka, Kazuyo; Nishikimi, Akihiko; Fukui, Yoshinori; Washio, Takanori; Oshikubo, Tomohiro; Yamashita, Tatsuhiro; Miyamoto-Sato, Etsuko
2012-01-01
Next-generation sequencing (NGS) has been applied to various kinds of omics studies, resulting in many biological and medical discoveries. However, high-throughput protein-protein interactome datasets derived from detection by sequencing are scarce, because protein-protein interaction analysis requires many cell manipulations to examine the interactions. The low reliability of the high-throughput data is also a problem. Here, we describe a cell-free display technology combined with NGS that can improve both the coverage and reliability of interactome datasets. The completely cell-free method provides high throughput and a large detection space, testing the interactions without using clones. The quantitative information provided by NGS reduces the number of false positives. The method is suitable for the in vitro detection of proteins that interact not only with the bait protein, but also with DNA, RNA and chemical compounds. Thus, it could become a universal approach for exploring the large space of protein sequences and interactome networks. PMID:23056904
[Genetic analysis of two children patients affected with CHARGE syndrome].
Li, Guoqiang; Li, Niu; Xu, Yufei; Li, Juan; Ding, Yu; Shen, Yiping; Wang, Xiumin; Wang, Jian
2018-04-10
Two Chinese pediatric patients with multiple malformations and delayed growth and development were analyzed. Both patients were subjected to targeted gene sequencing, and the results were analyzed with Ingenuity Variant Analysis software. Suspected pathogenic variations were verified by Sanger sequencing. High-throughput sequencing showed that both patients carried heterozygous variants of the CHD7 gene. Patient 1 carried a nonsense mutation in exon 36 (c.7957C>T, p.Arg2653*), while patient 2 carried a nonsense mutation in exon 2 (c.718C>T, p.Gln240*). Sanger sequencing confirmed the above mutations in both patients, while their parents were wild-type at the corresponding sites, indicating that the two mutations occurred de novo. Both patients were diagnosed with CHARGE syndrome by high-throughput sequencing.
Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu
2013-08-01
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.
Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu
2013-01-01
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. PMID:23657089
A new arenavirus in a cluster of fatal transplant-associated diseases.
Palacios, Gustavo; Druce, Julian; Du, Lei; Tran, Thomas; Birch, Chris; Briese, Thomas; Conlan, Sean; Quan, Phenix-Lan; Hui, Jeffrey; Marshall, John; Simons, Jan Fredrik; Egholm, Michael; Paddock, Christopher D; Shieh, Wun-Ju; Goldsmith, Cynthia S; Zaki, Sherif R; Catton, Mike; Lipkin, W Ian
2008-03-06
Three patients who received visceral-organ transplants from a single donor on the same day died of a febrile illness 4 to 6 weeks after transplantation. Culture, polymerase-chain-reaction (PCR) and serologic assays, and oligonucleotide microarray analysis for a wide range of infectious agents were not informative. We evaluated RNA obtained from the liver and kidney transplant recipients. Unbiased high-throughput sequencing was used to identify microbial sequences not found by means of other methods. The specificity of sequences for a new candidate pathogen was confirmed by means of culture and by means of PCR, immunohistochemical, and serologic analyses. High-throughput sequencing yielded 103,632 sequences, of which 14 represented an Old World arenavirus. Additional sequence analysis showed that this new arenavirus was related to lymphocytic choriomeningitis viruses. Specific PCR assays based on a unique sequence confirmed the presence of the virus in the kidneys, liver, blood, and cerebrospinal fluid of the recipients. Immunohistochemical analysis revealed arenavirus antigen in the liver and kidney transplants in the recipients. IgM and IgG antiviral antibodies were detected in the serum of the donor. Seroconversion was evident in serum specimens obtained from one recipient at two time points. Unbiased high-throughput sequencing is a powerful tool for the discovery of pathogens. The use of this method during an outbreak of disease facilitated the identification of a new arenavirus transmitted through solid-organ transplantation. Copyright 2008 Massachusetts Medical Society.
Gore, Brooklin
2018-02-01
This presentation includes a brief background on High Throughput Computing, correlating gene transcription factors, optical mapping, genotype to phenotype mapping via QTL analysis, and current work on next gen sequencing.
Dutta, Sanjib; Koide, Akiko; Koide, Shohei
2008-01-01
Stability evaluation of many mutants can lead to a better understanding of the sequence determinants of a structural motif and of factors governing protein stability and protein evolution. The traditional biophysical analysis of protein stability is low throughput, limiting our ability to widely explore the sequence space in a quantitative manner. In this study, we have developed a high-throughput library screening method for quantifying stability changes, which is based on protein fragment reconstitution and yeast surface display. Our method exploits the thermodynamic linkage between protein stability and fragment reconstitution and the ability of the yeast surface display technique to quantitatively evaluate protein-protein interactions. The method was applied to a fibronectin type III (FN3) domain. Characterization of fragment reconstitution was facilitated by the co-expression of two FN3 fragments, thus establishing a "yeast surface two-hybrid" method. Importantly, our method does not rely on competition between clones and thus eliminates a common limitation of high-throughput selection methods in which the most stable variants are predominantly recovered. Thus, it allows for the isolation of sequences that exhibit a desired level of stability. We identified over one hundred unique sequences for a β-bulge motif, which was significantly more informative than natural sequences of the FN3 family in revealing the sequence determinants for the β-bulge. Our method provides a powerful means to rapidly assess stability of many variants, to systematically assess contribution of different factors to protein stability and to enhance protein stability. PMID:18674545
DOE Office of Scientific and Technical Information (OSTI.GOV)
Athavale, Ajay
Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
Logares, Ramiro; Haverkamp, Thomas H A; Kumar, Surendra; Lanzén, Anders; Nederbragt, Alexander J; Quince, Christopher; Kauserud, Håvard
2012-10-01
The incursion of High-Throughput Sequencing (HTS) in environmental microbiology brings unique opportunities and challenges. HTS now allows a high-resolution exploration of the vast taxonomic and metabolic diversity present in the microbial world, which can provide an exceptional insight on global ecosystem functioning, ecological processes and evolution. This exploration has also economic potential, as we will have access to the evolutionary innovation present in microbial metabolisms, which could be used for biotechnological development. HTS is also challenging the research community, and the current bottleneck is present in the data analysis side. At the moment, researchers are in a sequence data deluge, with sequencing throughput advancing faster than the computer power needed for data analysis. However, new tools and approaches are being developed constantly and the whole process could be depicted as a fast co-evolution between sequencing technology, informatics and microbiologists. In this work, we examine the most popular and recently commercialized HTS platforms as well as bioinformatics methods for data handling and analysis used in microbial metagenomics. This non-exhaustive review is intended to serve as a broad state-of-the-art guide to researchers expanding into this rapidly evolving field. Copyright © 2012 Elsevier B.V. All rights reserved.
ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.
Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie
2014-02-01
Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.
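The read-level statistics mentioned in this abstract (length distribution and nucleotide composition) can be illustrated with a minimal Python sketch. This is not ISRNA code; the function name `read_statistics` and the example reads are hypothetical, and the sketch only shows the general kind of summary such a toolkit might report.

```python
from collections import Counter


def read_statistics(reads):
    """Summarize short reads (hypothetical helper, not part of ISRNA).

    Returns a Counter of read lengths and, for each 0-based position,
    the fraction of reads carrying each nucleotide at that position.
    """
    length_counts = Counter(len(read) for read in reads)

    # Count bases per position; shorter reads simply do not contribute
    # to positions beyond their own length.
    position_counts = {}
    for read in reads:
        for i, base in enumerate(read.upper()):
            position_counts.setdefault(i, Counter())[base] += 1

    composition = {}
    for i, counts in position_counts.items():
        total = sum(counts.values())
        composition[i] = {base: n / total for base, n in counts.items()}
    return length_counts, composition


# Illustrative input only: three made-up short reads.
example_reads = ["ACGT", "AC", "AG"]
lengths, composition = read_statistics(example_reads)
```

A strong bias at the first position, such as every read starting with the same base, would show up directly in `composition[0]`.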
Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing.
Giraud, Mathieu; Salson, Mikaël; Duez, Marc; Villenet, Céline; Quief, Sabine; Caillault, Aurélie; Grardel, Nathalie; Roumier, Christophe; Preudhomme, Claude; Figeac, Martin
2014-05-28
V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also on data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also to the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.
Jordan, Scott
2018-01-24
Scott Jordan on "Advances in high-throughput speed, low-latency communication for embedded instrumentation" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Mobile element biology – new possibilities with high-throughput sequencing
Xing, Jinchuan; Witherspoon, David J.; Jorde, Lynn B.
2014-01-01
Mobile elements compose more than half of the human genome, but until recently their large-scale detection was time-consuming and challenging. With the development of new high-throughput sequencing technologies, the complete spectrum of mobile element variation in humans can now be identified and analyzed. Thousands of new mobile element insertions have been discovered, yielding new insights into mobile element biology, evolution, and genomic variation. We review several high-throughput methods, with an emphasis on techniques that specifically target mobile element insertions in humans, and we highlight recent applications of these methods in evolutionary studies and in the analysis of somatic alterations in human cancers. PMID:23312846
Large-Scale Biomonitoring of Remote and Threatened Ecosystems via High-Throughput Sequencing
Gibson, Joel F.; Shokralla, Shadi; Curry, Colin; Baird, Donald J.; Monk, Wendy A.; King, Ian; Hajibabaei, Mehrdad
2015-01-01
Biodiversity metrics are critical for assessment and monitoring of ecosystems threatened by anthropogenic stressors. Existing sorting and identification methods are too expensive and labour-intensive to be scaled up to meet management needs. Alternately, a high-throughput DNA sequencing approach could be used to determine biodiversity metrics from bulk environmental samples collected as part of a large-scale biomonitoring program. Here we show that both morphological and DNA sequence-based analyses are suitable for recovery of individual taxonomic richness, estimation of proportional abundance, and calculation of biodiversity metrics using a set of 24 benthic samples collected in the Peace-Athabasca Delta region of Canada. The high-throughput sequencing approach was able to recover all metrics with a higher degree of taxonomic resolution than morphological analysis. The reduced cost and increased capacity of DNA sequence-based approaches will finally allow environmental monitoring programs to operate at the geographical and temporal scale required by industrial and regulatory end-users. PMID:26488407
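As a rough illustration of the quantities named in this abstract, the Python sketch below computes taxonomic richness and proportional abundance from per-taxon counts. The abstract does not say which biodiversity metrics were calculated, so the Shannon index here is an assumption chosen as a common example, and the function name and sample counts are hypothetical.

```python
import math


def biodiversity_metrics(counts):
    """Compute simple diversity summaries from per-taxon counts.

    `counts` maps taxon name -> number of sequence reads (or specimens).
    The Shannon index is an assumed example metric; the study itself
    may have used different ones.
    """
    present = {taxon: n for taxon, n in counts.items() if n > 0}
    total = sum(present.values())
    if total == 0:
        raise ValueError("sample contains no observations")

    proportions = {taxon: n / total for taxon, n in present.items()}
    shannon = -sum(p * math.log(p) for p in proportions.values())
    return {
        "richness": len(present),    # number of taxa observed
        "proportions": proportions,  # proportional abundance per taxon
        "shannon": shannon,          # natural-log Shannon index
    }


# Illustrative counts only, not data from the study.
sample = {"Chironomidae": 50, "Baetidae": 50, "Hydropsychidae": 0}
metrics = biodiversity_metrics(sample)
```

Taxa with zero counts are excluded from richness, so the same function can be applied to a table that lists every taxon seen across all samples.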
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leung, Elo; Huang, Amy; Cadag, Eithon
In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.
Leung, Elo; Huang, Amy; Cadag, Eithon; ...
2016-01-20
In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein FASTA data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resulting functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta-server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequence-based genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.
New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data
NASA Astrophysics Data System (ADS)
Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.
2007-12-01
High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.
Hayden, Eric J
2016-08-15
RNA molecules provide a realistic but tractable model of a genotype to phenotype relationship. This relationship has been extensively investigated computationally using secondary structure prediction algorithms. Enzymatic RNA molecules, or ribozymes, offer access to genotypic and phenotypic information in the laboratory. Advancements in high-throughput sequencing technologies have enabled the analysis of sequences in the lab that now rivals what can be accomplished computationally. This has motivated a resurgence of in vitro selection experiments and opened new doors for the analysis of the distribution of RNA functions in genotype space. A body of computational experiments has investigated the persistence of specific RNA structures despite changes in the primary sequence, and how this mutational robustness can promote adaptations. This article summarizes recent approaches that were designed to investigate the role of mutational robustness during the evolution of RNA molecules in the laboratory, and presents theoretical motivations, experimental methods and approaches to data analysis. Copyright © 2016 Elsevier Inc. All rights reserved.
Zhao, Meng-Meng; Du, Shan-Shan; Li, Qiu-Hong; Chen, Tao; Qiu, Hui; Wu, Qin; Chen, Shan-Shan; Zhou, Ying; Zhang, Yuan; Hu, Yang; Su, Yi-Liang; Shen, Li; Zhang, Fen; Weng, Dong; Li, Hui-Ping
2017-02-01
This study aims to use high-throughput 16S rRNA gene sequencing to examine the bacterial profile of lymph node biopsy samples of patients with sarcoidosis and to further verify the association between Propionibacterium acnes (P. acnes) and sarcoidosis. A total of 36 mediastinal lymph node biopsy specimens were collected from 17 cases of sarcoidosis, 8 tuberculosis (TB group), and 11 non-infectious lung diseases (control group). The V4 region of the bacterial 16S rRNA gene in the specimens was amplified and sequenced using the high-throughput sequencing platform MiSeq, and a bacterial profile was established. The data analysis software QIIME and Metastats were used to compare bacterial relative abundance in the three patient groups. Overall, 545 genera were identified; 38 showed significantly lower and 29 had significantly higher relative abundance in the sarcoidosis group than in the TB and control groups (P < 0.01). P. acnes 16S rRNA was exclusively found in all 17 samples of the sarcoidosis group, whereas it was not detected in the TB and control groups. The relative abundance of P. acnes in the sarcoidosis group (0.16% ± 0.11%) was significantly higher than that in the TB (Metastats analysis: P = 0.0010, q = 0.0044) and control groups (Metastats analysis: P = 0.0010, q = 0.0038). The relative abundance of P. granulosum was only 0.0022% ± 0.0044% in the sarcoidosis group. P. granulosum 16S rRNA was not detected in the other two groups. High-throughput 16S rRNA gene sequencing appears to be a useful tool to investigate the bacterial profile of sarcoidosis specimens. The results suggest that P. acnes may be involved in sarcoidosis development.
Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide
2000-01-01
The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be revealed by seven RISA systems within one month. PMID:11076861
High-throughput tetrad analysis.
Ludlow, Catherine L; Scott, Adrian C; Cromie, Gareth A; Jeffery, Eric W; Sirr, Amy; May, Patrick; Lin, Jake; Gilbert, Teresa L; Hays, Michelle; Dudley, Aimée M
2013-07-01
Tetrad analysis has been a gold-standard genetic technique for several decades. Unfortunately, the need to manually isolate, disrupt and space tetrads has relegated its application to small-scale studies and limited its integration with high-throughput DNA sequencing technologies. We have developed a rapid, high-throughput method, called barcode-enabled sequencing of tetrads (BEST), that uses (i) a meiosis-specific GFP fusion protein to isolate tetrads by FACS and (ii) molecular barcodes that are read during genotyping to identify spores derived from the same tetrad. Maintaining tetrad information allows accurate inference of missing genetic markers and full genotypes of missing (and presumably nonviable) individuals. An individual researcher was able to isolate over 3,000 yeast tetrads in 3 h, an output equivalent to that of almost 1 month of manual dissection. BEST is transferable to other microorganisms for which meiotic mapping is significantly more laborious.
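The idea that keeping tetrad membership allows a missing spore's genotype to be inferred can be sketched in Python as follows. This is an illustration under stated assumptions, not the BEST implementation: it assumes biallelic markers that segregate 2:2 within a tetrad, and the function names, barcode strings, and allele labels are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_tetrad(spores):
    """Group spore genotypes by their shared tetrad barcode.

    `spores` is a list of (barcode, genotype) pairs, where genotype
    maps marker name -> allele label.
    """
    tetrads = defaultdict(list)
    for barcode, genotype in spores:
        tetrads[barcode].append(genotype)
    return dict(tetrads)


def infer_missing_spore(genotypes, alleles=("A", "B")):
    """Infer the fourth spore of a tetrad from the three observed ones.

    Assumes 2:2 segregation of two alleles at every marker. Markers
    that do not fit that pattern are reported as None rather than guessed.
    """
    if len(genotypes) != 3:
        raise ValueError("expected exactly three observed spores")
    first, second = alleles
    inferred = {}
    for marker in genotypes[0]:
        seen = Counter(g[marker] for g in genotypes)
        if seen[first] == 2 and seen[second] == 1:
            inferred[marker] = second
        elif seen[second] == 2 and seen[first] == 1:
            inferred[marker] = first
        else:
            inferred[marker] = None  # e.g. 3:0, inconsistent with 2:2
    return inferred


# Illustrative data only: three spores sharing one hypothetical barcode.
spores = [
    ("bc1", {"m1": "A", "m2": "B"}),
    ("bc1", {"m1": "A", "m2": "A"}),
    ("bc1", {"m1": "B", "m2": "B"}),
]
tetrads = group_by_tetrad(spores)
missing = infer_missing_spore(tetrads["bc1"])
```

Returning None for markers that break the 2:2 pattern keeps genotyping errors or gene conversion events visible instead of silently filling them in.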
Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian
2011-08-30
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing.
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. 
Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096
Lu, Xin; Zhang, Xu-Xiang; Wang, Zhu; Huang, Kailong; Wang, Yuan; Liang, Weigang; Tan, Yunfei; Liu, Bo; Tang, Junying
2015-01-01
This study used 454 pyrosequencing, Illumina high-throughput sequencing and metagenomic analysis to investigate bacterial pathogens and their potential virulence in a sewage treatment plant (STP) applying both conventional and advanced treatment processes. Pyrosequencing and Illumina sequencing consistently demonstrated that the genus Arcobacter accounted for over 43.42% of the total abundance of potential pathogens in the STP. At the species level, the potential pathogens Arcobacter butzleri, Aeromonas hydrophila and Klebsiella pneumoniae dominated in raw sewage, which was also confirmed by quantitative real-time PCR. Illumina sequencing also revealed the prevalence of various types of pathogenicity islands and virulence proteins in the STP. Most of the potential pathogens and virulence factors were eliminated in the STP, and the removal efficiency depended mainly on the oxidation ditch. Compared with sand filtration, magnetic resin appeared to remove more of most potential pathogens and virulence factors. However, the presence of residual A. butzleri in the final effluent still deserves more attention. The findings indicate that sewage acts as an important source of environmental pathogens, but STPs can effectively control their spread in the environment. Joint use of the high-throughput sequencing technologies is considered a reliable method for obtaining a deep and comprehensive overview of environmental bacterial virulence. PMID:25938416
Morgan, Martin; Anders, Simon; Lawrence, Michael; Aboyoun, Patrick; Pagès, Hervé; Gentleman, Robert
2009-01-01
Summary: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources. Availability and Implementation: This package is implemented in R and available at the Bioconductor web site; the package contains a ‘vignette’ outlining typical work flows. Contact: mtmorgan@fhcrc.org PMID:19654119
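As a rough illustration of the kind of quality assessment the abstract above describes, the sketch below computes the mean Phred quality at each read position from FASTQ text. It is written in Python, not R, and does not use or reproduce the ShortRead API; the function names and the Phred+33 encoding are assumptions made for this example.

```python
import io

def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], seq, qual

def mean_quality_per_position(records, offset=33):
    """Average the decoded quality score at each position across reads.

    The offset of 33 (Phred+33) is an assumption; older data may use 64.
    """
    sums, counts = [], []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read long enough to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]

# Two toy reads of different length; 'I' decodes to 40, '5' to 20, '!' to 0.
example = "@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nI5!\n"
profile = mean_quality_per_position(read_fastq(io.StringIO(example)))
```

A per-position profile like this is one common way to spot quality decay toward the 3' end of reads; real packages report many more metrics.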
Ozer, Abdullah; Tome, Jacob M.; Friedman, Robin C.; Gheba, Dan; Schroth, Gary P.; Lis, John T.
2016-01-01
Because RNA-protein interactions play a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the High Throughput Sequencing-RNA Affinity Profiling (HiTS-RAP) assay, which couples sequencing on an Illumina GAIIx with the quantitative assessment of one or several proteins’ interactions with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of EGFP and NELF-E proteins with their corresponding canonical and mutant RNA aptamers. Here, we provide a detailed protocol for HiTS-RAP, which can be completed in about a month (8 days hands-on time) including the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, high-throughput sequencing and protein binding with GAIIx, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, RNA-MaP and RBNS. A successful HiTS-RAP experiment provides the sequence and binding curves for approximately 200 million RNAs. PMID:26182240
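The binding curves mentioned above can be made concrete with a small fitting sketch. It assumes a single-site binding model, signal = Fmax * [P] / (Kd + [P]), which is a textbook simplification and not necessarily the model used in the HiTS-RAP protocol; the function name, the Kd search grid, and the toy data are all hypothetical.

```python
def fit_binding_curve(concentrations, signals):
    """Fit signal = fmax * c / (kd + c) by scanning kd on a log-spaced grid.

    For each candidate kd the best fmax has a closed form (linear least squares),
    so only kd needs to be searched. Returns (kd, fmax).
    """
    best = None
    for i in range(-200, 401):  # kd from 1e-2 to 1e4 in the units of the input
        kd = 10 ** (i / 100)
        x = [c / (kd + c) for c in concentrations]
        fmax = sum(xi * yi for xi, yi in zip(x, signals)) / sum(xi * xi for xi in x)
        sse = sum((yi - fmax * xi) ** 2 for xi, yi in zip(x, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]

# Noise-free toy data generated with kd = 10 and fmax = 2 (arbitrary units).
concs = [1, 3, 10, 30, 100, 300]
sig = [2.0 * c / (10 + c) for c in concs]
kd_hat, fmax_hat = fit_binding_curve(concs, sig)
```

A grid search keeps the sketch dependency-free; with real, noisy intensities a nonlinear least-squares routine with error estimates would be the more usual choice.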
Next Generation Sequencing at the University of Chicago Genomics Core
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faber, Pieter
2013-04-24
The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.
DnaSAM: Software to perform neutrality testing for large datasets with complex null models.
Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B
2010-05-01
Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments and the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model that are stored in an easy-to-manipulate text file. © 2009 Blackwell Publishing Ltd.
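To illustrate what a diversity and neutrality statistic looks like in practice, the sketch below computes the number of segregating sites, the mean number of pairwise differences, Watterson's estimator, and Tajima's D for a gap-free alignment. The abstract does not list which statistics DnaSAM reports, so the choice of Tajima's D is an assumption, and this is not DnaSAM's code; the coalescent simulations used to obtain probability values are not shown.

```python
from itertools import combinations
from math import sqrt

def segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct character."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

def watterson_theta(seqs):
    """Segregating sites divided by the harmonic number a1 = sum_{i<n} 1/i."""
    a1 = sum(1 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1

def tajimas_d(seqs):
    """Tajima's D (Tajima 1989) for equal-length sequences without gaps."""
    n = len(seqs)
    s = segregating_sites(seqs)
    if s == 0:
        return float("nan")  # D is undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (mean_pairwise_differences(seqs) - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

alignment = ["AAAA", "AAAT", "AATT", "AATT"]  # toy data
d_value = tajimas_d(alignment)
```

In a workflow of the kind described, the observed value would be compared against the distribution of the same statistic computed on simulated data sets under the chosen null model.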
SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.
Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao
2014-08-08
Next-generation sequencers (NGSs) have become one of the main tools of current biology. To obtain useful insights from NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We developed the software SUGAR (subtile-based GUI-assisted refiner), which can handle ultra-high-throughput data with a user-friendly graphical user interface (GUI) and interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to public whole human genome sequencing data, and we showed that the overall mapping quality improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. Therefore, the software will be especially useful for controlling the quality of variant calls for low-abundance cell populations, e.g., cancers, in a sample affected by technical errors in the sequencing procedure.
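The Phred scores mentioned above map to error probabilities as P = 10^(-Q/10). The sketch below uses that relation for a simple per-read filter based on expected errors. This is a deliberately simpler stand-in: SUGAR works on flowcell subtiles, which is not modelled here, and the function names, the Phred+33 offset, and the threshold of 1.0 are assumptions for illustration.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)

def expected_errors(quality_string, offset=33):
    """Sum of per-base error probabilities for one read (Phred+33 assumed)."""
    return sum(phred_to_error_prob(ord(ch) - offset) for ch in quality_string)

def keep_read(quality_string, max_expected_errors=1.0, offset=33):
    """Keep a read only if its expected number of errors is within the threshold."""
    return expected_errors(quality_string, offset) <= max_expected_errors

good = keep_read("IIII")  # four bases at Q40
bad = keep_read("!!!!")   # four bases at Q0
```

Filtering on expected errors rather than on mean quality avoids averaging on a logarithmic scale, which can hide a few very poor bases.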
Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip
2012-01-06
Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
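The idea that a de novo mutation shows up as a violation of Mendelian inheritance can be sketched with a deterministic check on called genotypes for a parent-offspring trio. This is only the starting intuition: the method described above is probabilistic and works from sequencing data to a posterior probability, which this example does not attempt. The data layout and function names are hypothetical.

```python
from itertools import product

def mendelian_consistent(child, mother, father):
    """True if the child's unordered genotype can be formed from one maternal
    and one paternal allele."""
    child_sorted = tuple(sorted(child))
    return any(tuple(sorted((m, f))) == child_sorted
               for m, f in product(mother, father))

def candidate_de_novo_sites(trio_genotypes):
    """Return site labels whose (child, mother, father) genotypes violate
    Mendelian inheritance."""
    return [site for site, (c, m, f) in trio_genotypes.items()
            if not mendelian_consistent(c, m, f)]

# Toy diploid genotype calls, keyed by a site label.
sites = {
    "chr1:100": (("A", "G"), ("A", "A"), ("A", "A")),  # G not present in either parent
    "chr1:200": (("A", "G"), ("A", "G"), ("A", "A")),  # consistent
}
candidates = candidate_de_novo_sites(sites)
```

With hard genotype calls, sequencing errors produce many false violations, which is one motivation for modelling error and mutation jointly as the abstract describes.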
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
Oono, Ryoko
2017-01-01
High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions ‘how and why are communities different?’ This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences. PMID:29253889
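One way to see how sequencing depth affects the uncertainty of a dissimilarity estimate is to subsample reads repeatedly and look at the spread of the resulting values. The sketch below does this for Bray-Curtis dissimilarity between two communities. The choice of Bray-Curtis, the rarefaction scheme, and the percentile interval are assumptions made for illustration and are not taken from the study.

```python
import random

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement; depth must not exceed sum(counts)."""
    reads = [taxon for taxon, n in enumerate(counts) for _ in range(n)]
    out = [0] * len(counts)
    for taxon in rng.sample(reads, depth):
        out[taxon] += 1
    return out

def dissimilarity_interval(a, b, depth, n_rep=1000, alpha=0.05, seed=0):
    """Percentile interval of Bray-Curtis values over repeated rarefaction."""
    rng = random.Random(seed)
    values = sorted(
        bray_curtis(rarefy(a, depth, rng), rarefy(b, depth, rng))
        for _ in range(n_rep)
    )
    lo = values[int((alpha / 2) * n_rep)]
    hi = values[int((1 - alpha / 2) * n_rep) - 1]
    return lo, hi

# Toy communities with three taxa each, subsampled to 40 reads per sample.
low, high = dissimilarity_interval([50, 30, 20], [20, 30, 50], depth=40, n_rep=200)
```

Repeating the call with a larger `depth` gives a quick feel for how much the interval narrows, though, as the abstract notes, the effect depends on how strongly rare taxa drive the dissimilarity.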
Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S
2018-01-01
ABSTRACT Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2, with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publically available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. 
IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection. PMID:29564396
Caboche, Ségolène; Audebert, Christophe; Hot, David
2014-01-01
Recent progress in high-throughput sequencing (HTS) technologies enables easy and lower-cost access to whole genome sequencing (WGS) or re-sequencing. HTS combined with adapted, automatic and fast bioinformatics solutions for sequencing applications promises accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis allow genome-based diagnosis that is consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose. PMID:25437800
High-Throughput Single-Cell RNA Sequencing and Data Analysis.
Sagar; Herman, Josip Stefan; Pospisilik, John Andrew; Grün, Dominic
2018-01-01
Understanding biological systems at a single cell resolution may reveal several novel insights which remain masked by the conventional population-based techniques providing an average readout of the behavior of cells. Single-cell transcriptome sequencing holds the potential to identify novel cell types and characterize the cellular composition of any organ or tissue in health and disease. Here, we describe a customized high-throughput protocol for single-cell RNA-sequencing (scRNA-seq) combining flow cytometry and a nanoliter-scale robotic system. Since scRNA-seq requires amplification of a low amount of endogenous cellular RNA, leading to substantial technical noise in the dataset, downstream data filtering and analysis require special care. Therefore, we also briefly describe in-house state-of-the-art data analysis algorithms developed to identify cellular subpopulations including rare cell types as well as to derive lineage trees by ordering the identified subpopulations of cells along the inferred differentiation trajectories.
Quigley, Lisa; O'Sullivan, Orla; Beresford, Tom P; Ross, R Paul; Fitzgerald, Gerald F; Cotter, Paul D
2012-08-01
Here, high-throughput sequencing was employed to reveal the highly diverse bacterial populations present in 62 Irish artisanal cheeses and, in some cases, associated cheese rinds. Using this approach, we revealed the presence of several genera not previously associated with cheese, including Faecalibacterium, Prevotella, and Helcococcus and, for the first time, detected the presence of Arthrobacter and Brachybacterium in goats' milk cheese. Our analysis confirmed many previously observed patterns, such as the dominance of typical cheese bacteria, the fact that the microbiota of raw and pasteurized milk cheeses differ, and that the level of cheese maturation has a significant influence on Lactobacillus populations. It was also noted that cheeses containing adjunct ingredients had lower proportions of Lactococcus species. It is thus apparent that high-throughput sequencing-based investigations can provide valuable insights into the microbial populations of artisanal foods. PMID:22685131
Asha, Srinivasan; Sreekumar, Sweda; Soniya, E V
2016-01-01
Analysis of high-throughput small RNA deep sequencing data, in combination with black pepper transcriptome sequences, revealed microRNA-mediated gene regulation in black pepper (Piper nigrum L.). Black pepper is an important spice crop and its berries are used worldwide as a natural food additive that contributes unique flavour to foods. In the present study, to characterize microRNAs from black pepper, we generated a small RNA library from black pepper leaf and sequenced it by Illumina high-throughput sequencing technology. MicroRNAs belonging to a total of 303 conserved miRNA families were identified from the sRNAome data. Subsequent analysis from recently sequenced black pepper transcriptome confirmed precursor sequences of 50 conserved miRNAs and four potential novel miRNA candidates. Stem-loop qRT-PCR experiments demonstrated differential expression of eight conserved miRNAs in black pepper. Computational analysis of targets of the miRNAs showed 223 potential black pepper unigene targets that encode diverse transcription factors and enzymes involved in plant development, disease resistance, metabolic and signalling pathways. RLM-RACE experiments further mapped miRNA-mediated cleavage at five of the mRNA targets. In addition, miRNA isoforms corresponding to 18 miRNA families were also identified from black pepper. This study presents the first large-scale identification of microRNAs from black pepper and provides the foundation for the future studies of miRNA-mediated gene regulation of stress responses and diverse metabolic processes in black pepper.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daum, Christopher; Zane, Matthew; Han, James
2011-01-31
The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increase sample throughput, and improving the overall quality of the sequence generated. A sequence QC analysis pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.
Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.
Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L
2017-11-13
A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides) as well as automated data analysis protocols to filter away incorrect assignments, noise, and synthetic side-products. For increasing the confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining greater than 85% sequence identification rates in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.
Madanecki, Piotr; Bałut, Magdalena; Buckley, Patrick G; Ochocka, J Renata; Bartoszewski, Rafał; Crossman, David K; Messiaen, Ludwine M; Piotrowski, Arkadiusz
2018-01-01
High-throughput technologies generate a considerable amount of data, which often requires bioinformatic expertise to analyze. Here we present High-Throughput Tabular Data Processor (HTDP), a platform-independent Java program. HTDP works on any character-delimited column data (e.g. BED, GFF, GTF, PSL, WIG, VCF) from multiple text files and supports merging, filtering and converting of data that is produced in the course of high-throughput experiments. HTDP can also utilize itemized sets of conditions from external files for complex or repetitive filtering/merging tasks. The program is intended to aid global, real-time processing of large data sets using a graphical user interface (GUI). Therefore, no prior expertise in programming, regular expressions, or command line usage is required of the user. Additionally, no a priori assumptions are imposed on the internal file composition. We demonstrate the flexibility and potential of HTDP in real-life research tasks including microarray and massively parallel sequencing, i.e. identification of disease-predisposing variants in next generation sequencing data as well as comprehensive concurrent analysis of microarray and sequencing results. We also show the utility of HTDP in technical tasks including data merge, reduction and filtering with external criteria files. HTDP was developed to address functionality that is missing or rudimentary in other GUI software for processing character-delimited column data from high-throughput technologies. Flexibility in terms of input file handling provides long-term potential functionality in high-throughput analysis pipelines, as the program is not limited by the currently existing applications and data formats. HTDP is available as open-source software (https://github.com/pmadanecki/htdp). PMID:29432475
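The task of filtering character-delimited column data against criteria held in an external file, as described above, can be sketched in a few lines. HTDP itself is a Java GUI program, so this Python example is not its code or interface; the function names, the tab delimiter, and the toy BED-like rows are assumptions for illustration.

```python
import csv
import io

def load_criteria(handle, delimiter="\t"):
    """Read the first column of a criteria file into a set of allowed values."""
    return {row[0] for row in csv.reader(handle, delimiter=delimiter) if row}

def filter_rows(handle, column, allowed, delimiter="\t"):
    """Yield data rows whose value in `column` (0-based) is in `allowed`.

    Empty lines and lines starting with '#' are skipped.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        if row and not row[0].startswith("#") and row[column] in allowed:
            yield row

# In-memory stand-ins for a BED-like data file and an external criteria file.
bed = io.StringIO(
    "chr1\t100\t200\tgeneA\n"
    "chr2\t50\t80\tgeneB\n"
    "chr1\t300\t400\tgeneC\n"
)
criteria = io.StringIO("geneA\ngeneC\n")
kept = list(filter_rows(bed, 3, load_criteria(criteria)))
```

Because the rows are streamed one at a time, the same approach works on files too large to hold in memory, as long as the criteria set itself fits.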
2011-01-01
Background: Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results: We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion: The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. PMID:21878105
Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho
2017-11-01
High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in the public and private sectors. These methods provide vast amounts of data, which need to be analysed in several steps. Although the bioinformatics analysis may be performed using several public tools, many analytical pipelines allow too few options for the optimal analysis of more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.
Jacob, Jacob H; Hussein, Emad I; Shakhatreh, Muhamad Ali K; Cornelison, Christopher T
2017-10-01
Amplicon sequencing using next-generation technology (bTEFAP®) has been utilized in describing the diversity of Dead Sea microbiota. The investigated area is a well-known salt lake in the western part of Jordan found in the lowest geographical location in the world (more than 420 m below sea level) and characterized by extreme salinity (approximately 34%) in addition to other extreme conditions (low pH, unique ionic composition different from sea water). DNA was extracted from Dead Sea water. A total of 314,310 small subunit rRNA (SSU rRNA) sequences were parsed, and 288,452 sequences were then clustered. For alpha diversity analysis, the sample was rarefied to 3,000 sequences. The Shannon-Wiener index curve plot reached a plateau at approximately 3,000 sequences, indicating that sequencing depth was sufficient to capture the full scope of microbial diversity. Archaea were found to dominate the sequences (52%), whereas Bacteria constituted 45% of the sequences. Altogether, prokaryotic sequences (which constitute 97% of all sequences) were found to predominate. The findings expand on previous studies by using high-throughput amplicon sequencing to describe the microbial community in an environment which in recent years has been shown to hide some interesting diversity. © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
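The rarefaction and Shannon-Wiener steps mentioned in the abstract above can be sketched in a few lines of Python. This is a generic illustration, not the authors' pipeline: the Shannon index is computed as H = -Σ p_i ln p_i, rarefaction is done by random subsampling without replacement, and the function names and the toy domain-level label list (built only from the 52% / 45% / 3% split reported above) are assumptions. A real analysis would work on clustered sequence units rather than three domain labels.

```python
import math
import random
from collections import Counter

def shannon_index(counts):
    """Shannon-Wiener index H = -sum(p * ln p) over non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def rarefy(labels, depth, seed=0):
    """Subsample `depth` labels without replacement and tally them."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return Counter(rng.sample(labels, depth))

# Toy label list using only the proportions reported in the abstract.
labels = ["Archaea"] * 52 + ["Bacteria"] * 45 + ["Other"] * 3

full = shannon_index(Counter(labels).values())
sub = rarefy(labels, depth=30)
print(f"H (all labels) = {full:.3f}")
print(f"H (rarefied to 30) = {shannon_index(sub.values()):.3f}")
```

Repeating the subsampling at increasing depths and plotting H against depth gives the kind of curve that the abstract describes as reaching a plateau.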
NASA Astrophysics Data System (ADS)
Cannon, M. V.; Hester, J.; Shalkhauser, A.; Chan, E. R.; Logue, K.; Small, S. T.; Serre, D.
2016-03-01
Analysis of environmental DNA (eDNA) enables the detection of species of interest from water and soil samples, typically using species-specific PCR. Here, we describe a method to characterize the biodiversity of a given environment by amplifying eDNA using primer pairs targeting a wide range of taxa and high-throughput sequencing for species identification. We tested this approach on 91 water samples of 40 mL collected along the Cuyahoga River (Ohio, USA). We amplified eDNA using 12 primer pairs targeting mammals, fish, amphibians, birds, bryophytes, arthropods, copepods, plants and several microorganism taxa and sequenced all PCR products simultaneously by high-throughput sequencing. Overall, we identified DNA sequences from 15 species of fish, 17 species of mammals, 8 species of birds, 15 species of arthropods, one turtle and one salamander. Interestingly, in addition to aquatic and semi-aquatic animals, we identified DNA from terrestrial species that live near the Cuyahoga River. We also identified DNA from one Asian carp species invasive to the Great Lakes but that had not been previously reported in the Cuyahoga River. Our study shows that analysis of eDNA extracted from small water samples using wide-range PCR amplification combined with high-throughput sequencing can provide a broad perspective on biological diversity.
Cannon, M. V.; Hester, J.; Shalkhauser, A.; Chan, E. R.; Logue, K.; Small, S. T.; Serre, D.
2016-01-01
Analysis of environmental DNA (eDNA) enables the detection of species of interest from water and soil samples, typically using species-specific PCR. Here, we describe a method to characterize the biodiversity of a given environment by amplifying eDNA using primer pairs targeting a wide range of taxa and high-throughput sequencing for species identification. We tested this approach on 91 water samples of 40 mL collected along the Cuyahoga River (Ohio, USA). We amplified eDNA using 12 primer pairs targeting mammals, fish, amphibians, birds, bryophytes, arthropods, copepods, plants and several microorganism taxa and sequenced all PCR products simultaneously by high-throughput sequencing. Overall, we identified DNA sequences from 15 species of fish, 17 species of mammals, 8 species of birds, 15 species of arthropods, one turtle and one salamander. Interestingly, in addition to aquatic and semi-aquatic animals, we identified DNA from terrestrial species that live near the Cuyahoga River. We also identified DNA from one Asian carp species invasive to the Great Lakes but that had not been previously reported in the Cuyahoga River. Our study shows that analysis of eDNA extracted from small water samples using wide-range PCR amplification combined with high-throughput sequencing can provide a broad perspective on biological diversity. PMID:26965911
USDA-ARS's Scientific Manuscript database
High-throughput sequencing of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken fr...
Xiao, Yongli; Sheng, Zong-Mei; Taubenberger, Jeffery K.
2015-01-01
The vast majority of surgical biopsy and post-mortem tissue samples are formalin-fixed and paraffin-embedded (FFPE), but this process leads to RNA degradation that limits gene expression analysis. As an example, the viral RNA genome of the 1918 pandemic influenza A virus was previously determined in a 9-year effort by overlapping RT-PCR from post-mortem samples. Using the protocols described here, the full genome of the 1918 virus at high coverage was determined in one high-throughput sequencing run of a cDNA library derived from total RNA of a 1918 FFPE sample after duplex-specific nuclease treatments. This basic methodological approach should assist in the analysis of FFPE tissue samples isolated over the past century from a variety of infectious diseases. PMID:26344216
Vinícius de Melo, Gilberto
2018-01-01
Summary: Coffee bean fermentation is a spontaneous, on-farm process involving the action of different microbial groups, including bacteria and fungi. In this study, a high-throughput sequencing approach was employed to study the diversity and dynamics of bacteria associated with Brazilian coffee bean fermentation. The total DNA from fermenting coffee samples was extracted at different time points, and the 16S rRNA gene with segments around the V4 variable region was sequenced on an Illumina high-throughput platform. Using this approach, the presence of over eighty bacterial genera was determined, many of which have been detected for the first time during coffee bean fermentation, including Fructobacillus, Pseudonocardia, Pedobacter, Sphingomonas and Hymenobacter. The presence of Fructobacillus suggests an influence of these bacteria on fructose metabolism during coffee fermentation. Temporal analysis showed a strong dominance of lactic acid bacteria with over 97% of read sequences at the end of fermentation, mainly represented by Leuconostoc and Lactococcus. Metabolism of lactic acid bacteria was associated with the high formation of lactic acid during fermentation, as determined by HPLC analysis. The results reported in this study confirm the underestimation of bacterial diversity associated with coffee fermentation. New microbial groups reported in this study may be explored as functional starter cultures for on-farm coffee processing.
High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.
Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M
2016-09-07
Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits. Copyright © 2016 Elsevier Inc. All rights reserved.
Assembly and diploid architecture of an individual human genome via single-molecule technologies
Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali
2015-01-01
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality. PMID:26121404
Assembly and diploid architecture of an individual human genome via single-molecule technologies.
Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali
2015-08-01
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.
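The scaffold N50 statistic quoted in the abstract above is the length L such that scaffolds of length L or longer together cover at least half of the total assembly length. A minimal Python sketch of that standard definition follows; the function name `n50` and the example lengths are made up for illustration and are unrelated to the assembly described.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and accumulated until the
    running sum reaches at least half of the total; the length at which that
    happens is the N50.
    """
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # integer-safe test for "at least half"
            return length

# Made-up scaffold lengths (arbitrary units).
example = [80, 70, 50, 40, 30, 20]
print(n50(example))  # total is 290; 80 + 70 = 150 >= 145, so N50 is 70
```

Because N50 depends only on the length distribution, it measures contiguity and says nothing about whether the scaffolds are correctly assembled.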
A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.
Bansal, Vikas
2017-03-14
PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore, over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments. In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression and identified outlier samples with a 2-fold greater PCR duplication rate than other samples. The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates .
Kebschull, Moritz; Fittler, Melanie Julia; Demmer, Ryan T; Papapanou, Panos N
2017-01-01
Today, -omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ, or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier "candidate" gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized -omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease. A major issue when inferring biological information from high-throughput -omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences. In this chapter, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of -omics data generated using microarrays or next-generation sequencing technology using open-source tools. Starting with quality control measures and necessary preprocessing steps for data originating from different -omics technologies, we next outline a differential expression analysis pipeline that can be used for data from both microarray and sequencing experiments, and offers the possibility to account for random or fixed effects. Finally, we present an overview of the possibilities for a functional analysis of the obtained data.
Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data
Jonathan M. Palmer; Michelle A. Jusino; Mark T. Banik; Daniel L. Lindner
2018-01-01
High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure accuracy of sequencing platforms and data analysis pipelines. To assess the ability of sequencing platforms and data processing pipelines using fungal internal transcribed spacer...
HTSeq--a Python framework to work with high-throughput sequencing data.
Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang
2015-01-15
A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. © The Author 2014. Published by Oxford University Press.
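The idea of counting the overlap of reads with genes, mentioned in the abstract above, can be shown with a small pure-Python sketch. It deliberately does not use the HTSeq API, and it is not the htseq-count implementation: the counting rule (a read is credited to a gene only when it overlaps exactly one gene), the `no_feature` and `ambiguous` labels, the function name `count_reads`, and the coordinates are all assumptions chosen for illustration.

```python
from collections import Counter

def count_reads(genes, reads):
    """Count reads per gene using half-open [start, end) intervals.

    genes: dict mapping gene name -> (chrom, start, end)
    reads: iterable of (chrom, start, end)
    A read overlapping exactly one gene is credited to it; reads overlapping
    no gene or several genes go to 'no_feature' or 'ambiguous' (assumed rule).
    """
    counts = Counter({name: 0 for name in genes})
    for chrom, start, end in reads:
        hits = [name for name, (g_chrom, g_start, g_end) in genes.items()
                if g_chrom == chrom and start < g_end and g_start < end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["no_feature"] += 1
        else:
            counts["ambiguous"] += 1
    return counts

# Hypothetical gene models and aligned reads.
genes = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [("chr1", 110, 150), ("chr1", 190, 195), ("chr1", 400, 450),
         ("chr1", 250, 290), ("chr2", 110, 150)]

counts = count_reads(genes, reads)
for name, n in sorted(counts.items()):
    print(name, n)
```

The linear scan over all genes is fine for a toy example; for real data, an indexed structure that can be queried by genomic coordinates, such as the data structures the abstract says HTSeq provides, avoids checking every gene for every read.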
Diroma, Maria Angela; Santorsola, Mariangela; Guttà, Cristiano; Gasparre, Giuseppe; Picardi, Ernesto; Pesole, Graziano; Attimonelli, Marcella
2014-01-01
Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has raised the demand for effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this end, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genome assembly and haplogroup assignment, also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied to 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25028726
Microbial forensics: fiber optic microarray subtyping of Bacillus anthracis
NASA Astrophysics Data System (ADS)
Shepard, Jason R. E.
2009-05-01
The past decade has seen increased development and subsequent adoption of rapid molecular techniques involving DNA analysis for detection of pathogenic microorganisms, also termed microbial forensics. The continued accumulation of microbial sequence information in genomic databases now better positions the field of high-throughput DNA analysis to proceed in a more manageable fashion. The potential to build off of these databases exists as technology continues to develop, which will enable more rapid, cost effective analyses. This wealth of genetic information, along with new technologies, has the potential to better address some of the current problems and solve the key issues involved in DNA analysis of pathogenic microorganisms. To this end, a high density fiber optic microarray has been employed, housing numerous DNA sequences simultaneously for detection of various pathogenic microorganisms, including Bacillus anthracis, among others. Each organism is analyzed with multiple sequences and can be sub-typed against other closely related organisms. For public health labs, real-time PCR methods have been developed as an initial preliminary screen, but culture and growth are still considered the gold standard. Technologies employing higher throughput than these standard methods are better suited to capitalize on the limitless potential garnered from the sequence information. Microarray analyses are one such format positioned to exploit this potential, and our array platform is reusable, allowing repetitive tests on a single array, providing an increase in throughput and decrease in cost, along with a certainty of detection, down to the individual strain level.
PANGEA: pipeline for analysis of next generation amplicons
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-01-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525
PANGEA: pipeline for analysis of next generation amplicons.
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-07-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the 'backbone'.
Christiansen, Anders; Kringelum, Jens V; Hansen, Christian S; Bøgh, Katrine L; Sullivan, Eric; Patel, Jigar; Rigby, Neil M; Eiwegger, Thomas; Szépfalusi, Zsolt; de Masi, Federico; Nielsen, Morten; Lund, Ole; Dufva, Martin
2015-08-06
Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds.
Burdick, David B; Cavnor, Chris C; Handcock, Jeremy; Killcoyne, Sarah; Lin, Jake; Marzolf, Bruz; Ramsey, Stephen A; Rovira, Hector; Bressler, Ryan; Shmulevich, Ilya; Boyle, John
2010-07-14
High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Existing software solutions provide static and well-established algorithms in a restrictive package. However, as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services.
2010-01-01
Background: High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Results: Existing software solutions provide static and well-established algorithms in a restrictive package. However, as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. Conclusion: The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services. PMID:20630057
Rosenfeld, Aaron M; Meng, Wenzhao; Luning Prak, Eline T; Hershberg, Uri
2017-01-15
As high-throughput sequencing of B cells becomes more common, the need for tools to analyze the large quantity of data also increases. This article introduces ImmuneDB, a system for analyzing vast amounts of heavy chain variable region sequences and exploring the resulting data. It can take as input raw FASTA/FASTQ data, identify genes, determine clones, construct lineages, as well as provide information such as selection pressure and mutation analysis. It uses an industry-leading database, MySQL, to provide fast analysis and avoid the complexities of using error-prone flat files. ImmuneDB is freely available at http://immunedb.com. A demo of the ImmuneDB web interface is available at: http://immunedb.com/demo. Contact: Uh25@drexel.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data.
Fimereli, Danai; Detours, Vincent; Konopka, Tomasz
2013-04-01
High-throughput sequencing is becoming a popular research tool but carries with it considerable costs in terms of computation time, data storage and bandwidth. Meanwhile, some research applications focusing on individual genes or pathways do not necessitate processing of a full sequencing dataset. Thus, it is desirable to partition a large dataset into smaller, manageable, but relevant pieces. We present a toolkit for partitioning raw sequencing data that includes a method for extracting reads that are likely to map onto pre-defined regions of interest. We show the method can be used to extract information about genes of interest from DNA or RNA sequencing samples in a fraction of the time and disk space required to process and store a full dataset. We report speedup factors between 2.6 and 96, depending on settings and samples used. The software is available at http://www.sourceforge.net/projects/triagetools/.
Takahashi, Mayumi; Wu, Xiwei; Ho, Michelle; Chomchan, Pritsana; Rossi, John J; Burnett, John C; Zhou, Jiehua
2016-09-22
The systematic evolution of ligands by exponential enrichment (SELEX) technique is a powerful and effective aptamer-selection procedure. However, modifications to the process can dramatically improve selection efficiency and aptamer performance. For example, droplet digital PCR (ddPCR) has been recently incorporated into SELEX selection protocols to putatively reduce the propagation of byproducts and avoid selection bias that results from differences in PCR efficiency of sequences within the random library. However, a detailed, parallel comparison of the efficacy of conventional solution PCR versus the ddPCR modification in the RNA aptamer-selection process is needed to understand effects on overall SELEX performance. In the present study, we took advantage of powerful high throughput sequencing technology and bioinformatics analysis coupled with SELEX (HT-SELEX) to thoroughly investigate the effects of initial library and PCR methods in RNA aptamer identification. Our analysis revealed that distinct "biased sequences" and nucleotide composition existed in the initial, unselected libraries purchased from two different manufacturers and that the fate of the "biased sequences" was target-dependent during selection. Our comparison of solution PCR- and ddPCR-driven HT-SELEX demonstrated that the PCR method affected not only the nucleotide composition of the enriched sequences, but also the overall SELEX efficiency and aptamer efficacy.
High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.
Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca
2015-01-01
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious, and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the need for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
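The idea of finding read pairs that span a genome/T-DNA junction can be sketched as a filter over paired alignments: keep pairs in which exactly one mate maps to the foreign sequence. This is a minimal illustration, not the authors' tool; the `Mate` record, the reference names, and the coordinates are hypothetical.

```python
from collections import namedtuple

# Hypothetical minimal alignment record: reference name and mapped position.
Mate = namedtuple("Mate", ["ref", "pos"])


def junction_pairs(pairs, foreign_ref="T-DNA"):
    """Return names of pairs in which exactly one mate maps to the foreign sequence."""
    spanning = []
    for name, mate1, mate2 in pairs:
        on_foreign = (mate1.ref == foreign_ref, mate2.ref == foreign_ref)
        if on_foreign[0] != on_foreign[1]:
            spanning.append(name)
    return spanning


pairs = [
    ("pair1", Mate("Chr1", 10500), Mate("T-DNA", 42)),    # spans a junction
    ("pair2", Mate("Chr1", 20000), Mate("Chr1", 20250)),  # genome only
    ("pair3", Mate("T-DNA", 100), Mate("T-DNA", 350)),    # T-DNA only
    ("pair4", Mate("T-DNA", 4900), Mate("Chr3", 77100)),  # spans a junction
]
candidates = junction_pairs(pairs)
```

The genomic mate of each candidate pair gives an approximate insertion location. Resolving the exact junction structure would additionally need reads that cross the breakpoint itself.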
Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie
2017-05-15
B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.
Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.
2011-01-01
High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204
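To make the taxonomy-supervised comparison concrete, the sketch below computes a β-diversity value between two samples whose sequences have already been binned into taxa. Bray-Curtis dissimilarity is used as one common β-diversity measure; the abstract does not say which measures the authors computed, and the taxon names and counts are invented for illustration.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples given taxon -> count dicts."""
    taxa = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 1.0 - 2.0 * shared / total


# Hypothetical per-sample counts after assigning each sequence to a taxon.
soil = {"Bacillus": 30, "Pseudomonas": 50, "Streptomyces": 20}
sediment = {"Bacillus": 10, "Pseudomonas": 50, "Clostridium": 40}
dissimilarity = bray_curtis(soil, sediment)  # shared = 60, total = 200
```

Because the bins come from an existing taxonomy rather than from pairwise alignment and clustering, a matrix like this can be built one sequence at a time, which is the scaling advantage the abstract describes.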
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davenport, Karen
Karen Davenport of Los Alamos National Laboratory discusses a high-throughput next generation genome finishing pipeline on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
USDA-ARS's Scientific Manuscript database
Ongoing developments and cost decreases in next-generation sequencing (NGS) technologies have led to an increase in their application, which has greatly enhanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Eff...
Pollier, Jacob; González-Guzmán, Miguel; Ardiles-Diaz, Wilson; Geelen, Danny; Goossens, Alain
2011-01-01
cDNA-Amplified Fragment Length Polymorphism (cDNA-AFLP) is a commonly used technique for genome-wide expression analysis that does not require prior sequence knowledge. Typically, quantitative expression data and sequence information are obtained for a large number of differentially expressed gene tags. However, most of the gene tags do not correspond to full-length (FL) coding sequences, which is a prerequisite for subsequent functional analysis. A medium-throughput screening strategy, based on integration of polymerase chain reaction (PCR) and colony hybridization, was developed that allows in parallel screening of a cDNA library for FL clones corresponding to incomplete cDNAs. The method was applied to screen for the FL open reading frames of a selection of 163 cDNA-AFLP tags from three different medicinal plants, leading to the identification of 109 (67%) FL clones. Furthermore, the protocol allows for the use of multiple probes in a single hybridization event, thus significantly increasing the throughput when screening for rare transcripts. The presented strategy offers an efficient method for the conversion of incomplete expressed sequence tags (ESTs), such as cDNA-AFLP tags, to FL-coding sequences.
SNP-based genotyping in lentil: linking sequence information with phenotypes
USDA-ARS's Scientific Manuscript database
Lentil (Lens culinaris) has been late to enter the world of high throughput molecular analysis due to a general lack of genomic resources. Using a 454 sequencing-based approach, SNPs have been identified in genes across the lentil genome. Several hundred have been turned into single SNP KASP assay...
2016-12-01
AWARD NUMBER: W81XWH-13-1-0371. TITLE: High-Throughput Sequencing of Germline and Tumor From Men with Early-Onset Metastatic Prostate Cancer... DATES COVERED: 30 Sep 2013 - 29 Sep 2016... presenting with metastatic prostate cancer at a young age (before age 60 years). Whole exome sequencing identified a panel of germline variants that have
Alignment of high-throughput sequencing data inside in-memory databases.
Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias
2014-01-01
In times of high-throughput DNA sequencing techniques, high-performance analysis of DNA sequences is of great importance. Computer-supported DNA analysis is still a time-consuming task. In this paper we explore the potential of a new in-memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, utilizing its integrated memory engine for database table creation. We implemented stored procedures containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL, which indicates a high potential within the new in-memory concepts, leading to further developments of DNA analysis procedures in the future.
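For readers unfamiliar with the exact-search task being benchmarked, the sketch below reports every position at which a read occurs verbatim in a reference. It is a naive linear scan written for illustration only; it is neither BWA's Burrows-Wheeler-based index nor the stored procedures described in the abstract, and the toy reference is hypothetical.

```python
def find_exact(reference, read):
    """Return all 0-based start positions where read occurs exactly in reference."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping occurrences are also reported.
        start = reference.find(read, start + 1)
    return positions


reference = "GATTACAGATTACA"
hits = find_exact(reference, "ATTA")
```

Inexact search, which tolerates mismatches, is usually formulated recursively over an index. That recursion is the part the abstract reports could not be implemented on HANA.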
Automated sequence analysis and editing software for HIV drug resistance testing.
Struck, Daniel; Wallis, Carole L; Denisov, Gennady; Lambert, Christine; Servais, Jean-Yves; Viana, Raquel V; Letsoalo, Esrom; Bronze, Michelle; Aitken, Sue C; Schuurman, Rob; Stevens, Wendy; Schmit, Jean Claude; Rinke de Wit, Tobias; Perez Bercoff, Danielle
2012-05-01
Access to antiretroviral treatment in resource-limited settings is inevitably paralleled by the emergence of HIV drug resistance. Monitoring treatment efficacy and HIV drug resistance testing are therefore of increasing importance in resource-limited settings. Yet low-cost technologies and procedures suited to the particular context and constraints of such settings are still lacking. The ART-A (Affordable Resistance Testing for Africa) consortium brought together public and private partners to address this issue. The objective was to develop automated sequence analysis and editing software to support high-throughput automated sequencing. The ART-A Software was designed to automatically process and edit ABI chromatograms or FASTA files from HIV-1 isolates. The ART-A Software performs the basecalling, assigns quality values, aligns query sequences against a set reference, infers a consensus sequence, identifies the HIV type and subtype, translates the nucleotide sequence to amino acids and reports insertions/deletions, premature stop codons, ambiguities and mixed calls. The results can be automatically exported to Excel to identify mutations. Automated analysis was compared to manual analysis using a panel of 1624 PR-RT sequences generated in 3 different laboratories. Discrepancies between manual and automated sequence analysis were 0.69% at the nucleotide level and 0.57% at the amino acid level (668,047 AA analyzed), and discordances at major resistance mutations were recorded in 62 cases (4.83% of differences, 0.04% of all AA) for PR and 171 cases (6.18% of differences, 0.03% of all AA) for RT. The ART-A Software is a time-sparing tool for pre-analyzing HIV and viral quasispecies sequences in high-throughput laboratories and highlighting positions requiring attention. Copyright © 2012 Elsevier B.V. All rights reserved.
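Two of the listed steps, translating the nucleotide sequence and reporting premature stop codons, can be sketched in a few lines. This is an illustration using the standard genetic code, not the ART-A Software's implementation; the function names are hypothetical, and ambiguous bases are simply rendered as 'X' rather than resolved into mixed calls.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nucleotides):
    """Translate in frame 1; codons containing ambiguous bases become 'X'."""
    seq = nucleotides.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def premature_stops(protein):
    """Return 0-based positions of stop codons that are not the final residue."""
    return [i for i, aa in enumerate(protein[:-1]) if aa == "*"]


protein = translate("ATGTAAGCC")       # ATG = M, TAA = stop, GCC = A
flagged = premature_stops(protein)     # the stop at index 1 is premature
```

A position flagged this way is the kind of site the abstract describes as "requiring attention" from a human reviewer.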
Ozer, Abdullah; Tome, Jacob M; Friedman, Robin C; Gheba, Dan; Schroth, Gary P; Lis, John T
2015-08-01
Because RNA-protein interactions have a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay that couples sequencing on an Illumina GAIIx genome analyzer with the quantitative assessment of protein-RNA interactions. This assay is able to analyze interactions between one or possibly several proteins with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of the EGFP and negative elongation factor subunit E (NELF-E) proteins with their corresponding canonical and mutant RNA aptamers. Here we provide a detailed protocol for HiTS-RAP that can be completed in about a month (8 d hands-on time). This includes the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, HiTS and protein binding with a GAIIx instrument, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, quantitative analysis of RNA on a massively parallel array (RNA-MaP) and RNA Bind-n-Seq (RBNS), for quantitative analysis of RNA-protein interactions.
Methods for processing high-throughput RNA sequencing data.
Ares, Manuel
2014-11-03
High-throughput sequencing (HTS) methods for analyzing RNA populations (RNA-Seq) are gaining rapid application to many experimental situations. The steps in an RNA-Seq experiment require thought and planning, especially because the expense in time and materials is currently higher and the protocols are far less routine than those used for other high-throughput methods, such as microarrays. As always, good experimental design will make analysis and interpretation easier. Having a clear biological question, an idea about the best way to do the experiment, and an understanding of the number of replicates needed will make the entire process more satisfying. Whether the goal is capturing transcriptome complexity from a tissue or identifying small fragments of RNA cross-linked to a protein of interest, conversion of the RNA to cDNA followed by direct sequencing using the latest methods is a developing practice, with new technical modifications and applications appearing every day. Even more rapid are the development and improvement of methods for analysis of the very large amounts of data that arrive at the end of an RNA-Seq experiment, making considerations regarding reproducibility, validation, visualization, and interpretation increasingly important. This introduction is designed to review and emphasize a pathway of analysis from experimental design through data presentation that is likely to be successful, with the recognition that better methods are right around the corner. © 2014 Cold Spring Harbor Laboratory Press.
SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.
Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver
2012-07-15
In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua
2016-10-01
The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity, and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies of fungal diversity in deep-sea sediments that used culture-dependent approaches and clone library analysis, the present results suggest that Illumina sequencing dramatically accelerates the discovery of fungal communities in deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes is low in deep-sea environments. In addition, more than 12 taxa, accounting for 21.5% of sequences, have rarely been reported as deep-sea fungi, suggesting that the deep-sea sediments from Okinawa Trough harbor fungal communities that differ markedly from those of other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.
Hu, Jiazhi; Meyers, Robin M; Dong, Junchao; Panchakshari, Rohit A; Alt, Frederick W; Frock, Richard L
2016-05-01
Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those to evaluate the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation in cultured mammalian cells to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina MiSeq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement, with a turnaround time of <1 week.
Mutation detection using automated fluorescence-based sequencing.
Montgomery, Kate T; Iartchouck, Oleg; Li, Li; Perera, Anoja; Yassin, Yosuf; Tamburino, Alex; Loomis, Stephanie; Kucherlapati, Raju
2008-04-01
The development of high-throughput DNA sequencing techniques has made direct DNA sequencing of PCR-amplified genomic DNA a rapid and economical approach to the identification of polymorphisms that may play a role in disease. Point mutations as well as small insertions or deletions are readily identified by DNA sequencing. The mutations may be heterozygous (occurring in one allele while the other allele retains the normal sequence) or homozygous (occurring in both alleles). Sequencing alone cannot discriminate between true homozygosity and apparent homozygosity due to the loss of one allele due to a large deletion. In this unit, strategies are presented for using PCR amplification and automated fluorescence-based sequencing to identify sequence variation. The size of the project and laboratory preference and experience will dictate how the data is managed and which software tools are used for analysis. A high-throughput protocol is given that has been used to search for mutations in over 200 different genes at the Harvard Medical School - Partners Center for Genetics and Genomics (HPCGG, http://www.hpcgg.org/). Copyright 2008 by John Wiley & Sons, Inc.
Du, Hongqi; Sun, Xiaoge; Shi, Yinghua; Wang, Chengzhang
2014-01-01
Background: MicroRNAs (miRNAs) are a class of regulatory small RNAs (sRNAs) that regulate gene expression post-transcriptionally in plants and animals. High-throughput sequencing technology is capable of identifying small RNAs in plant species. Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide, and fall dormancy is an adaptive characteristic related to biomass production and winter survival in alfalfa. Here, we applied high-throughput sRNA sequencing to identify miRNAs that were responsive to fall dormancy in standard varieties (Maverick and CUF101) of alfalfa. Results: Four sRNA libraries were generated and sequenced from alfalfa leaves in two typical varieties at distinct seasons. Through integrative analysis, we identified 51 novel miRNA candidates of 206 families. Additionally, we identified 28 miRNAs associated with fall dormancy in the standard varieties (Maverick and CUF101), including 20 known miRNAs and eight novel miRNAs. Both high-throughput sequencing and RT-qPCR confirmed that eight known miRNA members were up-regulated and six known miRNA members were down-regulated in response to fall dormancy in the standard varieties (Maverick and CUF101). Among the 51 novel miRNA candidates, five miRNAs were up-regulated and three miRNAs were down-regulated in response to fall dormancy in the standard varieties (Maverick and CUF101), and five of them were confirmed by Northern blot analysis. Conclusion: We identified 20 known miRNAs and eight new miRNA candidates that were responsive to fall dormancy in the standard varieties (Maverick and CUF101) by high-throughput sequencing of small RNAs from Medicago sativa. Our data provide a useful resource for investigating miRNA-mediated regulatory mechanisms of fall dormancy in alfalfa, and these findings are important for our understanding of the roles played by miRNAs in the response of plants to abiotic stress in general and fall dormancy in alfalfa. PMID:25473944
Fan, Wenna; Zhang, Senhao; Du, Hongqi; Sun, Xiaoge; Shi, Yinghua; Wang, Chengzhang
2014-01-01
MicroRNAs (miRNAs) are a class of regulatory small RNAs (sRNAs) that regulate gene expression post-transcriptionally in plants and animals. High-throughput sequencing technology is capable of identifying small RNAs in plant species. Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide, and fall dormancy is an adaptive characteristic related to biomass production and winter survival in alfalfa. Here, we applied high-throughput sRNA sequencing to identify miRNAs that were responsive to fall dormancy in standard varieties (Maverick and CUF101) of alfalfa. Four sRNA libraries were generated and sequenced from alfalfa leaves in two typical varieties at distinct seasons. Through integrative analysis, we identified 51 novel miRNA candidates of 206 families. Additionally, we identified 28 miRNAs associated with fall dormancy in the standard varieties (Maverick and CUF101), including 20 known miRNAs and eight novel miRNAs. Both high-throughput sequencing and RT-qPCR confirmed that eight known miRNA members were up-regulated and six known miRNA members were down-regulated in response to fall dormancy in the standard varieties (Maverick and CUF101). Among the 51 novel miRNA candidates, five miRNAs were up-regulated and three miRNAs were down-regulated in response to fall dormancy in the standard varieties (Maverick and CUF101), and five of them were confirmed by Northern blot analysis. We identified 20 known miRNAs and eight new miRNA candidates that were responsive to fall dormancy in the standard varieties (Maverick and CUF101) by high-throughput sequencing of small RNAs from Medicago sativa. Our data provide a useful resource for investigating miRNA-mediated regulatory mechanisms of fall dormancy in alfalfa, and these findings are important for our understanding of the roles played by miRNAs in the response of plants to abiotic stress in general and fall dormancy in alfalfa.
Report for the NGFA-5 project.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jaing, C; Jackson, P; Thissen, J
The objective of this project is to provide DHS a comprehensive evaluation of the current genomic technologies including genotyping, TaqMan PCR, multiple locus variable tandem repeat analysis (MLVA), microarray and high-throughput DNA sequencing in the analysis of biothreat agents from complex environmental samples. To effectively compare the sensitivity and specificity of the different genomic technologies, we used SNP TaqMan PCR, MLVA, microarray and high-throughput Illumina and 454 sequencing to test various strains from B. anthracis, B. thuringiensis, BioWatch aerosol filter extracts or soil samples that were spiked with B. anthracis, and samples that were previously collected during DHS and EPA environmental release exercises that were known to contain B. thuringiensis spores. The results of all the samples against the various assays are discussed in this report.
Droplet barcoding for single cell transcriptomics applied to embryonic stem cells
Klein, Allon M; Mazutis, Linas; Akartuna, Ilke; Tallapragada, Naren; Veres, Adrian; Li, Victor; Peshkin, Leonid; Weitz, David A; Kirschner, Marc W
2015-01-01
Summary: It has long been the dream of biologists to map gene expression at the single cell level. With such data one might track heterogeneous cell sub-populations, and infer regulatory relationships between genes and pathways. Recently, RNA sequencing has achieved single cell resolution. What is limiting is an effective way to routinely isolate and process large numbers of individual cells for quantitative in-depth sequencing. We have developed a high-throughput droplet-microfluidic approach for barcoding the RNA from thousands of individual cells for subsequent analysis by next-generation sequencing. The method shows a surprisingly low noise profile and is readily adaptable to other sequencing-based assays. We analyzed mouse embryonic stem cells, revealing in detail the population structure and the heterogeneous onset of differentiation after LIF withdrawal. The reproducibility of these high-throughput single cell data allowed us to deconstruct cell populations and infer gene expression relationships. PMID:26000487
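Once each read carries a cell barcode, per-cell expression can be recovered by grouping reads on that barcode. The sketch below shows only this grouping step and assumes reads have already been reduced to (barcode, gene) records. The barcode strings and record format are hypothetical (the gene names are used only as labels), and the published method's actual processing is not reproduced here.

```python
from collections import Counter, defaultdict


def count_by_cell(tagged_reads):
    """Build a per-cell gene count table from (cell_barcode, gene) records."""
    table = defaultdict(Counter)
    for barcode, gene in tagged_reads:
        table[barcode][gene] += 1
    return {barcode: dict(counts) for barcode, counts in table.items()}


# Hypothetical records: two cells, three genes.
tagged_reads = [
    ("AACGTT", "Pou5f1"),
    ("AACGTT", "Pou5f1"),
    ("AACGTT", "Nanog"),
    ("GGTACA", "Krt8"),
    ("GGTACA", "Pou5f1"),
]
expression = count_by_cell(tagged_reads)
```

A table of this shape, cells by genes, is the starting point for identifying sub-populations and for inferring relationships between genes.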
Analysis of petunia hybrida in response to salt stress using high throughput RNA sequencing
USDA-ARS's Scientific Manuscript database
Salt and drought are among the greatest challenges to crop and native plants in meeting their yield and reproductive potentials. DNA sequencing-enabled transcriptome profiling provides a means of assessing what genes are responding to salt or drought stress so as to better understand the molecular ...
The application of the high throughput sequencing technology in the transposable elements.
Liu, Zhen; Xu, Jian-hong
2015-09-01
High throughput sequencing technology has dramatically improved the efficiency of DNA sequencing, and decreased the costs to a great extent. Meanwhile, this technology usually has advantages of better specificity, higher sensitivity and accuracy. Therefore, it has been applied to the research on genetic variations, transcriptomics and epigenomics. Recently, this technology has been widely employed in the studies of transposable elements and has achieved fruitful results. In this review, we summarize the application of high throughput sequencing technology in the fields of transposable elements, including the estimation of transposon content, preference of target sites and distribution, insertion polymorphism and population frequency, identification of rare copies, transposon horizontal transfers as well as transposon tagging. We also briefly introduce the major common sequencing strategies and algorithms, their advantages and disadvantages, and the corresponding solutions. Finally, we envision the developing trends of high throughput sequencing technology, especially the third generation sequencing technology, and its application in transposon studies in the future, hopefully providing a comprehensive understanding and reference for related scientific researchers.
Using high throughput sequencing to explore the biodiversity in oral bacterial communities.
Diaz, P I; Dupuy, A K; Abusleme, L; Reese, B; Obergfell, C; Choquette, L; Dongari-Bagtzoglou, A; Peterson, D E; Terzi, E; Strausbaugh, L D
2012-06-01
High throughput sequencing of 16S ribosomal RNA gene amplicons is a cost-effective method for characterization of oral bacterial communities. However, before undertaking large-scale studies, it is necessary to understand the technique-associated limitations and intrinsic variability of the oral ecosystem. In this work we evaluated bias in species representation using an in vitro-assembled mock community of oral bacteria. We then characterized the bacterial communities in saliva and buccal mucosa of five healthy subjects to investigate the power of high throughput sequencing in revealing their diversity and biogeography patterns. Mock community analysis showed primer and DNA isolation biases and an overestimation of diversity that was reduced after eliminating singleton operational taxonomic units (OTUs). Sequencing of salivary and mucosal communities found a total of 455 OTUs (0.3% dissimilarity) with only 78 of these present in all subjects. We demonstrate that this variability was partly the result of incomplete richness coverage even at great sequencing depths, and so comparing communities by their structure was more effective than comparisons based solely on membership. With respect to oral biogeography, we found inter-subject variability in community structure was lower than site differences between salivary and mucosal communities within subjects. These differences were evident at very low sequencing depths and were mostly caused by the abundance of Streptococcus mitis and Gemella haemolysans in mucosa. In summary, we present an experimental and data analysis framework that will facilitate design and interpretation of pyrosequencing-based studies. Despite challenges associated with this technique, we demonstrate its power for evaluation of oral diversity and biogeography patterns. © 2012 John Wiley & Sons A/S.
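The singleton-removal step that reduced the diversity overestimate can be sketched as a filter on an OTU count table. The table layout (OTU name mapped to per-sample counts), the OTU names, and the counts are hypothetical, and "singleton" is taken here to mean an OTU supported by a single read across all samples.

```python
def drop_singletons(otu_counts):
    """Keep only OTUs with more than one read in total across all samples."""
    return {otu: counts for otu, counts in otu_counts.items() if sum(counts) > 1}


# Hypothetical table: per-sample read counts for each OTU (two samples).
otu_counts = {
    "OTU_1": [120, 95],
    "OTU_2": [1, 0],   # singleton
    "OTU_3": [0, 3],
    "OTU_4": [0, 1],   # singleton
}
filtered = drop_singletons(otu_counts)
richness_before = len(otu_counts)
richness_after = len(filtered)
```

In a mock community of known composition, comparing richness before and after this filter against the true number of species shows how much of the apparent diversity is attributable to singletons.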
High-throughput analysis of T-DNA location and structure using sequence capture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
High-throughput analysis of T-DNA location and structure using sequence capture
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; ...
2015-10-07
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously,more » using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.« less
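The idea of finding read pairs that span a genome/T-DNA junction can be sketched as follows. This is not the authors' tool: it assumes the mates have already been aligned to a combined reference (genome plus vector sequences), and the input tuples, the function name, and the reference names are hypothetical.

```python
def junction_pairs(pairs, tdna_names):
    """Return (read_id, genome_ref) for pairs with exactly one mate on T-DNA.

    pairs: iterable of (read_id, mate1_ref, mate2_ref), where each ref is the
    name of the sequence the mate aligned to, or None if it did not align.
    tdna_names: set of reference names that are T-DNA (or other foreign DNA).
    """
    hits = []
    for read_id, ref1, ref2 in pairs:
        if ref1 is None or ref2 is None:
            continue  # both mates must align to say anything about a junction
        on_tdna_1 = ref1 in tdna_names
        on_tdna_2 = ref2 in tdna_names
        if on_tdna_1 != on_tdna_2:  # one mate on T-DNA, the other on the genome
            hits.append((read_id, ref2 if on_tdna_1 else ref1))
    return hits

# Hypothetical alignments; "vector_A" and "vector_B" stand in for T-DNA vectors.
example_pairs = [
    ("r1", "Chr1", "vector_A"),
    ("r2", "Chr1", "Chr1"),
    ("r3", "vector_A", "vector_A"),
    ("r4", "vector_B", "Chr3"),
    ("r5", None, "vector_A"),
]
spanning = junction_pairs(example_pairs, {"vector_A", "vector_B"})
```

This sketch only catches pairs whose mates fall on different references; a read that itself crosses the junction would need split-alignment handling, which is left out here.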
Trapnell, Cole; Roberts, Adam; Goff, Loyal; Pertea, Geo; Kim, Daehwan; Kelley, David R; Pimentel, Harold; Salzberg, Steven L; Rinn, John L; Pachter, Lior
2012-01-01
Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time. PMID:22383036
Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske
2007-02-14
The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5'tag-analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate<0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. 
The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
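The 5' tag analysis described above amounts to reading a short tag at the start of each sequence and looking it up in a table of sources. A minimal sketch follows, assuming equal-length tags placed immediately before the primer; the tag table, primer, and reads are invented for illustration and are not taken from the study.

```python
def demultiplex(reads, tag_to_source, primer):
    """Assign reads to sources by their 5' tag.

    A read is assigned only if its tag is known and the expected primer
    follows directly after it; everything else is reported as unassigned.
    All tags are assumed to have the same length.
    """
    tag_len = len(next(iter(tag_to_source)))
    assigned = {source: [] for source in tag_to_source.values()}
    unassigned = []
    for read in reads:
        tag, rest = read[:tag_len], read[tag_len:]
        source = tag_to_source.get(tag)
        if source is not None and rest.startswith(primer):
            assigned[source].append(rest)  # store the read with the tag removed
        else:
            unassigned.append(read)
    return assigned, unassigned

# Hypothetical dinucleotide tags, primer, and reads.
tags = {"AC": "specimen_1", "GT": "specimen_2"}
primer = "TTGACC"
reads = ["ACTTGACCAAAT", "GTTTGACCGGGA", "CCTTGACCAAAT", "ACGGGGGGGGGG"]
assigned, unassigned = demultiplex(reads, tags, primer)
```

Requiring the primer after the tag is one simple way to screen out the kind of sequencing anomalies the abstract mentions; the actual criteria used in the study are not given here.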
Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data.
Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo
2011-12-15
High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy. Contact: eduardo.eyras@upf.edu. Supplementary data are available at Bioinformatics online.
Xie, Zeyi; Zhou, Zhilin; Li, Hongmin; Yu, Jingjing; Jiang, Jiaojiao; Tang, Zhonghou; Ma, Daifu; Zhang, Baohong; Han, Yonghua; Li, Zongyun
2018-05-21
Sweetpotato (Ipomoea batatas L.) is a globally important economic food crop. It belongs to the Convolvulaceae family and originated in the tropics; however, sweetpotato is sensitive to cold stress during storage. In this study, we performed transcriptome sequencing to investigate the sweetpotato response to chilling stress during storage. A total of 110,110 unigenes were generated via high-throughput sequencing. Differentially expressed gene (DEG) analysis showed that 18,681 genes were up-regulated and 21,983 genes were down-regulated under low-temperature conditions. Many DEGs were related to the cell membrane system, antioxidant enzymes, carbohydrate metabolism, and hormone metabolism, which are potentially associated with sweetpotato resistance to low temperature. The existence of DEGs suggests a molecular basis for the biochemical and physiological consequences for sweetpotato under low-temperature storage conditions. Our analysis will provide a new target for enhancement of sweetpotato cold stress tolerance in postharvest storage through genetic manipulation. Copyright © 2018. Published by Elsevier Inc.
USDA-ARS's Scientific Manuscript database
Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high throughput sequencing (HTS). The nearly full genome sequence of a previously uncharacterized Panicovirus was identified from...
Chipster: user-friendly analysis software for microarray and other high-throughput data.
Kallio, M Aleksi; Tuimala, Jarno T; Hupponen, Taavi; Klemelä, Petri; Gentile, Massimiliano; Scheinin, Ilari; Koski, Mikko; Käki, Janne; Korpelainen, Eija I
2011-10-14
The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.
Chipster: user-friendly analysis software for microarray and other high-throughput data
2011-01-01
Background The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. Results Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Conclusions Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available. PMID:21999641
Investigating the viral ecology of global bee communities with high-throughput metagenomics.
Galbraith, David A; Fuller, Zachary L; Ray, Allyson M; Brockmann, Axel; Frazier, Maryann; Gikungu, Mary W; Martinez, J Francisco Iturralde; Kapheim, Karen M; Kerby, Jeffrey T; Kocher, Sarah D; Losyev, Oleksiy; Muli, Elliud; Patch, Harland M; Rosa, Cristina; Sakamoto, Joyce M; Stanley, Scott; Vaudo, Anthony D; Grozinger, Christina M
2018-06-11
Bee viral ecology is a fascinating emerging area of research: viruses exert a range of effects on their hosts, exacerbate impacts of other environmental stressors, and, importantly, are readily shared across multiple bee species in a community. However, our understanding of bee viral communities is limited, as it is primarily derived from studies of North American and European Apis mellifera populations. Here, we examined viruses in populations of A. mellifera and 11 other bee species from 9 countries, across 4 continents and Oceania. We developed a novel pipeline to rapidly and inexpensively screen for bee viruses. This pipeline includes purification of encapsulated RNA/DNA viruses, sequence-independent amplification, high throughput sequencing, integrated assembly of contigs, and filtering to identify contigs specifically corresponding to viral sequences. We identified sequences for (+)ssRNA, (-)ssRNA, dsRNA, and ssDNA viruses. Overall, we found 127 contigs corresponding to novel viruses (i.e. previously not observed in bees), with 27 represented by >0.1% of the reads in a given sample, and 7 contained an RdRp or replicase sequence which could be used for robust phylogenetic analysis. This study provides a sequence-independent pipeline for viral metagenomics analysis, and greatly expands our understanding of the diversity of viruses found in bee communities.
Quantitative phenotyping via deep barcode sequencing.
Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey
2009-10-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
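The core of a barcode-sequencing fitness readout is counting barcodes per strain and comparing a treated pool against a control pool. The sketch below is one simple way to do that, assuming a fixed barcode position in each read and a pseudocount-stabilised log2 ratio; the scoring formula, barcode table, and reads are illustrative assumptions, not the published Bar-seq analysis routine.

```python
import math
from collections import Counter

def count_barcodes(reads, barcode_to_strain, start, length):
    """Count reads per strain from the barcode found at a fixed position."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read[start:start + length])
        if strain is not None:  # reads with unknown barcodes are ignored
            counts[strain] += 1
    return counts

def log2_ratios(control, treatment, pseudocount=1):
    """log2(control / treatment) of depth-normalised counts per strain.

    Positive values mean the strain is depleted under treatment.
    Both pools are assumed to contain at least one counted read.
    """
    total_c = sum(control.values())
    total_t = sum(treatment.values())
    ratios = {}
    for strain in set(control) | set(treatment):
        c = (control.get(strain, 0) + pseudocount) / total_c
        t = (treatment.get(strain, 0) + pseudocount) / total_t
        ratios[strain] = math.log2(c / t)
    return ratios

# Hypothetical barcodes and reads.
barcodes = {"ACGT": "strainA", "TTGG": "strainB"}
control_reads = ["ACGTAA"] * 4 + ["TTGGAA"] * 4 + ["NNNNAA"]
treated_reads = ["ACGTAA"] * 6 + ["TTGGAA"] * 2
control = count_barcodes(control_reads, barcodes, 0, 4)
treated = count_barcodes(treated_reads, barcodes, 0, 4)
ratios = log2_ratios(control, treated)
```

Because the abstract reports that about 20% of barcodes differed from expectation, the lookup table passed as `barcode_to_strain` would need to be the re-sequenced barcode list rather than the originally designed one.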
Developing High-Throughput HIV Incidence Assay with Pyrosequencing Platform
Park, Sung Yong; Goeken, Nolan; Lee, Hyo Jin; Bolan, Robert; Dubé, Michael P.
2014-01-01
ABSTRACT Human immunodeficiency virus (HIV) incidence is an important measure for monitoring the epidemic and evaluating the efficacy of intervention and prevention trials. This study developed a high-throughput, single-measure incidence assay by implementing a pyrosequencing platform. We devised a signal-masking bioinformatics pipeline, which yielded a process error rate of 5.8 × 10^-4 per base. The pipeline was then applied to analyze 18,434 envelope gene segments (HXB2 7212 to 7601) obtained from 12 incident and 24 chronic patients who had documented HIV-negative and/or -positive tests. The pyrosequencing data were cross-checked by using the single-genome-amplification (SGA) method to independently obtain 302 sequences from 13 patients. Using two genomic biomarkers that probe for the presence of similar sequences, the pyrosequencing platform correctly classified all 12 incident subjects (100% sensitivity) and 23 of 24 chronic subjects (96% specificity). One misclassified subject's chronic infection was correctly classified by conducting the same analysis with SGA data. The biomarkers were statistically associated across the two platforms, suggesting the assay's reproducibility and robustness. Sampling simulations showed that the biomarkers were tolerant of sequencing errors and template resampling, two factors most likely to affect the accuracy of pyrosequencing results. We observed comparable biomarker scores between AIDS and non-AIDS chronic patients (multivariate analysis of variance [MANOVA], P = 0.12), indicating that the stage of HIV disease itself does not affect the classification scheme. The high-throughput genomic HIV incidence assay marks a significant step toward determining incidence from a single measure in cross-sectional surveys. IMPORTANCE Annual HIV incidence, the number of newly infected individuals within a year, is the key measure of monitoring the epidemic's rise and decline. 
Developing reliable assays differentiating recent from chronic infections has been a long-standing quest in the HIV community. Over the past 15 years, these assays have traditionally measured various HIV-specific antibodies, but recent technological advancements have expanded the diversity of proposed accurate, user-friendly, and financially viable tools. Here we designed a high-throughput genomic HIV incidence assay based on the signature imprinted in the HIV gene sequence population. By combining next-generation sequencing techniques with bioinformatics analysis, we demonstrated that genomic fingerprints are capable of distinguishing recently infected patients from chronically infected patients with high precision. Our high-throughput platform is expected to allow us to process many patients' samples from a single experiment, permitting the assay to be cost-effective for routine surveillance. PMID:24371062
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-02-09
BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. 
The PLAN web interface is platform-independent, easily configurable, readily extensible, and intuitive to use. PLAN is freely available to academic users at http://bioinfo.noble.org/plan/. The source code for local deployment is provided under a free license. Full support for system utilization, installation, configuration and customization is provided to academic users.
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-01-01
Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. 
The PLAN web interface is platform-independent, easily configurable, readily extensible, and intuitive to use. PLAN is freely available to academic users at http://bioinfo.noble.org/plan/. The source code for local deployment is provided under a free license. Full support for system utilization, installation, configuration and customization is provided to academic users. PMID:17291345
Liu, Gary W; Livesay, Brynn R; Kacherovsky, Nataly A; Cieslewicz, Maryelise; Lutz, Emi; Waalkes, Adam; Jensen, Michael C; Salipante, Stephen J; Pun, Suzie H
2015-08-19
Peptide ligands are used to increase the specificity of drug carriers to their target cells and to facilitate intracellular delivery. One method to identify such peptide ligands, phage display, enables high-throughput screening of peptide libraries for ligands binding to therapeutic targets of interest. However, conventional methods for identifying target binders in a library by Sanger sequencing are low-throughput, labor-intensive, and provide a limited perspective (<0.01%) of the complete sequence space. Moreover, the small sample space can be dominated by nonspecific, preferentially amplifying "parasitic sequences" and plastic-binding sequences, which may lead to the identification of false positives or exclude the identification of target-binding sequences. To overcome these challenges, we employed next-generation Illumina sequencing to couple high-throughput screening and high-throughput sequencing, enabling more comprehensive access to the phage display library sequence space. In this work, we define the hallmarks of binding sequences in next-generation sequencing data, and develop a method that identifies several target-binding phage clones for murine, alternatively activated M2 macrophages with a high (100%) success rate: sequences and binding motifs were reproducibly present across biological replicates; binding motifs were identified across multiple unique sequences; and an unselected, amplified library accurately filtered out parasitic sequences. In addition, we validate the Multiple Em for Motif Elicitation tool as an efficient and principled means of discovering binding sequences.
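Two of the hallmarks listed above (reproducibility across biological replicates, and filtering against an unselected, amplified library) can be expressed as a simple filter. The sketch below is an assumption-laden illustration: the enrichment threshold, the use of read fractions, and all peptide names are hypothetical, and the motif-level hallmark is not covered.

```python
def candidate_binders(replicates, unselected, min_enrichment=2.0):
    """Return peptides that look like reproducible, non-parasitic binders.

    replicates: list of dicts mapping peptide -> read fraction, one dict per
                selected biological replicate.
    unselected: dict mapping peptide -> read fraction in the unselected,
                amplified library (high values suggest parasitic sequences).
    """
    # Hallmark 1: the peptide must be present in every replicate.
    shared = set(replicates[0])
    for rep in replicates[1:]:
        shared &= set(rep)

    kept = []
    for peptide in sorted(shared):
        mean_selected = sum(rep[peptide] for rep in replicates) / len(replicates)
        background = unselected.get(peptide)
        # Hallmark 2: the peptide must not be explained by amplification alone.
        if background is None or mean_selected / background >= min_enrichment:
            kept.append(peptide)
    return kept

# Hypothetical read fractions.
selected_reps = [
    {"PEP_A": 0.20, "PEP_B": 0.30, "PEP_C": 0.10},
    {"PEP_A": 0.25, "PEP_B": 0.30},
]
amplified_library = {"PEP_A": 0.01, "PEP_B": 0.40}
binders = candidate_binders(selected_reps, amplified_library)
```

Here `PEP_B` is abundant after selection but even more abundant in the amplified library, so it is treated as a likely parasitic sequence, while `PEP_C` fails the replicate check.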
Sha, Shankar Prasad; Jani, Kunal; Sharma, Avinash; Anupma, Anu; Pradhan, Pooja; Shouche, Yogesh; Tamang, Jyoti Prakash
2017-09-08
Marcha and thiat are traditionally prepared amylolytic starters used for production of various ethnic alcoholic beverages in the Sikkim and Meghalaya states of India. In the present study, we investigated the bacterial and fungal community composition of marcha and thiat using high-throughput sequencing. Characterization of the bacterial community showed that the phylum Proteobacteria was the most dominant in both marcha (91.4%) and thiat (53.8%), followed by Firmicutes and Actinobacteria. Estimates of fungal community composition showed Ascomycota as the dominant phylum. The presence of Zygomycota in marcha distinguishes it from thiat. The NGS analysis revealed a dominance of yeasts in marcha, whereas molds outnumbered yeasts in thiat. This is the first report on microbial communities of traditionally prepared amylolytic starters of India using high-throughput sequencing.
USDA-ARS's Scientific Manuscript database
Many species of mites and ticks are of agricultural and medical importance. Much can be learned from the study of transcriptomes of acarines which can generate DNA-sequence information of potential target genes for the control of acarine pests. High throughput transcriptome sequencing can also yie...
S-MART, a software toolbox to aid RNA-Seq data analysis.
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. However, the analysis of the millions of sequences generated is often beyond the expertise of wet labs that have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by the whole biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci.
S-MART, A Software Toolbox to Aid RNA-seq Data Analysis
Zytnicki, Matthias; Quesneville, Hadi
2011-01-01
High-throughput sequencing is now routinely performed in many experiments. However, the analysis of the millions of sequences generated is often beyond the expertise of wet labs that have no personnel specializing in bioinformatics. Whereas several tools are now available to map high-throughput sequencing data on a genome, few of these can extract biological knowledge from the mapped reads. We have developed a toolbox called S-MART, which handles mapped RNA-Seq data. S-MART is an intuitive and lightweight tool which performs many of the tasks usually required for the analysis of mapped RNA-Seq reads. S-MART does not require any computer science background and thus can be used by the whole biologist community through a graphical interface. S-MART can run on any personal computer, yielding results within an hour even for Gb of data for most queries. S-MART may perform the entire analysis of the mapped reads, without any need for other ad hoc scripts. With this tool, biologists can easily perform most of the analyses on their computer for their RNA-Seq data, from the mapped data to the discovery of important loci. PMID:21998740
Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha
2016-11-01
Better protocols and decreasing costs have made high-throughput sequencing experiments accessible even to small experimental laboratories. However, comparing one or a few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited by a lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome-scale comparison of high-throughput experiments (chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats. Web application: http://www.heatstarseq.roslin.ed.ac.uk/ Source code: https://github.com/gdevailly Contact: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.
2013-01-01
SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
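The third module's task, finding sequences that contain microsatellites matching user-specified parameters, can be illustrated with a short regular-expression search. This is a sketch for simple perfect repeats only, not the SSR_pipeline algorithm: compound repeats are not handled, and the function name, parameter defaults, and test sequence are assumptions.

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Find simple perfect tandem repeats in a DNA sequence.

    Returns a list of (start, motif, repeat_count). The lazy quantifier makes
    the regex try the shortest motif first, so "ACACACAC" is reported as
    (AC)4 rather than (ACAC)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Hypothetical sequence: (AC)5 and (GATA)4 separated by short spacers.
sequence = "GGT" + "AC" * 5 + "TTG" + "GATA" * 4 + "CC"
ssrs = find_ssrs(sequence)
```

Raising `max_motif` extends the search to longer motifs, at the cost of more backtracking in the regex engine.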
Transcriptome-based differentiation of closely-related Miscanthus lines.
Chouvarine, Philippe; Cooksey, Amanda M; McCarthy, Fiona M; Ray, David A; Baldwin, Brian S; Burgess, Shane C; Peterson, Daniel G
2012-01-01
Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations. A SNP comparative analysis of rhizome-derived cDNA sequences was successfully utilized to distinguish three Miscanthus × giganteus cultivars from each other and from other Miscanthus species. Moreover, the resulting phylogenetic tree generated from SNP frequency data parallels the known breeding history of the plants examined. Some of the giant miscanthus plants exhibit considerable sequence divergence. Here we describe an analysis of Miscanthus in which high-throughput exome sequencing was utilized to differentiate between closely related genotypes despite the current lack of a reference genome sequence. We functionally annotated the exome sequences and provide resources to support Miscanthus systems biology. In addition, we demonstrate the use of the commercial high-performance cloud computing to do computational GO annotation.
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
2016-11-01
High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo. By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Wang, Yao; Cui, Yazhou; Zhou, Xiaoyan; Han, Jinxiang
2015-01-01
Objective Osteogenesis imperfecta (OI) is a rare inherited skeletal disease, characterized by bone fragility and low bone density. The mutations in this disorder have been widely reported to be on various exonal hotspots of the candidate genes, including COL1A1, COL1A2, CRTAP, LEPRE1, and FKBP10, thus creating a great demand for precise genetic tests. However, large genome sizes make the process daunting and the analyses inefficient and expensive. Therefore, we aimed at developing a fast, accurate, efficient, and cheaper sequencing platform for OI diagnosis; to this end, use of an advanced array-based technique was proposed. Method A CustomSeq Affymetrix Resequencing Array was established for high-throughput sequencing of five genes simultaneously. Genomic DNA extraction from 13 OI patients and 85 normal controls and amplification using long-range PCR (LR-PCR) were followed by DNA fragmentation and chip hybridization, according to standard Affymetrix protocols. Hybridization signals were determined using GeneChip Sequence Analysis Software (GSEQ). To examine the feasibility, the outcome of the new resequencing approach was validated by conventional capillary sequencing. Result The overall call rate using the resequencing array was 96–98% and the agreement between microarray and capillary sequencing was 99.99%. 11 out of 13 OI patients with pathogenic mutations were successfully detected by the chip analysis without adjustment, and one mutation could also be identified using manual visual inspection. Conclusion A high-throughput resequencing array was developed that detects the disease-associated mutations in OI, providing a potential tool to facilitate large-scale genetic screening for OI patients. Through this method, a novel mutation was also found. PMID:25742658
Ferret, Yann; Caillault, Aurélie; Sebda, Shéhérazade; Duez, Marc; Grardel, Nathalie; Duployez, Nicolas; Villenet, Céline; Figeac, Martin; Preudhomme, Claude; Salson, Mikaël; Giraud, Mathieu
2016-05-01
High-throughput sequencing (HTS) is considered a technical revolution that has improved our knowledge of lymphoid and autoimmune diseases, changing our approach to leukaemia both at diagnosis and during follow-up. As part of an immunoglobulin/T cell receptor-based minimal residual disease (MRD) assessment of acute lymphoblastic leukaemia patients, we assessed the performance and feasibility of the replacement of the first steps of the approach based on DNA isolation and Sanger sequencing, using a HTS protocol combined with bioinformatics analysis and visualization using the Vidjil software. We prospectively analysed the diagnostic and relapse samples of 34 paediatric patients, thus identifying 125 leukaemic clones with recombinations on multiple loci (TRG, TRD, IGH and IGK), including Dd2/Dd3 and Intron/KDE rearrangements. Sequencing failures were halved (14% vs. 34%, P = 0.0007), enabling more patients to be monitored. Furthermore, more markers per patient could be monitored, reducing the probability of false negative MRD results. The whole analysis, from sample receipt to clinical validation, was shorter than our current diagnostic protocol, with equal resources. V(D)J recombination was successfully assigned by the software, even for unusual recombinations. This study emphasizes the progress that HTS with adapted bioinformatics tools can bring to the diagnosis of leukaemia patients. © 2016 John Wiley & Sons Ltd.
Takahashi, Mayumi; Wu, Xiwei; Ho, Michelle; Chomchan, Pritsana; Rossi, John J.; Burnett, John C.; Zhou, Jiehua
2016-01-01
The systemic evolution of ligands by exponential enrichment (SELEX) technique is a powerful and effective aptamer-selection procedure. However, modifications to the process can dramatically improve selection efficiency and aptamer performance. For example, droplet digital PCR (ddPCR) has been recently incorporated into SELEX selection protocols to putatively reduce the propagation of byproducts and avoid selection bias that result from differences in PCR efficiency of sequences within the random library. However, a detailed, parallel comparison of the efficacy of conventional solution PCR versus the ddPCR modification in the RNA aptamer-selection process is needed to understand effects on overall SELEX performance. In the present study, we took advantage of powerful high throughput sequencing technology and bioinformatics analysis coupled with SELEX (HT-SELEX) to thoroughly investigate the effects of initial library and PCR methods in the RNA aptamer identification. Our analysis revealed that distinct “biased sequences” and nucleotide composition existed in the initial, unselected libraries purchased from two different manufacturers and that the fate of the “biased sequences” was target-dependent during selection. Our comparison of solution PCR- and ddPCR-driven HT-SELEX demonstrated that PCR method affected not only the nucleotide composition of the enriched sequences, but also the overall SELEX efficiency and aptamer efficacy. PMID:27652575
High-Resolution Melt Analysis for Rapid Comparison of Bacterial Community Compositions
Hjelmsø, Mathis Hjort; Hansen, Lars Hestbjerg; Bælum, Jacob; Feld, Louise; Holben, William E.
2014-01-01
In the study of bacterial community composition, 16S rRNA gene amplicon sequencing is today among the preferred methods of analysis. The cost of nucleotide sequence analysis, including requisite computational and bioinformatic steps, however, takes up a large part of many research budgets. High-resolution melt (HRM) analysis is the study of the melt behavior of specific PCR products. Here we describe a novel high-throughput approach in which we used HRM analysis targeting the 16S rRNA gene to rapidly screen multiple complex samples for differences in bacterial community composition. We hypothesized that HRM analysis of amplified 16S rRNA genes from a soil ecosystem could be used as a screening tool to identify changes in bacterial community structure. This hypothesis was tested using a soil microcosm setup exposed to a total of six treatments representing different combinations of pesticide and fertilization treatments. The HRM analysis identified a shift in the bacterial community composition in two of the treatments, both including the soil fumigant Basamid GR. These results were confirmed with both denaturing gradient gel electrophoresis (DGGE) analysis and 454-based 16S rRNA gene amplicon sequencing. HRM analysis was shown to be a fast, high-throughput technique that can serve as an effective alternative to gel-based screening methods to monitor microbial community composition. PMID:24610853
Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C
2012-01-01
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases assembly results although on average are of high quality, need to be viewed critically and consider sources of errors in them prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing.
Duez, Marc; Giraud, Mathieu; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian
2016-01-01
B and T lymphocytes are white blood cells that play a key role in adaptive immunity. A part of their DNA, called the V(D)J recombinations, is specific to each lymphocyte, and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications.
Vidjil: A Web Platform for Analysis of High-Throughput Repertoire Sequencing
Duez, Marc; Herbert, Ryan; Rocher, Tatiana; Salson, Mikaël; Thonier, Florian
2016-01-01
Background B and T lymphocytes are white blood cells that play a key role in adaptive immunity. A part of their DNA, called the V(D)J recombinations, is specific to each lymphocyte, and enables recognition of specific antigens. Today, with new sequencing techniques, one can get billions of DNA sequences from these regions. With dedicated Repertoire Sequencing (RepSeq) methods, it is now possible to picture populations of lymphocytes, and to monitor more accurately the immune response as well as pathologies such as leukemia. Methods and Results Vidjil is an open-source platform for the interactive analysis of high-throughput sequencing data from lymphocyte recombinations. It contains an algorithm gathering reads into clonotypes according to their V(D)J junctions, a web application made of a sample, experiment and patient database and a visualization for the analysis of clonotypes over time. Vidjil is implemented in C++, Python and Javascript and licensed under the GPLv3 open-source license. Source code, binaries and a public web server are available at http://www.vidjil.org and at http://bioinfo.lille.inria.fr/vidjil. Using the Vidjil web application consists of four steps: 1. uploading a raw sequence file (typically a FASTQ); 2. running RepSeq analysis software; 3. visualizing the results; 4. annotating the results and saving them for future use. For the end-user, the Vidjil web application needs no specific installation and just requires a connection and a modern web browser. Vidjil is used by labs in hematology or immunology for research and clinical applications. PMID:27835690
USDA-ARS's Scientific Manuscript database
High-throughput sequencing is often used for studies of the transcriptome, particularly for comparisons between experimental conditions. Due to sequencing costs, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential ex...
'PACLIMS': a component LIM system for high-throughput functional genomic analysis.
Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A
2005-04-12
Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the approximately 11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors.
'PACLIMS': A component LIM system for high-throughput functional genomic analysis
Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A
2005-01-01
Background Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the ~11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. Results The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Conclusion Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors. 
PMID:15826298
Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data.
Waszak, Sebastian M; Kilpinen, Helena; Gschwind, Andreas R; Orioli, Andrea; Raghav, Sunil K; Witwicki, Robert M; Migliavacca, Eugenia; Yurovsky, Alisa; Lappalainen, Tuuli; Hernandez, Nouria; Reymond, Alexandre; Dermitzakis, Emmanouil T; Deplancke, Bart
2014-01-15
High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter
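The idea of filtering sites with low read complexity can be sketched as follows. This is not the absfilter method: the complexity measure (fraction of reads with a distinct start position), the 0.5 threshold, the function names, and the input layout are all assumptions chosen for illustration, and the statistical test for an excess of clonal reads mentioned in the abstract is not reproduced here.

```python
def read_complexity(read_starts):
    """Fraction of reads with a distinct start position (1.0 means no clonal reads)."""
    if not read_starts:
        raise ValueError("no reads at this site")
    return len(set(read_starts)) / len(read_starts)


def filter_sites(sites, min_complexity=0.5):
    """Keep heterozygous sites where every allele reaches the complexity threshold.

    sites: hypothetical dict mapping a site label to {allele: [read start positions]}.
    min_complexity: illustrative cut-off, not a value taken from the paper.
    """
    kept = {}
    for site, alleles in sites.items():
        if all(read_complexity(starts) >= min_complexity for starts in alleles.values()):
            kept[site] = alleles
    return kept


# Toy data, invented for illustration: the second site is dominated by
# reads that start at the same position, as PCR duplicates would.
sites = {
    "chr1:100": {"ref": [10, 12, 15, 18], "alt": [11, 13, 14]},
    "chr1:200": {"ref": [50, 50, 50, 50, 51], "alt": [49, 52]},
}
unbiased = filter_sites(sites)
```

A fuller treatment would also take read strand and mate position into account when deciding whether two reads are clonal.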
Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.
2013-01-01
SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
Khan, Arifa S; Vacante, Dominick A; Cassart, Jean-Pol; Ng, Siemon H S; Lambert, Christophe; Charlebois, Robert L; King, Kathryn E
Several nucleic-acid based technologies have recently emerged with capabilities for broad virus detection. One of these, high throughput sequencing, has the potential for novel virus detection because this method does not depend upon prior viral sequence knowledge. However, the use of high throughput sequencing for testing biologicals poses greater challenges as compared to other newly introduced tests due to its technical complexities and big data bioinformatics. Thus, the Advanced Virus Detection Technologies Users Group was formed as a joint effort by regulatory and industry scientists to facilitate discussions and provide a forum for sharing data and experiences using advanced new virus detection technologies, with a focus on high throughput sequencing technologies. The group was initiated as a task force that was coordinated by the Parenteral Drug Association and subsequently became the Advanced Virus Detection Technologies Interest Group to continue efforts for using new technologies for detection of adventitious viruses with broader participation, including international government agencies, academia, and technology service providers. © PDA, Inc. 2016.
Taggart, David J.; Camerlengo, Terry L.; Harrison, Jason K.; Sherrer, Shanen M.; Kshetry, Ajay K.; Taylor, John-Stephen; Huang, Kun; Suo, Zucai
2013-01-01
Cellular genomes are constantly damaged by endogenous and exogenous agents that covalently and structurally modify DNA to produce DNA lesions. Although most lesions are mended by various DNA repair pathways in vivo, a significant number of damage sites persist during genomic replication. Our understanding of the mutagenic outcomes derived from these unrepaired DNA lesions has been hindered by the low throughput of existing sequencing methods. Therefore, we have developed a cost-effective high-throughput short oligonucleotide sequencing assay that uses next-generation DNA sequencing technology for the assessment of the mutagenic profiles of translesion DNA synthesis catalyzed by any error-prone DNA polymerase. The vast amount of sequencing data produced were aligned and quantified by using our novel software. As an example, the high-throughput short oligonucleotide sequencing assay was used to analyze the types and frequencies of mutations upstream, downstream and at a site-specifically placed cis–syn thymidine–thymidine dimer generated individually by three lesion-bypass human Y-family DNA polymerases. PMID:23470999
Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells.
Klein, Allon M; Mazutis, Linas; Akartuna, Ilke; Tallapragada, Naren; Veres, Adrian; Li, Victor; Peshkin, Leonid; Weitz, David A; Kirschner, Marc W
2015-05-21
It has long been the dream of biologists to map gene expression at the single-cell level. With such data one might track heterogeneous cell sub-populations, and infer regulatory relationships between genes and pathways. Recently, RNA sequencing has achieved single-cell resolution. What is limiting is an effective way to routinely isolate and process large numbers of individual cells for quantitative in-depth sequencing. We have developed a high-throughput droplet-microfluidic approach for barcoding the RNA from thousands of individual cells for subsequent analysis by next-generation sequencing. The method shows a surprisingly low noise profile and is readily adaptable to other sequencing-based assays. We analyzed mouse embryonic stem cells, revealing in detail the population structure and the heterogeneous onset of differentiation after leukemia inhibitory factor (LIF) withdrawal. The reproducibility of these high-throughput single-cell data allowed us to deconstruct cell populations and infer gene expression relationships. Copyright © 2015 Elsevier Inc. All rights reserved.
High-throughput full-length single-cell mRNA-seq of rare cells.
Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X
2017-01-01
Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods being limited to the analysis of small volumes of single cell suspensions with cell densities on the order of 10^7 per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data is typically only used for gene expression analysis of transcriptomic data, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells), and mutational profile (H1650 versus H1975 cells), in a single sequencing run. 
This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.
fluff: exploratory analysis and visualization of high-throughput sequencing data
Georgiou, Georgios
2016-01-01
Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532
Egorov, Evgeny S; Merzlyak, Ekaterina M; Shelenkov, Andrew A; Britanova, Olga V; Sharonov, George V; Staroverov, Dmitriy B; Bolotin, Dmitriy A; Davydov, Alexey N; Barsova, Ekaterina; Lebedev, Yuriy B; Shugay, Mikhail; Chudakov, Dmitriy M
2015-06-15
Emerging high-throughput sequencing methods for the analyses of complex structure of TCR and BCR repertoires give a powerful impulse to adaptive immunity studies. However, there are still essential technical obstacles for performing a truly quantitative analysis. Specifically, it remains challenging to obtain comprehensive information on the clonal composition of small lymphocyte populations, such as Ag-specific, functional, or tissue-resident cell subsets isolated by sorting, microdissection, or fine needle aspirates. In this study, we report a robust approach based on unique molecular identifiers that allows profiling Ag receptors for several hundred to thousand lymphocytes while preserving qualitative and quantitative information on clonal composition of the sample. We also describe several general features regarding the data analysis with unique molecular identifiers that are critical for accurate counting of starting molecules in high-throughput sequencing applications. Copyright © 2015 by The American Association of Immunologists, Inc.
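The molecule-counting idea behind unique molecular identifiers (UMIs) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each read has already been reduced to a (clonotype, UMI) pair, and the function name `count_molecules` and the `min_reads_per_umi` threshold are hypothetical names introduced here for illustration only.

```python
from collections import Counter, defaultdict


def count_molecules(reads, min_reads_per_umi=1):
    """Estimate starting molecules per clonotype by counting distinct UMIs.

    reads: iterable of (clonotype, umi) string pairs, one pair per read.
    min_reads_per_umi: hypothetical filter; UMIs supported by fewer reads
    are dropped as likely PCR or sequencing errors.
    """
    reads_per_umi = defaultdict(Counter)
    for clonotype, umi in reads:
        reads_per_umi[clonotype][umi] += 1

    molecules = {}
    for clonotype, umi_counts in reads_per_umi.items():
        kept = [u for u, n in umi_counts.items() if n >= min_reads_per_umi]
        if kept:
            molecules[clonotype] = len(kept)
    return molecules


# Toy input: the first two reads carry the same UMI, so they count once.
reads = [
    ("CASSLGQETQYF", "AACGT"),
    ("CASSLGQETQYF", "AACGT"),
    ("CASSLGQETQYF", "GGTCA"),
    ("CASSPDRGYEQYF", "TTAGC"),
]
molecule_counts = count_molecules(reads)
print(molecule_counts)
```

Counting distinct UMIs rather than reads removes amplification bias from the estimate; raising the read-support threshold trades sensitivity for robustness to erroneous UMIs.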
Aryee, Martin J.; Jaffe, Andrew E.; Corrada-Bravo, Hector; Ladd-Acosta, Christine; Feinberg, Andrew P.; Hansen, Kasper D.; Irizarry, Rafael A.
2014-01-01
Motivation: The recently released Infinium HumanMethylation450 array (the ‘450k’ array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. Results: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. Availability and implementation: http://bioconductor.org/packages/release/bioc/html/minfi.html. Contact: khansen@jhsph.edu; rafa@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24478339
Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data
Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo
2011-01-01
Motivation: High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein–DNA and protein–RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. Results: We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Availability: Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy Contact: eduardo.eyras@upf.edu Supplementary Information: Supplementary data are available at Bioinformatics online. PMID:21994224
Fractal-like Distributions over the Rational Numbers in High-throughput Biological and Clinical Data
NASA Astrophysics Data System (ADS)
Trifonov, Vladimir; Pasqualucci, Laura; Dalla-Favera, Riccardo; Rabadan, Raul
2011-12-01
Recent developments in extracting and processing biological and clinical data are allowing quantitative approaches to studying living systems. High-throughput sequencing (HTS), expression profiles, proteomics, and electronic health records (EHR) are some examples of such technologies. Extracting meaningful information from those technologies requires careful analysis of the large volumes of data they produce. In this note, we present a set of fractal-like distributions that commonly appear in the analysis of such data. The first set of examples is drawn from an HTS experiment. Here, the distributions appear as part of the evaluation of the error rate of the sequencing and the identification of tumorigenic genomic alterations. The other examples are obtained from risk factor evaluation and analysis of relative disease prevalence and comorbidity as these appear in EHR. The distributions are also relevant to identification of subclonal populations in tumors and the study of quasi-species and intrahost diversity of viral populations.
Genome sequencing in microfabricated high-density picolitre reactors.
Margulies, Marcel; Egholm, Michael; Altman, William E; Attiya, Said; Bader, Joel S; Bemben, Lisa A; Berka, Jan; Braverman, Michael S; Chen, Yi-Ju; Chen, Zhoutao; Dewell, Scott B; Du, Lei; Fierro, Joseph M; Gomes, Xavier V; Godwin, Brian C; He, Wen; Helgesen, Scott; Ho, Chun Heen; Ho, Chun He; Irzyk, Gerard P; Jando, Szilveszter C; Alenquer, Maria L I; Jarvie, Thomas P; Jirage, Kshama B; Kim, Jong-Bum; Knight, James R; Lanza, Janna R; Leamon, John H; Lefkowitz, Steven M; Lei, Ming; Li, Jing; Lohman, Kenton L; Lu, Hong; Makhijani, Vinod B; McDade, Keith E; McKenna, Michael P; Myers, Eugene W; Nickerson, Elizabeth; Nobile, John R; Plant, Ramona; Puc, Bernard P; Ronan, Michael T; Roth, George T; Sarkis, Gary J; Simons, Jan Fredrik; Simpson, John W; Srinivasan, Maithreyan; Tartaro, Karrie R; Tomasz, Alexander; Vogt, Kari A; Volkmer, Greg A; Wang, Shally H; Wang, Yong; Weiner, Michael P; Yu, Pengguang; Begley, Richard F; Rothberg, Jonathan M
2005-09-15
The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
2013-01-01
Following recent trends in environmental microbiology, food microbiology has benefited from the advances in molecular biology and adopted novel strategies to detect, identify, and monitor microbes in food. An in-depth study of the microbial diversity in food can now be achieved by using high-throughput sequencing (HTS) approaches after direct nucleic acid extraction from the sample to be studied. In this review, the workflow of applying culture-independent HTS to food matrices is described. The current scenario and future perspectives of HTS uses to study food microbiota are presented, and the decision-making process leading to the best choice of working conditions to fulfill the specific needs of food research is described. PMID:23475615
Whole Wiskott‑Aldrich syndrome protein gene deletion identified by high throughput sequencing.
He, Xiangling; Zou, Runying; Zhang, Bing; You, Yalan; Yang, Yang; Tian, Xin
2017-11-01
Wiskott‑Aldrich syndrome (WAS) is a rare X‑linked recessive immunodeficiency disorder, characterized by thrombocytopenia, small platelets, eczema and recurrent infections associated with increased risk of autoimmunity and malignancy disorders. Mutations in the WAS protein (WASP) gene are responsible for WAS. To date, WASP mutations, including missense/nonsense, splicing, small deletions, small insertions, gross deletions, and gross insertions have been identified in patients with WAS. In addition, WASP‑interacting proteins are suspected in patients with clinical features of WAS, in whom the WASP gene sequence and mRNA levels are normal. The present study aimed to investigate the application of next generation sequencing in definitive diagnosis and clinical therapy for WAS. A 5 month‑old child with WAS who displayed symptoms of thrombocytopenia was examined. Whole exome sequence analysis of genomic DNA showed that the coverage and depth of WASP were extremely low. Quantitative polymerase chain reaction indicated total WASP gene deletion in the proband. In conclusion, high throughput sequencing is useful for the verification of WAS on the genetic profile, and has implications for family planning guidance and establishment of clinical programs.
Zador, Anthony M.; Dubnau, Joshua; Oyibo, Hassana K.; Zhan, Huiqing; Cao, Gang; Peikon, Ian D.
2012-01-01
Connectivity determines the function of neural circuits. Historically, circuit mapping has usually been viewed as a problem of microscopy, but no current method can achieve high-throughput mapping of entire circuits with single neuron precision. Here we describe a novel approach to determining connectivity. We propose BOINC (“barcoding of individual neuronal connections”), a method for converting the problem of connectivity into a form that can be read out by high-throughput DNA sequencing. The appeal of using sequencing is that its scale—sequencing billions of nucleotides per day is now routine—is a natural match to the complexity of neural circuits. An inexpensive high-throughput technique for establishing circuit connectivity at single neuron resolution could transform neuroscience research. PMID:23109909
Quantitative phenotyping via deep barcode sequencing
Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey
2009-01-01
Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
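A barcode-counting fitness comparison of the kind described above might be sketched as follows. This is an illustrative outline, not the published Bar-seq analysis: exact barcode matching, the pseudocount, and the function names `barcode_counts` and `log2_ratios` are assumptions made for the example.

```python
import math
from collections import Counter


def barcode_counts(reads, barcode_to_strain):
    """Count reads per strain, using exact barcode matches only."""
    counts = Counter()
    for read in reads:
        strain = barcode_to_strain.get(read)
        if strain is not None:
            counts[strain] += 1
    return counts


def log2_ratios(control, treatment, pseudocount=1.0):
    """log2(control / treatment) of library-size-normalized counts.

    Positive values mean the strain is depleted under treatment.
    The pseudocount (an assumption) avoids division by zero.
    """
    strains = set(control) | set(treatment)
    c_total = sum(control.values()) + pseudocount * len(strains)
    t_total = sum(treatment.values()) + pseudocount * len(strains)
    ratios = {}
    for strain in strains:
        c = (control.get(strain, 0) + pseudocount) / c_total
        t = (treatment.get(strain, 0) + pseudocount) / t_total
        ratios[strain] = math.log2(c / t)
    return ratios


# Hypothetical barcodes and strain names, for illustration only.
barcode_to_strain = {"ACGT": "geneA", "TTGG": "geneB"}
control = barcode_counts(["ACGT"] * 3 + ["TTGG"] * 3 + ["NNNN"], barcode_to_strain)
treatment = barcode_counts(["ACGT"] * 1 + ["TTGG"] * 5, barcode_to_strain)
ratios = log2_ratios(control, treatment)
print(ratios)
```

Because the abstract reports that roughly 20% of barcodes differed from expectation, the barcode-to-strain table used in such a count would need to be the re-sequenced, revised list rather than the originally designed one.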
High-Throughput Block Optical DNA Sequence Identification.
Sagar, Dodderi Manjunatha; Korshoj, Lee Erik; Hanson, Katrina Bethany; Chowdhury, Partha Pratim; Otoupal, Peter Britton; Chatterjee, Anushree; Nagpal, Prashant
2018-01-01
Optical techniques for molecular diagnostics or DNA sequencing generally rely on small molecule fluorescent labels, which utilize light with a wavelength of several hundred nanometers for detection. Developing a label-free optical DNA sequencing technique will require nanoscale focusing of light, a high-throughput and multiplexed identification method, and a data compression technique to rapidly identify sequences and analyze genomic heterogeneity for big datasets. Such a method should identify characteristic molecular vibrations using optical spectroscopy, especially in the "fingerprinting region" from ≈400-1400 cm⁻¹. Here, surface-enhanced Raman spectroscopy is used to demonstrate label-free identification of DNA nucleobases with multiplexed 3D plasmonic nanofocusing. While nanometer-scale mode volumes prevent identification of single nucleobases within a DNA sequence, the block optical technique can identify A, T, G, and C content in DNA k-mers. The content of each nucleotide in a DNA block can be a unique and high-throughput method for identifying sequences, genes, and other biomarkers as an alternative to single-letter sequencing. Additionally, coupling two complementary vibrational spectroscopy techniques (infrared and Raman) can improve block characterization. These results pave the way for developing a novel, high-throughput block optical sequencing method with lossy genomic data compression using k-mer identification from multiplexed optical data acquisition. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
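The sense in which base content is a lossy description of a k-mer can be shown with a short sketch. It only illustrates the combinatorics of A, T, G, and C content and says nothing about the spectroscopy itself; the function names `block_content` and `blocks_sharing_content` are hypothetical.

```python
from collections import Counter
from itertools import product


def block_content(kmer):
    """Return the (A, T, G, C) counts of a DNA block, ignoring base order."""
    counts = Counter(kmer.upper())
    return tuple(counts[base] for base in "ATGC")


def blocks_sharing_content(k):
    """Group all 4**k possible k-mers by their base-content signature."""
    groups = {}
    for bases in product("ATGC", repeat=k):
        kmer = "".join(bases)
        groups.setdefault(block_content(kmer), []).append(kmer)
    return groups


groups = blocks_sharing_content(3)
n_kmers = sum(len(members) for members in groups.values())
print(len(groups), "content signatures for", n_kmers, "possible 3-mers")
print(block_content("GATTACA"))
```

Several k-mers map to each signature, so a single block cannot be decoded back to its sequence; identification of a longer sequence would rely on the pattern of signatures across many blocks.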
Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay
2013-01-01
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms; however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E. coli (4.6 MB), S. kudriavzevii (11.18 MB) and C. elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
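For planning purposes, the read depths quoted above can be converted into read counts with the usual relation depth = reads × read length / genome size. The sketch below is a back-of-the-envelope aid, not part of the study: the 100 bp read length is an assumption (the abstract does not state it), and the genome sizes are the rounded figures given in the abstract.

```python
def reads_for_depth(genome_size_bp, depth, read_length_bp):
    """Smallest read count with reads * read_length / genome_size >= depth."""
    # Integer ceiling division, to avoid floating-point rounding.
    return -((-genome_size_bp * depth) // read_length_bp)


def depth_from_reads(n_reads, read_length_bp, genome_size_bp):
    """Average depth of coverage for a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


# Genome sizes as rounded in the abstract; read length is assumed.
genome_sizes_bp = {
    "E. coli": 4_600_000,
    "S. kudriavzevii": 11_180_000,
    "C. elegans": 100_000_000,
}
READ_LENGTH_BP = 100  # assumption, not stated in the abstract

for organism, size in genome_sizes_bp.items():
    print(organism, reads_for_depth(size, 50, READ_LENGTH_BP), "reads for 50X")
```

With paired-end data, the number of read pairs is half the read count computed this way.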
Bolsheva, Nadezhda L; Melnikova, Nataliya V; Kirov, Ilya V; Speranskaya, Anna S; Krinitsina, Anastasia A; Dmitriev, Alexey A; Belenikin, Maxim S; Krasnov, George S; Lakunina, Valentina A; Snezhkina, Anastasiya V; Rozhmina, Tatiana A; Samatadze, Tatiana E; Yurkevich, Olga Yu; Zoshchuk, Svyatoslav A; Amosova, Alexandra V; Kudryavtseva, Anna V; Muravenko, Olga V
2017-12-28
The species relationships within the genus Linum have already been studied several times by means of different molecular and phylogenetic approaches. Nevertheless, a number of ambiguities in phylogeny of Linum still remain unresolved. In particular, the species relationships within the sections Stellerolinum and Dasylinum need further clarification. Also, the question of independence of the species of the section Adenolinum still remains unanswered. Moreover, the relationships of L. narbonense and other species of the section Linum require further clarification. Additionally, the origin of tetraploid species of the section Linum (2n = 30) including the cultivated species L. usitatissimum has not been explored. The present study examines the phylogeny of blue-flowered species of Linum by comparisons of 5S rRNA gene sequences as well as ITS1 and ITS2 sequences of 35S rRNA genes. High-throughput sequencing has been used for analysis of multicopy rRNA gene families. In addition to the molecular phylogenetic analysis, the number and chromosomal localization of 5S and 35S rDNA sites has been determined by FISH. Our findings confirm that L. stelleroides forms a basal branch from the clade of blue-flowered flaxes which is independent of the branch formed by species of the sect. Dasylinum. The current molecular phylogenetic approaches, the cytogenetic analysis as well as different genomic DNA fingerprinting methods applied previously did not discriminate certain species within the sect. Adenolinum. The allotetraploid cultivated species L. usitatissimum and its wild ancestor L. angustifolium (2n = 30) could originate either as the result of hybridization of two diploid species (2n = 16) related to the modern L. grandiflorum and L. decumbens, or hybridization of a diploid species (2n = 16) and a diploid ancestor of modern L. narbonense (2n = 14).
High-throughput sequencing of multicopy rRNA gene families allowed us to make several adjustments to the phylogeny of blue-flowered flax species and also reveal intra- and interspecific divergence of the rRNA gene sequences.
Wright, Imogen A.; Travers, Simon A.
2014-01-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
Library preparation and data analysis packages for rapid genome sequencing.
Pomraning, Kyle R; Smith, Kristina M; Bredeweg, Erin L; Connolly, Lanelle R; Phatale, Pallavi A; Freitag, Michael
2012-01-01
High-throughput sequencing (HTS) has quickly become a valuable tool for comparative genetics and genomics and is now regularly carried out in laboratories that are not connected to large sequencing centers. Here we describe an updated version of our protocol for constructing single- and paired-end Illumina sequencing libraries, beginning with purified genomic DNA. The present protocol can also be used for "multiplexing," i.e. the analysis of several samples in a single flowcell lane by generating "barcoded" or "indexed" Illumina sequencing libraries in a way that is independent from Illumina-supported methods. To analyze sequencing results, we suggest several independent approaches but end users should be aware that this is a quickly evolving field and that currently many alignment (or "mapping") and counting algorithms are being developed and tested.
Du, Yushen; Wu, Nicholas C.; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting
2016-01-01
ABSTRACT Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. PMID:27803181
Assaying gene function by growth competition experiment.
Merritt, Joshua; Edwards, Jeremy S
2004-07-01
High-throughput screening and analysis is one of the emerging paradigms in biotechnology. In particular, high-throughput methods are essential in the field of functional genomics because of the vast amount of data generated in recent and ongoing genome sequencing efforts. In this report we discuss integrated functional analysis methodologies which incorporate both a growth competition component and a highly parallel assay used to quantify results of the growth competition. Several applications of the two most widely used technologies in the field, i.e., transposon mutagenesis and deletion strain library growth competition, and individual applications of several developing or less widely reported technologies are presented.
NASA Astrophysics Data System (ADS)
Campbell, T. L.; Geller, J. B.; Heller, P.; Ruiz, G.; Chang, A.; McCann, L.; Ceballos, L.; Marraffini, M.; Ashton, G.; Larson, K.; Havard, S.; Meagher, K.; Wheelock, M.; Drake, C.; Rhett, G.
2016-02-01
The Ballast Water Management Act, the Marine Invasive Species Act, and the Coastal Ecosystem Protection Act require the California Department of Fish and Wildlife to monitor and evaluate the extent of biological invasions in the state's marine and estuarine waters. This has been performed statewide, using a variety of methodologies. Conventional sample collection and processing is laborious, slow and costly, and may require considerable taxonomic expertise requiring detailed time-consuming microscopic study of multiple specimens. These factors limit the volume of biomass that can be searched for introduced species. New technologies continue to reduce the cost and increase the throughput of genetic analyses, which become efficient alternatives to traditional morphological analysis for identification, monitoring and surveillance of marine invasive species. Using next-generation sequencing of mitochondrial Cytochrome c oxidase subunit I (COI) and nuclear large subunit ribosomal RNA (LSU), we analyzed over 15,000 individual marine invertebrates collected in Californian waters. We have created sequence databases of California native and non-native species to assist in molecular identification and surveillance in North American waters. Metagenetics, the next-generation sequencing of environmental samples with comparison to DNA sequence databases, is a faster and cost-effective alternative to individual sample analysis. We have sequenced from biomass collected from whole settlement plates and plankton in California harbors, and used our introduced species database to create species lists. We can combine these species lists for individual marinas with collected environmental data, such as temperature, salinity, and dissolved oxygen to understand the ecology of marine invasions. Here we discuss high throughput sampling, sequencing, and COASTLINE, our data analysis answer to challenges working with hundreds of millions of sequencing reads from tens of thousands of specimens.
Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.
2015-01-01
For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens, particularly for use in phylogenetic analyses, have been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful.
The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis. PMID:26505622
Pseudouridines have context-dependent mutation and stop rates in high-throughput sequencing.
Zhou, Katherine I; Clark, Wesley C; Pan, David W; Eckwahl, Matthew J; Dai, Qing; Pan, Tao
2018-05-11
The abundant RNA modification pseudouridine (Ψ) has been mapped transcriptome-wide by chemically modifying pseudouridines with carbodiimide and detecting the resulting reverse transcription stops in high-throughput sequencing. However, these methods have limited sensitivity and specificity, in part due to the use of reverse transcription stops. We sought to use mutations rather than just stops in sequencing data to identify pseudouridine sites. Here, we identify reverse transcription conditions that allow read-through of carbodiimide-modified pseudouridine (CMC-Ψ), and we show that pseudouridines in carbodiimide-treated human ribosomal RNA have context-dependent mutation and stop rates in high-throughput sequencing libraries prepared under these conditions. Furthermore, accounting for the context-dependence of mutation and stop rates can enhance the detection of pseudouridine sites. Similar approaches could contribute to the sequencing-based detection of many RNA modifications.
Matsumura, Emilyn E; Coletta-Filho, Helvecio D; Nouri, Shahideh; Falk, Bryce W; Nerva, Luca; Oliveira, Tiago S; Dorta, Silvia O; Machado, Marcos A
2017-04-24
Citrus sudden death (CSD) has caused the death of approximately four million orange trees in a very important citrus region in Brazil. Although its etiology is still not completely clear, symptoms and distribution of affected plants indicate a viral disease. In a search for viruses associated with CSD, we have performed a comparative high-throughput sequencing analysis of the transcriptome and small RNAs from CSD-symptomatic and -asymptomatic plants using the Illumina platform. The data revealed mixed infections that included Citrus tristeza virus (CTV) as the predominant virus, followed by the Citrus sudden death-associated virus (CSDaV), Citrus endogenous pararetrovirus (CitPRV) and two putative novel viruses tentatively named Citrus jingmen-like virus (CJLV) and Citrus virga-like virus (CVLV). The deep sequencing analyses were sensitive enough to differentiate two genotypes of both viruses previously associated with CSD-affected plants: CTV and CSDaV. Our data also showed a putative association of the CSD-symptomatic plants with a specific CSDaV genotype and a likely association with CitPRV as well, whereas the two putative novel viruses appeared to be more associated with CSD-asymptomatic plants. This is the first high-throughput sequencing-based study of the viral sequences present in CSD-affected citrus plants, and it generated valuable information for further CSD studies.
Niland, Courtney N.; Jankowsky, Eckhard; Harris, Michael E.
2016-01-01
Quantification of the specificity of RNA binding proteins and RNA processing enzymes is essential to understanding their fundamental roles in biological processes. High Throughput Sequencing Kinetics (HTS-Kin) uses high throughput sequencing and internal competition kinetics to simultaneously monitor the processing rate constants of thousands of substrates by RNA processing enzymes. This technique has provided unprecedented insight into the substrate specificity of the tRNA processing endonuclease ribonuclease P. Here, we investigate the accuracy and robustness of measurements associated with each step of the HTS-Kin procedure. We examine the effect of substrate concentration on the observed rate constant, determine the optimal kinetic parameters, and provide guidelines for reducing error in amplification of the substrate population. Importantly, we find that high-throughput sequencing and experimental reproducibility each contribute their own sources of error, and these are the main sources of imprecision in the quantified results when otherwise optimized guidelines are followed. PMID:27296633
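As a rough illustration of internal competition kinetics, the textbook relation for two substrates competing for the same enzyme is k_i / k_ref = ln(S_i / S_i,0) / ln(S_ref / S_ref,0), where S / S_0 is the fraction of each substrate remaining. The sketch below applies that general relation only; how HTS-Kin derives the remaining fractions from sequencing read ratios is not described in the abstract, and the function name is hypothetical.

```python
import math


def relative_rate_constant(frac_remaining_i: float, frac_remaining_ref: float) -> float:
    """Relative rate constant of substrate i versus a reference substrate.

    Uses the standard internal-competition relation
    k_i / k_ref = ln(S_i / S_i0) / ln(S_ref / S_ref0).
    Both arguments are fractions of unreacted substrate, strictly between 0 and 1.
    """
    for f in (frac_remaining_i, frac_remaining_ref):
        if not 0.0 < f < 1.0:
            raise ValueError("fractions remaining must lie strictly between 0 and 1")
    return math.log(frac_remaining_i) / math.log(frac_remaining_ref)


# A substrate that is 75% consumed while the reference is 50% consumed
# reacts twice as fast as the reference.
print(relative_rate_constant(0.25, 0.5))
```

Because the relation uses logarithms of fractions, measurements taken very close to 0% or 100% reaction amplify small counting errors, which is consistent with the abstract's emphasis on choosing reaction conditions carefully.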
Gold nanoparticles for high-throughput genotyping of long-range haplotypes
NASA Astrophysics Data System (ADS)
Chen, Peng; Pan, Dun; Fan, Chunhai; Chen, Jianhua; Huang, Ke; Wang, Dongfang; Zhang, Honglu; Li, You; Feng, Guoyin; Liang, Peiji; He, Lin; Shi, Yongyong
2011-10-01
Completion of the Human Genome Project and the HapMap Project has led to increasing demands for mapping complex traits in humans to understand the aetiology of diseases. Identifying variations in the DNA sequence, which affect how we develop disease and respond to pathogens and drugs, is important for this purpose, but it is difficult to identify these variations in large sample sets. Here we show that through a combination of capillary sequencing and polymerase chain reaction assisted by gold nanoparticles, it is possible to identify several DNA variations that are associated with age-related macular degeneration and psoriasis on significant regions of human genomic DNA. Our method is accurate and promising for large-scale and high-throughput genetic analysis of susceptibility towards disease and drug resistance.
Transcriptome analysis by strand-specific sequencing of complementary DNA
Parkhomchuk, Dmitri; Borodina, Tatiana; Amstislavskiy, Vyacheslav; Banaru, Maria; Hallen, Linda; Krobitsch, Sylvia; Lehrach, Hans; Soldatov, Alexey
2009-01-01
High-throughput complementary DNA sequencing (RNA-Seq) is a powerful tool for whole-transcriptome analysis, supplying information about a transcript's expression level and structure. However, it is difficult to determine the polarity of transcripts, and therefore identify which strand is transcribed. Here, we present a simple cDNA sequencing protocol that preserves information about a transcript's direction. Using Saccharomyces cerevisiae and mouse brain transcriptomes as models, we demonstrate that knowing the transcript's orientation allows more accurate determination of the structure and expression of genes. It also helps to identify new genes and enables studying promoter-associated and antisense transcription. The transcriptional landscapes we obtained are available online. PMID:19620212
Discovery of 100K SNP array and its utilization in sugarcane
USDA-ARS's Scientific Manuscript database
Next generation sequencing (NGS) enables us to identify thousands of single nucleotide polymorphism (SNP) markers for genotyping and fingerprinting. However, the process requires very precise bioinformatics analysis and filtering. High throughput SNP array with predefined genomic location co...
Analysis, annotation, and profiling of the oat seed transcriptome
USDA-ARS's Scientific Manuscript database
Novel high-throughput next generation sequencing (NGS) technologies are providing opportunities to explore genomes and transcriptomes in a cost-effective manner. To construct a gene expression atlas of developing oat (Avena sativa) seeds, two software packages specifically designed for RNA-seq (Trin...
Pyrosequencing the Canine Faecal Microbiota: Breadth and Depth of Biodiversity
Hand, Daniel; Wallis, Corrin; Colyer, Alison; Penn, Charles W.
2013-01-01
Mammalian intestinal microbiota remain poorly understood despite decades of interest and investigation by culture-based and other long-established methodologies. Using high-throughput sequencing technology we now report a detailed analysis of canine faecal microbiota. The study group of animals comprised eleven healthy adult miniature Schnauzer dogs of mixed sex and age, some closely related and all housed in kennel and pen accommodation on the same premises with similar feeding and exercise regimes. DNA was extracted from faecal specimens and subjected to PCR amplification of 16S rDNA, followed by sequencing of the 5′ region that included variable regions V1 and V2. Barcoded amplicons were sequenced by Roche-454 FLX high-throughput pyrosequencing. Sequences were assigned to taxa using the Ribosomal Database Project Bayesian classifier and revealed dominance of Fusobacterium and Bacteroidetes phyla. Differences between animals in the proportions of different taxa, among 10,000 reads per animal, were clear and not supportive of the concept of a “core microbiota”. Despite this variability in prominent genera, littermates were shown to have a more similar faecal microbial composition than unrelated dogs. Diversity of the microbiota was also assessed by assignment of sequence reads into operational taxonomic units (OTUs) at the level of 97% sequence identity. The OTU data were then subjected to rarefaction analysis and determination of Chao1 richness estimates. The data indicated that faecal microbiota comprised possibly as many as 500 to 1500 OTUs. PMID:23382835
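The Chao1 richness estimate mentioned above can be sketched as follows. This uses the commonly cited bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and twice; whether the study used this form or the classic one is not stated in the abstract, and the function name and example counts are illustrative only.

```python
from typing import Iterable


def chao1(otu_counts: Iterable[int]) -> float:
    """Bias-corrected Chao1 richness estimate from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                        # observed OTUs
    f1 = sum(1 for c in counts if c == 1)      # singletons
    f2 = sum(1 for c in counts if c == 2)      # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Illustrative counts: 6 observed OTUs, 3 singletons, 1 doubleton.
print(chao1([10, 5, 1, 1, 1, 2]))
```

The estimate exceeds the observed OTU count whenever several singletons are present, which is why richness estimates can span a wide range such as the one reported here.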
Networking Omic Data to Envisage Systems Biological Regulation.
Kalapanulak, Saowalak; Saithong, Treenut; Thammarongtham, Chinae
To understand how biological processes work, it is necessary to explore the systematic regulation governing the behaviour of the processes. Beyond driving the normal behaviour of organisms, systematic regulation evidently underlies the temporal responses to surrounding environments (dynamics) and long-term phenotypic adaptation (evolution). The systematic regulation is, in effect, formulated from the regulatory components, which work together as a network. In the drive to decipher such a code of life, a spectrum of technologies has continuously been developed in the post-genomic era. With current advances, high-throughput sequencing technologies are tremendously powerful for facilitating genomics and systems biology studies in the attempt to understand system regulation inside cells. The ability to explore relevant regulatory components that infer the transcriptional and signaling regulation driving core cellular processes is thus enhanced. This chapter reviews high-throughput sequencing technologies, including second and third generation sequencing technologies, which support the investigation of genomics and transcriptomics data. Utilization of these high-throughput data to form the virtual network of systems regulation is explained, particularly for transcriptional regulatory networks. Analysis of the resulting regulatory networks could lead to an understanding of cellular systems regulation at the mechanistic and dynamics levels. The great contribution of the biological networking approach to envisaging systems regulation is finally demonstrated by a broad range of examples.
Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H; Lee, Doheon
2007-01-01
With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with RefSeq database. The second is an association step where the sequences were linked to Entrez Gene, OMIM and GO databases, and their results were saved as a gene-patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene-patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at http://www.patome.org/; the information is updated bimonthly. PMID:17085479
Kumar, Dhananjay; Dutta, Summi; Singh, Dharmendra; Prabhu, Kumble Vinod; Kumar, Manish; Mukhopadhyay, Kunal
2017-01-01
Deep sequencing identified 497 conserved and 559 novel miRNAs in wheat, while degradome analysis revealed 701 target genes. qRT-PCR demonstrated differential expression of miRNAs during stages of leaf rust progression. Bread wheat (Triticum aestivum L.) is an important cereal food crop feeding 30% of the world population. A major threat to wheat production is rust epidemics. This study was targeted towards identification and functional characterization of micro(mi)RNAs and their target genes in wheat in response to leaf rust ingression. High-throughput sequencing was used for transcriptome-wide identification of miRNAs and their expression profiling in response to leaf rust using mock- and pathogen-inoculated resistant and susceptible near-isogenic wheat plants. A total of 1056 mature miRNAs were identified, of which 497 miRNAs were conserved and 559 miRNAs were novel. The pathogen-inoculated resistant plants manifested more miRNAs than the pathogen-infected susceptible plants. The miRNA counts increased in the susceptible isoline due to leaf rust; conversely, the counts decreased in the resistant isoline in response to pathogenesis, illustrating precise spatial tuning of miRNAs during compatible and incompatible interactions. Stem-loop quantitative real-time PCR was used to profile 10 highly differentially expressed miRNAs obtained from high-throughput sequencing data. The spatio-temporal profiling validated the differential expression of miRNAs between the isolines as well as in response to pathogen infection. Degradome analysis provided 701 predicted target genes associated with defense response, signal transduction, development, metabolism, and transcriptional regulation. The obtained results indicate that wheat isolines employ diverse arrays of miRNAs that modulate their target genes during compatible and incompatible interactions. 
Our findings increase knowledge of the roles of microRNAs in wheat-leaf rust interactions and could help in rust resistance breeding programs.
Notredame, Cedric
2018-05-02
Cedric Notredame from the Centre for Genomic Regulation gives a presentation on New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era at the JGI/Argonne HPC Workshop on January 26, 2010.
GeneSCF: a real-time based functional enrichment tool with support for multiple organisms.
Subhash, Santhilal; Kanduri, Chandrasekhar
2016-09-13
High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to adapt to new entries from the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis by using updated information directly from source databases such as KEGG, Reactome or Gene Ontology. In this study, we focused on designing a command-line tool called GeneSCF (Gene Set Clustering based on Functional annotations) that can predict the functionally relevant biological information for a set of genes in a real-time updated manner. It is designed to handle information from more than 4000 organisms from freely available prominent functional databases like KEGG, Reactome and Gene Ontology. We successfully employed our tool on two published datasets to predict the biologically relevant functional information. The core features of this tool were tested on Linux machines without the need to install additional dependencies. GeneSCF is more reliable than other enrichment tools because of its ability to use reference functional databases in real time to perform enrichment analysis. It is easy to integrate with other pipelines available for downstream analysis of high-throughput data. More importantly, GeneSCF can run multiple gene lists simultaneously on different organisms, thereby saving time for users. Since the tool is designed to be ready to use, there is no need for any complex compilation and installation procedures.
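For readers unfamiliar with functional enrichment, a minimal sketch of one widely used statistic, the one-sided hypergeometric test, is given below. This is a generic illustration and not GeneSCF's code; the abstract does not state which statistic GeneSCF applies, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_pvalue(population: int, annotated: int, drawn: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes by chance.

    population: number of genes in the background set
    annotated:  background genes carrying the functional term
    drawn:      size of the user's gene list
    hits:       genes in the list that carry the term
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., upper.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy example: all 5 listed genes fall in a 5-gene term from a 10-gene background.
print(enrichment_pvalue(population=10, annotated=5, drawn=5, hits=5))
```

In practice one such p-value is computed per term and then corrected for multiple testing; the choice of background set strongly affects the result.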
Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes
2015-08-19
Although nearly one fifth of all human cancers have an infectious aetiology, the causes of the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to search for novel retroviruses in samples from three cancer types. In accordance with recent studies, our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.
Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).
Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E
2017-01-01
Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.
Restructuring of the Aquatic Bacterial Community by Hydric Dynamics Associated with Superstorm Sandy
Ulrich, Nikea; Rosenberger, Abigail; Brislawn, Colin; Wright, Justin; Kessler, Collin; Toole, David; Solomon, Caroline; Strutt, Steven; McClure, Erin
2016-01-01
ABSTRACT Bacterial community composition and longitudinal fluctuations were monitored in a riverine system during and after Superstorm Sandy to better characterize inter- and intracommunity responses associated with the disturbance associated with a 100-year storm event. High-throughput sequencing of the 16S rRNA gene was used to assess microbial community structure within water samples from Muddy Creek Run, a second-order stream in Huntingdon, PA, at 12 different time points during the storm event (29 October to 3 November 2012) and under seasonally matched baseline conditions. High-throughput sequencing of the 16S rRNA gene was used to track changes in bacterial community structure and divergence during and after Superstorm Sandy. Bacterial community dynamics were correlated to measured physicochemical parameters and fecal indicator bacteria (FIB) concentrations. Bioinformatics analyses of 2.1 million 16S rRNA gene sequences revealed a significant increase in bacterial diversity in samples taken during peak discharge of the storm. Beta-diversity analyses revealed longitudinal shifts in the bacterial community structure. Successional changes were observed, in which Betaproteobacteria and Gammaproteobacteria decreased in 16S rRNA gene relative abundance, while the relative abundance of members of the Firmicutes increased. Furthermore, 16S rRNA gene sequences matching pathogenic bacteria, including strains of Legionella, Campylobacter, Arcobacter, and Helicobacter, as well as bacteria of fecal origin (e.g., Bacteroides), exhibited an increase in abundance after peak discharge of the storm. This study revealed a significant restructuring of in-stream bacterial community structure associated with hydric dynamics of a storm event. IMPORTANCE In order to better understand the microbial risks associated with freshwater environments during a storm event, a more comprehensive understanding of the variations in aquatic bacterial diversity is warranted. 
This study investigated the bacterial communities during and after Superstorm Sandy to provide fine time point resolution of dynamic changes in bacterial composition. This study adds to the current literature by revealing the variation in bacterial community structure during the course of a storm. This study employed high-throughput DNA sequencing, which generated a deep analysis of inter- and intracommunity responses during a significant storm event. This study has highlighted the utility of applying high-throughput sequencing for water quality monitoring purposes, as this approach enabled a more comprehensive investigation of the bacterial community structure. Altogether, these data suggest a drastic restructuring of the stream bacterial community during a storm event and highlight the potential of high-throughput sequencing approaches for assessing the microbiological quality of our environment. PMID:27060115
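Beta-diversity comparisons between time points are often based on a pairwise dissimilarity between taxon abundance profiles. The sketch below computes the Bray-Curtis dissimilarity as one common choice; the abstract does not name the metric used in the study, so this is an assumption, and the example abundance vectors are invented for illustration.

```python
from typing import Sequence


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two abundance vectors.

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    Both vectors must list the same taxa in the same order.
    """
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


baseline = [10, 0, 5]   # hypothetical read counts for three taxa
storm = [5, 5, 5]
print(bray_curtis(baseline, storm))
```

Computing this value for every pair of samples yields the distance matrix that ordination methods then use to display longitudinal shifts in community structure.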
Web-based visual analysis for high-throughput genomics
2013-01-01
Background: Visualization plays an essential role in genomics research by making it possible to observe correlations and trends in large datasets as well as communicate findings to others. Visual analysis, which combines visualization with analysis tools to enable seamless use of both approaches for scientific investigation, offers a powerful method for performing complex genomic analyses. However, there are numerous challenges that arise when creating rich, interactive Web-based visualizations/visual analysis applications for high-throughput genomics. These challenges include managing data flow from Web server to Web browser, integrating analysis tools and visualizations, and sharing visualizations with colleagues. Results: We have created a platform that simplifies the creation of Web-based visualization/visual analysis applications for high-throughput genomics. This platform provides components that make it simple to efficiently query very large datasets, draw common representations of genomic data, integrate with analysis tools, and share or publish fully interactive visualizations. Using this platform, we have created a Circos-style genome-wide viewer, a generic scatter plot for correlation analysis, an interactive phylogenetic tree, a scalable genome browser for next-generation sequencing data, and an application for systematically exploring tool parameter spaces to find good parameter values. All visualizations are interactive and fully customizable. The platform is integrated with the Galaxy (http://galaxyproject.org) genomics workbench, making it easy to integrate new visual applications into Galaxy. Conclusions: Visualization and visual analysis play an important role in high-throughput genomics experiments, and approaches are needed to make it easier to create applications for these activities. Our framework provides a foundation for creating Web-based visualizations and integrating them into Galaxy. 
Finally, the visualizations we have created using the framework are useful tools for high-throughput genomics experiments. PMID:23758618
Re-engineering adenovirus vector systems to enable high-throughput analyses of gene function.
Stanton, Richard J; McSharry, Brian P; Armstrong, Melanie; Tomasec, Peter; Wilkinson, Gavin W G
2008-12-01
With the enhanced capacity of bioinformatics to interrogate extensive banks of sequence data, more efficient technologies are needed to test gene function predictions. Replication-deficient recombinant adenovirus (Ad) vectors are widely used in expression analysis since they provide for extremely efficient expression of transgenes in a wide range of cell types. To facilitate rapid, high-throughput generation of recombinant viruses, we have re-engineered an adenovirus vector (designated AdZ) to allow single-step, directional gene insertion using recombineering technology. Recombineering allows for direct insertion into the Ad vector of PCR products, synthesized sequences, or oligonucleotides encoding shRNAs without requirement for a transfer vector. Vectors were optimized for high-throughput applications by making them "self-excising" through incorporating the I-SceI homing endonuclease into the vector, removing the need to linearize vectors prior to transfection into packaging cells. AdZ vectors allow genes to be expressed in their native form or with strep, V5, or GFP tags. Insertion of tetracycline operators downstream of the human cytomegalovirus major immediate early (HCMV MIE) promoter permits silencing of transgenes in helper cells expressing the tet repressor, thus making the vector compatible with the cloning of toxic gene products. The AdZ vector system is robust, straightforward, and suited to both sporadic and high-throughput applications.
Application of Genomic Technologies to the Breeding of Trees
Badenes, Maria L.; Fernández i Martí, Angel; Ríos, Gabino; Rubio-Cabetas, María J.
2016-01-01
The recent introduction of next generation sequencing (NGS) technologies represents a major revolution in providing new tools for identifying the genes and/or genomic intervals controlling important traits for selection in breeding programs. In perennial fruit trees with long generation times and large sizes of adult plants, the impact of these techniques is even more important. High-throughput DNA sequencing technologies have provided complete annotated sequences in many important tree species. Most of the high-throughput genotyping platforms described are being used for studies of genetic diversity and population structure. Dissection of complex traits became possible through the availability of genome sequences along with phenotypic variation data, which allow elucidation of the causative genetic differences that give rise to observed phenotypic variation. Association mapping facilitates the association between genetic markers and phenotype in unstructured and complex populations, identifying molecular markers for assisted selection and breeding. Also, genomic data provide in silico identification and characterization of genes and gene families related to important traits, enabling new tools for molecular marker assisted selection in tree breeding. Deep sequencing of transcriptomes is also a powerful tool for the analysis of precise expression levels of each gene in a sample. It consists of quantifying short cDNA reads, obtained by NGS technologies, in order to compare the entire transcriptomes between genotypes and environmental conditions. The miRNAs are non-coding short RNAs involved in the regulation of different physiological processes, which can be identified by high-throughput sequencing of RNA libraries obtained by reverse transcription of purified short RNAs, and by in silico comparison with known miRNAs from other species. 
Altogether, NGS techniques and their applications have increased the resources for plant breeding in tree species, closing the former gap in genetic tools between trees and annual species. PMID:27895664
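To make the idea of "quantifying short cDNA reads" concrete, the sketch below rescales raw per-gene read counts to counts per million (CPM) so that libraries of different sequencing depth can be compared. CPM is only one simple normalization and is chosen here as an assumption; the review does not prescribe a method, and the gene names and counts are invented.

```python
def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {gene: n * 1e6 / total for gene, n in counts.items()}


# Hypothetical libraries sequenced to different depths.
genotype_a = {"geneA": 100, "geneB": 300}
genotype_b = {"geneA": 1000, "geneB": 3000}

# After scaling, the two libraries give identical expression estimates.
print(counts_per_million(genotype_a))
print(counts_per_million(genotype_b))
```

CPM does not correct for transcript length or composition effects, so comparisons between different genes, or between very dissimilar samples, usually call for more elaborate normalization.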
Young, J M; Austin, J J; Weyrich, L S
2017-02-01
Analysis of physical evidence is typically a deciding factor in forensic casework by establishing what transpired at a scene or who was involved. Forensic geoscience is an emerging multi-disciplinary science that can offer significant benefits to forensic investigations. Soil is a powerful, nearly 'ideal' contact trace evidence, as it is highly individualistic, easy to characterise, has a high transfer and retention probability, and is often overlooked in attempts to conceal evidence. However, many real-life cases encounter close proximity soil samples or soils with low inorganic content, which cannot be easily discriminated based on current physical and chemical analysis techniques. The capability to improve forensic soil discrimination, and identify key indicator taxa from soil using the organic fraction is currently lacking. The development of new DNA sequencing technologies offers the ability to generate detailed genetic profiles from soils and enhance current forensic soil analyses. Here, we discuss the use of DNA metabarcoding combined with high-throughput sequencing (HTS) technology to distinguish between soils from different locations in a forensic context. Specifically, we provide recommendations for best practice, outline the potential limitations encountered in a forensic context and describe the future directions required to integrate soil DNA analysis into casework.
Pritchard, Leighton; Holden, Nicola J; Bielaszewska, Martina; Karch, Helge; Toth, Ian K
2012-01-01
An Escherichia coli O104:H4 outbreak in Germany in summer 2011 caused 53 deaths, over 4000 individual infections across Europe, and considerable economic, social and political impact. This outbreak was the first in a position to exploit rapid, benchtop high-throughput sequencing (HTS) technologies and crowdsourced data analysis early in its investigation, establishing a new paradigm for rapid response to disease threats. We describe a novel strategy for design of diagnostic PCR primers that exploited this rapid draft bacterial genome sequencing to distinguish between E. coli O104:H4 outbreak isolates and other pathogenic E. coli isolates, including the historical hæmolytic uræmic syndrome (HUSEC) E. coli HUSEC041 O104:H4 strain, which possesses the same serotype as the outbreak isolates. Primers were designed using a novel alignment-free strategy against eleven draft whole genome assemblies of E. coli O104:H4 German outbreak isolates from the E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium website, and a negative sequence set containing 69 E. coli chromosome and plasmid sequences from public databases. Validation in vitro against 21 'positive' E. coli O104:H4 outbreak and 32 'negative' non-outbreak EHEC isolates indicated that individual primer sets exhibited 100% sensitivity for outbreak isolates, with false positive rates of between 9% and 22%. A minimal combination of two primers discriminated between outbreak and non-outbreak E. coli isolates with 100% sensitivity and 100% specificity. Draft genomes of isolates of disease outbreak bacteria enable high throughput primer design and enhanced diagnostic performance in comparison to traditional molecular assays. Future outbreak investigations will be able to harness HTS rapidly to generate draft genome sequences and diagnostic primer sets, greatly facilitating epidemiology and clinical diagnostics. 
We expect that high throughput primer design strategies will enable faster, more precise responses to future disease outbreaks of bacterial origin, and help to mitigate their societal impact.
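The validation figures quoted above (sensitivity, specificity, false positive rate) follow from simple confusion-matrix arithmetic. The sketch below is illustrative only: the function name and the per-primer counts are assumptions chosen to be consistent with the 21 positive and 32 negative isolates mentioned in the abstract, not values taken from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true outbreak isolates detected
    specificity = tn / (tn + fp)  # fraction of non-outbreak isolates rejected
    return sensitivity, specificity


# Hypothetical primer set: all 21 outbreak isolates amplify (no false negatives),
# and 3 of the 32 non-outbreak isolates also amplify (assumed count).
sens, spec = sensitivity_specificity(tp=21, fn=0, tn=29, fp=3)
false_positive_rate = 1 - spec
print(f"sensitivity={sens:.2f} specificity={spec:.3f} FPR={false_positive_rate:.3f}")
```

With 32 negative isolates, three false positives correspond to a false positive rate of about 9% and seven to about 22%, which matches the range reported for individual primer sets.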
Lou, Tzu-Fang; Weidmann, Chase A; Killingsworth, Jordan; Tanaka Hall, Traci M; Goldstrohm, Aaron C; Campbell, Zachary T
2017-04-15
RNA-binding proteins (RBPs) collaborate to control virtually every aspect of RNA function. Tremendous progress has been made in the area of global assessment of RBP specificity using next-generation sequencing approaches both in vivo and in vitro. Understanding how protein-protein interactions enable precise combinatorial regulation of RNA remains a significant problem. Addressing this challenge requires tools that can quantitatively determine the specificities of both individual proteins and multimeric complexes in an unbiased and comprehensive way. One approach utilizes in vitro selection, high-throughput sequencing, and sequence-specificity landscapes (SEQRS). We outline a SEQRS experiment focused on obtaining the specificity of a multi-protein complex between Drosophila RBPs Pumilio (Pum) and Nanos (Nos). We discuss the necessary controls in this type of experiment and examine how the resulting data can be complemented with structural and cell-based reporter assays. Additionally, SEQRS data can be integrated with functional genomics data to uncover biological function. Finally, we propose extensions of the technique that will enhance our understanding of multi-protein regulatory complexes assembled onto RNA. Copyright © 2016 Elsevier Inc. All rights reserved.
Sun, Beili; Zhou, Dongrui; Tu, Jing; Lu, Zuhong
2017-01-01
The characteristics of tongue coating are very important symbols for disease diagnosis in traditional Chinese medicine (TCM) theory. As a habitat of oral microbiota, bacteria on the tongue dorsum have been proved to be the cause of many oral diseases. The high-throughput next-generation sequencing (NGS) platforms have been widely applied in the analysis of bacterial 16S rRNA gene. We developed a methodology based on genus-specific multiprimer amplification and ligation-based sequencing for microbiota analysis. In order to validate the efficiency of the approach, we thoroughly analyzed six tongue coating samples from lung cancer patients with different TCM types, and more than 600 genera of bacteria were detected by this platform. The results showed that ligation-based parallel sequencing combined with enzyme digestion and multiamplification could expand the effective length of sequencing reads and could be applied in the microbiota analysis.
Analyzing Immunoglobulin Repertoires
Chaudhary, Neha; Wesemann, Duane R.
2018-01-01
Somatic assembly of T cell receptor and B cell receptor (BCR) genes produces a vast diversity of lymphocyte antigen recognition capacity. The advent of efficient high-throughput sequencing of lymphocyte antigen receptor genes has recently generated unprecedented opportunities for exploration of adaptive immune responses. With these opportunities have come significant challenges in understanding the analysis techniques that most accurately reflect underlying biological phenomena. In this regard, sample preparation and sequence analysis techniques, which have largely been borrowed and adapted from other fields, continue to evolve. Here, we review current methods and challenges of library preparation, sequencing and statistical analysis of lymphocyte receptor repertoire studies. We discuss the general steps in the process of immune repertoire generation including sample preparation, platforms available for sequencing, processing of sequencing data, measurable features of the immune repertoire, and the statistical tools that can be used for analysis and interpretation of the data. Because BCR analysis harbors additional complexities, such as immunoglobulin (Ig) (i.e., antibody) gene somatic hypermutation and class switch recombination, the emphasis of this review is on Ig/BCR sequence analysis. PMID:29593723
Yu, Duo; Li, Yunfeng; Ming, Zhihui; Wang, Hongyong; Dong, Zhuo; Qiu, Ling; Wang, Tiejun
2018-01-01
Cervical cancer is one of the most common cancers in women worldwide. Malignant tumors develop resistance mechanisms and are less sensitive to or do not respond to irradiation. With the development of high-throughput sequencing technologies, circular RNA (circRNA) has been identified in an increasing number of diseases, especially cancers. It has been reported that circRNA can compete with microRNAs (miRNAs) to change the stability or translation of target RNAs, thus regulating gene expression at the transcriptional level. However, the role of circRNAs in cervical cancer and the radioresistance mechanisms of HeLa cells are unknown. The objective of this study is to investigate the role of circRNAs in radioresistance in HeLa cells. High-throughput sequencing and bioinformatics analysis were performed on irradiated and sham-irradiated HeLa cells. The reliability of high-throughput RNA sequencing was validated using quantitative real-time polymerase chain reaction. The most significant circRNA functions and pathways were selected by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. A circRNA-miRNA-target gene interaction network was used to find circRNAs associated with radioresistance. Moreover, a protein-protein interaction network was constructed to identify radioresistance-related hub proteins. High-throughput sequencing allowed the identification of 16,893 circRNAs involved in the response of HeLa cells to radiation. Compared with the control group, there were 153 differentially expressed circRNAs, of which 76 were up-regulated and 77 were down-regulated. GO covered three domains: biological process (BP), cellular component (CC) and molecular function (MF). The terms assigned to the BP domain were peptidyl-tyrosine dephosphorylation and regulation of cell migration. The identified CC terms were cell-cell adherens junction, nucleoplasm and cytosol, and the identified MF terms were protein binding and protein tyrosine phosphatase activity.
The top five KEGG pathways were MAPK signaling pathway, endocytosis, axon guidance, neurotrophin signaling pathway, and SNARE interactions in vesicular transport. The protein-protein interaction analysis indicated that 19 proteins might be hub proteins. CircRNAs may play a major role in the response to radiation. These findings may improve our understanding of the role of circRNAs in radioresistance in HeLa cells and allow the development of novel therapeutic approaches.
Discovery of viruses and virus-like pathogens in pistachio using high-throughput sequencing
USDA-ARS's Scientific Manuscript database
Pistachio (Pistacia vera L.) trees from the National Clonal Germplasm Repository (NCGR) and orchards in California were surveyed for viruses and virus-like agents by high-throughput sequencing (HTS). Analyses of 60 trees including clonal UCB-1 hybrid rootstock (P. atlantica × P. integerrima) identif...
Development and use of molecular markers: past and present.
Grover, Atul; Sharma, P C
2016-01-01
Molecular markers, due to their stability, cost-effectiveness and ease of use, provide an immensely popular tool for a variety of applications including genome mapping, gene tagging, genetic diversity, phylogenetic analysis and forensic investigations. In the last three decades, a number of molecular marker techniques have been developed and exploited worldwide in different systems. However, only a handful of these techniques, namely RFLPs, RAPDs, AFLPs, ISSRs, SSRs and SNPs, have received global acceptance. A recent revolution in DNA sequencing techniques has taken the discovery and application of molecular markers to high-throughput and ultrahigh-throughput levels. Although the choice of marker will obviously depend on the targeted use, microsatellites, SNPs and genotyping by sequencing (GBS) largely fulfill most of the user requirements. Further, modern transcriptomic and functional markers, in combination with other high-throughput techniques, will in time lead to high-density genetic map construction, identification of QTLs, and breeding and conservation strategies. This review presents an overview of different marker technologies and their variants with a comparative account of their characteristic features and applications.
Tempo and mode of genomic mutations unveil human evolutionary history.
Hara, Yuichiro
2015-01-01
Mutations that have occurred in human genomes provide insight into various aspects of evolutionary history such as speciation events and degrees of natural selection. Comparing genome sequences between human and great apes or among humans is a feasible approach for inferring human evolutionary history. Recent advances in high-throughput or so-called 'next-generation' DNA sequencing technologies have enabled the sequencing of thousands of individual human genomes, as well as a variety of reference genomes of hominids, many of which are publicly available. These sequence data can help to unveil the detailed demographic history of the lineage leading to humans as well as the explosion of modern human population size in the last several thousand years. In addition, high-throughput sequencing illustrates the tempo and mode of de novo mutations, which are producing human genetic variation at this moment. Pedigree-based human genome sequencing has shown that mutation rates vary significantly across the human genome. These studies have also provided an improved timescale of human evolution, because the mutation rate estimated from pedigree analysis is half that estimated from traditional analyses based on molecular phylogeny. Because of the dramatic reduction in sequencing cost, sequencing on-demand samples designed for specific studies is now also becoming popular. To produce data of sufficient quality to meet the requirements of the study, it is necessary to set an explicit sequencing plan that includes the choice of sample collection methods, sequencing platforms, and number of sequence reads.
Zhang, Yan; Zhao, Fuzheng; Deng, Yongfeng; Zhao, Yanping; Ren, Hongqiang
2015-04-03
Disinfection byproducts (DBPs) in drinking water have been linked to various diseases, including colon, colorectal, rectal, and bladder cancer. Trichloroacetamide (TCAcAm) is an emerging nitrogenous DBP, and our previous study found that TCAcAm could induce some changes associated with host-gut microbiota co-metabolism. In this study, we used an integrated approach combining metagenomics, based on high-throughput sequencing, and metabolomics, based on nuclear magnetic resonance (NMR), to evaluate the toxic effects of TCAcAm exposure on the gut microbiome and urine metabolome. High-throughput sequencing revealed that the gut microbiome's composition and function were significantly altered after TCAcAm exposure for 90 days in Mus musculus mice. In addition, metabolomic analysis showed that a number of gut microbiota-related metabolites were dramatically perturbed in the urine of the mice. These results may provide novel insight into evaluating the health risk of environmental pollutants as well as revealing the potential mechanism of TCAcAm's toxic effects.
Farlora, Rodolfo; Araya-Garay, José; Gallardo-Escárate, Cristian
2014-06-01
Understanding the molecular underpinnings involved in the reproduction of the salmon louse is critical for designing novel strategies of pest management for this ectoparasite. However, genomic information on sex-related genes is still limited. In the present work, sex-specific gene transcription was revealed in the salmon louse Caligus rogercresseyi using high-throughput Illumina sequencing. A total of 30,191,914 and 32,292,250 high quality reads were generated for females and males, and these were de novo assembled into 32,173 and 38,177 contigs, respectively. Gene ontology analysis showed a pattern of higher expression in the female as compared to the male transcriptome. Based on our sequence analysis and known sex-related proteins, several genes putatively involved in sex differentiation, including Dmrt3, FOXL2, VASA, and FEM1, and other potentially significant candidate genes in C. rogercresseyi, were identified for the first time. In addition, the occurrence of SNPs in several differentially expressed contigs annotating for sex-related genes was found. This transcriptome dataset provides a useful resource for future functional analyses, opening new opportunities for sea lice pest control. Copyright © 2014 Elsevier B.V. All rights reserved.
Characterizing ncRNAs in Human Pathogenic Protists Using High-Throughput Sequencing Technology
Collins, Lesley Joan
2011-01-01
ncRNAs are key genes in many human diseases including cancer and viral infection, as well as providing critical functions in pathogenic organisms such as fungi, bacteria, viruses, and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, small nucleolar RNAs (snoRNAs), and long ncRNAs on a genomic scale, making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational, and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases. PMID:22303390
Wright, Imogen A; Travers, Simon A
2014-07-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
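The distinction drawn above between codon-sized indels and frameshift mutations rests on a simple property of coding DNA: a gap whose length is a multiple of three preserves the reading frame, and any other length shifts it. The sketch below shows only that modular rule. It is not the profile hidden Markov model approach that RAMICS uses, and the function name is hypothetical.

```python
def classify_indel(length_bp):
    """Classify an insertion or deletion in coding DNA by its length in base pairs."""
    if length_bp <= 0:
        raise ValueError("indel length must be a positive number of base pairs")
    # Lengths divisible by 3 add or remove whole codons and keep the frame intact.
    if length_bp % 3 == 0:
        return "codon-sized indel"
    return "frameshift"


for length in (1, 3, 4, 6):
    print(length, classify_indel(length))
```

A real aligner also has to decide whether an apparent frameshift is genuine or a sequencing error (for example at homopolymer runs), which is the part the abstract attributes to the error-aware alignment model.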
Microfluidic single-cell whole-transcriptome sequencing.
Streets, Aaron M; Zhang, Xiannian; Cao, Chen; Pang, Yuhong; Wu, Xinglong; Xiong, Liang; Yang, Lu; Fu, Yusi; Zhao, Liang; Tang, Fuchou; Huang, Yanyi
2014-05-13
Single-cell whole-transcriptome analysis is a powerful tool for quantifying gene expression heterogeneity in populations of cells. Many techniques have, thus, been recently developed to perform transcriptome sequencing (RNA-Seq) on individual cells. To probe subtle biological variation between samples with limiting amounts of RNA, more precise and sensitive methods are still required. We adapted a previously developed strategy for single-cell RNA-Seq that has shown promise for superior sensitivity and implemented the chemistry in a microfluidic platform for single-cell whole-transcriptome analysis. In this approach, single cells are captured and lysed in a microfluidic device, where mRNAs with poly(A) tails are reverse-transcribed into cDNA. Double-stranded cDNA is then collected and sequenced using a next generation sequencing platform. We prepared 94 libraries consisting of single mouse embryonic cells and technical replicates of extracted RNA and thoroughly characterized the performance of this technology. Microfluidic implementation increased mRNA detection sensitivity as well as improved measurement precision compared with tube-based protocols. With 0.2 M reads per cell, we were able to reconstruct a majority of the bulk transcriptome with 10 single cells. We also quantified variation between and within different types of mouse embryonic cells and found that enhanced measurement precision, detection sensitivity, and experimental throughput aided the distinction between biological variability and technical noise. With this work, we validated the advantages of an early approach to single-cell RNA-Seq and showed that the benefits of combining microfluidic technology with high-throughput sequencing will be valuable for large-scale efforts in single-cell transcriptome analysis.
BioVLAB-MMIA-NGS: microRNA-mRNA integrated analysis using high-throughput sequencing data.
Chae, Heejoon; Rhee, Sungmin; Nephew, Kenneth P; Kim, Sun
2015-01-15
It is now well established that microRNAs (miRNAs) play a critical role in regulating gene expression in a sequence-specific manner, and genome-wide efforts are underway to predict known and novel miRNA targets. However, the integrated miRNA-mRNA analysis remains a major computational challenge, requiring powerful informatics systems and bioinformatics expertise. The objective of this study was to modify our widely recognized Web server for the integrated mRNA-miRNA analysis (MMIA) and its subsequent deployment on the Amazon cloud (BioVLAB-MMIA) to be compatible with high-throughput platforms, including next-generation sequencing (NGS) data (e.g. RNA-seq). We developed a new version called the BioVLAB-MMIA-NGS, deployed on both Amazon cloud and on a high-performance publicly available server called MAHA. By using NGS data and integrating various bioinformatics tools and databases, BioVLAB-MMIA-NGS offers several advantages. First, sequencing data is more accurate than array-based methods for determining miRNA expression levels. Second, potential novel miRNAs can be detected by using various computational methods for characterizing miRNAs. Third, because miRNA-mediated gene regulation is due to hybridization of an miRNA to its target mRNA, sequencing data can be used to identify many-to-many relationship between miRNAs and target genes with high accuracy. http://epigenomics.snu.ac.kr/biovlab_mmia_ngs/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
High-throughput sequencing methods to study neuronal RNA-protein interactions.
Ule, Jernej
2009-12-01
UV-cross-linking and RNase protection, combined with high-throughput sequencing, have provided global maps of RNA sites bound by individual proteins or ribosomes. Using a stringent purification protocol, UV-CLIP (UV-cross-linking and immunoprecipitation) was able to identify intronic and exonic sites bound by splicing regulators in mouse brain tissue. Ribosome profiling has been used to quantify ribosome density on budding yeast mRNAs under different environmental conditions. Post-transcriptional regulation in neurons requires high spatial and temporal precision, as is evident from the role of localized translational control in synaptic plasticity. It remains to be seen if the high-throughput methods can be applied quantitatively to study the dynamics of RNP (ribonucleoprotein) remodelling in specific neuronal populations during the neurodegenerative process. It is certain, however, that applications of new biochemical techniques followed by high-throughput sequencing will continue to provide important insights into the mechanisms of neuronal post-transcriptional regulation.
Next-Generation Technologies for Multiomics Approaches Including Interactome Sequencing
Ohashi, Hiroyuki; Miyamoto-Sato, Etsuko
2015-01-01
The development of high-speed analytical techniques such as next-generation sequencing and microarrays allows high-throughput analysis of biological information at a low cost. These techniques contribute to medical and bioscience advancements and provide new avenues for scientific research. Here, we outline a variety of new innovative techniques and discuss their use in omics research (e.g., genomics, transcriptomics, metabolomics, proteomics, and interactomics). We also discuss the possible applications of these methods, including an interactome sequencing technology that we developed, in future medical and life science research. PMID:25649523
Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier
2008-01-01
Background "Open" transcriptome analysis methods allow the study of gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results In order to perform a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152
Lee, Seungeun; Yamamoto, Naomichi
2015-12-01
This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank due to mostly cross detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting the Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Alterations of microbiota in urine from women with interstitial cystitis
2012-01-01
Background Interstitial Cystitis (IC) is a chronic inflammatory condition of the bladder with unknown etiology. The aim of this study was to characterize the microbial community present in the urine from IC female patients by 454 high throughput sequencing of the 16S variable regions V1V2 and V6. The taxonomical composition, richness and diversity of the IC microbiota were determined and compared to the microbial profile of asymptomatic healthy female (HF) urine. Results The composition and distribution of bacterial sequences differed between the urine microbiota of IC patients and HFs. Reduced sequence richness and diversity were found in IC patient urine, and a significant difference in the community structure of IC urine in relation to HF urine was observed. More than 90% of the IC sequence reads were identified as belonging to the bacterial genus Lactobacillus, a marked increase compared to 60% in HF urine. Conclusion The 16S rDNA sequence data demonstrates a shift in the composition of the bacterial community in IC urine. The reduced microbial diversity and richness is accompanied by a higher abundance of the bacterial genus Lactobacillus, compared to HF urine. This study demonstrates that high throughput sequencing analysis of urine microbiota in IC patients is a powerful tool towards a better understanding of this enigmatic disease. PMID:22974186
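Richness and diversity, as compared above between IC and HF urine, are commonly summarized as the number of observed taxa and the Shannon index. The abstract does not say which indices were used, so the sketch below is one plausible reading. The read counts are invented to mimic a community about 90% dominated by one genus versus one about 60% dominated, and the taxon names other than Lactobacillus are placeholders.

```python
import math


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for c in counts.values() if c > 0)


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)


# Hypothetical read counts per genus (illustrative, not data from the study).
ic_like = {"Lactobacillus": 900, "genus_B": 60, "genus_C": 40}
hf_like = {"Lactobacillus": 600, "genus_B": 150, "genus_C": 120, "genus_D": 80, "genus_E": 50}

for name, profile in (("IC-like", ic_like), ("HF-like", hf_like)):
    print(name, richness(profile), round(shannon_index(profile), 3))
```

The more strongly dominated profile yields both lower richness and a lower Shannon index, which is the pattern the abstract describes for IC urine.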
A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes
Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche
2014-01-01
The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes (MLH1, PMS2, MSH6, and MSH2) is the cause of Lynch Syndrome (LS), which mainly predisposes to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes including PMS2, of which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels) were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and dependent on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082
Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi
2018-03-09
High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30-50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.
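The idea of converting reads to UMI counts can be illustrated by collapsing reads that share a cell barcode, gene, and UMI into a single molecule. The sketch below assumes reads have already been parsed into (cell, UMI, gene) tuples and defines the conversion rate as unique molecules divided by total reads. Both the input layout and that definition are assumptions for illustration and may differ from how Quartz-Seq2 computes its 30-50% figure.

```python
from collections import defaultdict


def umi_counts(reads):
    """Count distinct UMIs per (cell barcode, gene) pair.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples.
    """
    seen = defaultdict(set)
    for cell, umi, gene in reads:
        seen[(cell, gene)].add(umi)  # duplicate reads of one molecule collapse here
    return {key: len(umis) for key, umis in seen.items()}


def conversion_rate(reads):
    """Fraction of reads that represent distinct molecules (assumed definition)."""
    reads = list(reads)
    if not reads:
        raise ValueError("no reads supplied")
    return sum(umi_counts(reads).values()) / len(reads)


# Toy input: four reads, two of which are duplicates of the same molecule.
toy_reads = [
    ("cell1", "AACG", "GeneA"),
    ("cell1", "AACG", "GeneA"),
    ("cell1", "TTGC", "GeneA"),
    ("cell2", "AACG", "GeneB"),
]
print(umi_counts(toy_reads))
print(conversion_rate(toy_reads))
```

Production pipelines usually also merge UMIs that differ by a single sequencing error, which this sketch leaves out.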
The main challenges that remain in applying high-throughput sequencing to clinical diagnostics.
Loeffelholz, Michael; Fofanov, Yuriy
2015-01-01
Over the last 10 years, the quality, price and availability of high-throughput sequencing instruments have improved to the point that this technology may be close to becoming a routine tool in the diagnostic microbiology laboratory. Two groups of challenges, however, have to be resolved in order to move this powerful research technology into routine use in the clinical microbiology laboratory. The computational/bioinformatics challenges include data storage cost and privacy concerns, requiring analysis to be performed without access to cloud storage or expensive computational infrastructure. The logistical challenges include interpretation of complex results and acceptance and understanding of the advantages and limitations of this technology by the medical community. This article focuses on the approaches to address these challenges, such as file formats, algorithms, data collection, reporting and good laboratory practices.
Mapping specificity landscapes of RNA-protein interactions by high throughput sequencing.
Jankowsky, Eckhard; Harris, Michael E
2017-04-15
To function in a biological setting, RNA binding proteins (RBPs) have to discriminate between alternative binding sites in RNAs. This discrimination can occur in the ground state of an RNA-protein binding reaction, in its transition state, or in both. The extent by which RBPs discriminate at these reaction states defines RBP specificity landscapes. Here, we describe the HiTS-Kin and HiTS-EQ techniques, which combine kinetic and equilibrium binding experiments with high throughput sequencing to quantitatively assess substrate discrimination for large numbers of substrate variants at ground and transition states of RNA-protein binding reactions. We discuss experimental design, practical considerations and data analysis and outline how a combination of HiTS-Kin and HiTS-EQ allows the mapping of RBP specificity landscapes. Copyright © 2017 Elsevier Inc. All rights reserved.
Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren
2016-11-01
Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. 
Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available. Copyright © 2016 Du et al.
Tahir, Muhammad N; Lockhart, Ben; Grinstead, Samuel; Mollov, Dimitre
2017-04-01
Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high-throughput sequencing (HTS). The nearly full genome sequence of a panicovirus was identified from one HTS scaffold. Sanger sequencing was used to confirm the HTS results and complete the genome sequence of 4404 nt. This virus was provisionally named Bermuda grass latent virus (BGLV). Its predicted open reading frames follow the typical arrangement of the genus Panicovirus. Based on sequence comparisons and phylogenetic analyses BGLV differs from other viruses and therefore taxonomically it is a new member of the genus Panicovirus, family Tombusviridae.
Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu
2015-09-21
Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from the next generation sequencing technology based studies. By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
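To make the idea of a local trend score and its permutation-based significance concrete, here is a minimal Python sketch. It is not the eLSA implementation and does not reproduce the approximation formulae proposed in the abstract. It assumes the no-delay case, an unnormalized score, and a simple up/flat/down encoding of each time step. The function names `trend_series`, `local_trend_score`, and `permutation_p_value` are hypothetical and chosen only for illustration.

```python
import random


def trend_series(values, threshold=0.0):
    """Encode each step of a series as +1 (up), 0 (flat), or -1 (down)."""
    steps = []
    for prev, curr in zip(values, values[1:]):
        delta = curr - prev
        if delta > threshold:
            steps.append(1)
        elif delta < -threshold:
            steps.append(-1)
        else:
            steps.append(0)
    return steps


def local_trend_score(x, y, threshold=0.0):
    """Largest |sum| of step products over any contiguous window (no delay).

    Assumption: unnormalized score; both series sampled at the same time points.
    """
    dx, dy = trend_series(x, threshold), trend_series(y, threshold)
    if len(dx) != len(dy):
        raise ValueError("series must have equal length")
    best = run_pos = run_neg = 0
    for a, b in zip(dx, dy):
        p = a * b
        run_pos = max(0, run_pos + p)  # best co-trending run ending here
        run_neg = max(0, run_neg - p)  # best anti-trending run ending here
        best = max(best, run_pos, run_neg)
    return best


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Slow baseline: shuffle x and count scores at least as large as observed."""
    rng = random.Random(seed)
    observed = local_trend_score(x, y)
    shuffled = list(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if local_trend_score(shuffled, y) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The permutation loop is the expensive step that the abstract's approximation is meant to replace: an all-versus-all comparison repeats it for every pair of series.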
Sources of PCR-induced distortions in high-throughput sequencing data sets
Kebschull, Justus M.; Zador, Anthony M.
2015-01-01
PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
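The effect of PCR stochasticity on low-copy templates can be illustrated with a simple branching-process simulation. This is a sketch under stated assumptions, not the quantitative model developed by the authors: each molecule is assumed to be copied independently with a fixed per-cycle probability, and bias, template switching, and polymerase errors are ignored. The names `amplify` and `simulate_pool` and all parameter values are hypothetical.

```python
import random


def amplify(initial_copies, cycles, efficiency, rng):
    """Simulate one template: every molecule is duplicated with
    probability `efficiency` in each cycle (assumed constant)."""
    copies = initial_copies
    for _ in range(cycles):
        duplicated = sum(1 for _ in range(copies) if rng.random() < efficiency)
        copies += duplicated
    return copies


def simulate_pool(n_templates, cycles, efficiency, seed=0):
    """Amplify a pool of unique templates, each starting from one molecule."""
    rng = random.Random(seed)
    return [amplify(1, cycles, efficiency, rng) for _ in range(n_templates)]


if __name__ == "__main__":
    counts = simulate_pool(n_templates=200, cycles=10, efficiency=0.8)
    print("min copies:", min(counts))
    print("max copies:", max(counts))
```

Because every template starts from a single molecule, a failed duplication in the first few cycles is carried through all later cycles. In this toy model that is why final copy numbers can differ widely even though all templates share the same efficiency.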
A Polyglot Approach to Bioinformatics Data Integration: A Phylogenetic Analysis of HIV-1
Reisman, Steven; Hatzopoulos, Thomas; Läufer, Konstantin; Thiruvathukal, George K.; Putonti, Catherine
2016-01-01
As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 sequences. Phylogenetic analyses were conducted for >6,000 HIV-1 sequences, revealing that spatial and temporal factors influence the evolution of the individual genes uniquely. Nevertheless, signatures of origin can be extrapolated despite increased globalization. The approach developed here can easily be customized for any species of interest. PMID:26819543
Selected Insights from Application of Whole Genome Sequencing for Outbreak Investigations
Le, Vien Thi Minh; Diep, Binh An
2014-01-01
Purpose of review The advent of high-throughput whole genome sequencing has the potential to revolutionize the conduct of outbreak investigation. Because of its ultimate pathogen strain resolution, whole genome sequencing could augment traditional epidemiologic investigations of infectious disease outbreaks. Recent findings The combination of whole genome sequencing and intensive epidemiologic analysis provided new insights on the sources and transmission dynamics of large-scale epidemics caused by Escherichia coli and Vibrio cholerae, nosocomial outbreaks caused by methicillin-resistant Staphylococcus aureus, Klebsiella pneumoniae, and Mycobacterium abscessus, community-centered outbreaks caused by Mycobacterium tuberculosis, and natural disaster-associated outbreak caused by environmentally acquired molds. Summary When combined with traditional epidemiologic investigation, whole genome sequencing has proven useful for elucidating sources and transmission dynamics of disease outbreaks. Development of a fully automated bioinformatics pipeline for analysis of whole genome sequence data is much needed to make this powerful tool more widely accessible. PMID:23856896
Pereiro, Patricia; Balseiro, Pablo; Romero, Alejandro; Dios, Sonia; Forn-Cuni, Gabriel; Fuste, Berta; Planas, Josep V.; Beltran, Sergi; Novoa, Beatriz; Figueras, Antonio
2012-01-01
Background Turbot (Scophthalmus maximus L.) is an important aquacultural resource both in Europe and Asia. However, there is little information on gene sequences available in public databases. Currently, one of the main problems affecting the culture of this flatfish is mortality due to several pathogens, especially viral diseases which are not treatable. In order to identify new genes involved in immune defense, we conducted 454-pyrosequencing of the turbot transcriptome after different immune stimulations. Methodology/Principal Findings Turbot were injected with viral stimuli to increase the expression level of immune-related genes. High-throughput deep sequencing using 454-pyrosequencing technology yielded 915,256 high-quality reads. These sequences were assembled into 55,404 contigs that were subjected to annotation steps. Intriguingly, 55.16% of the deduced proteins were not significantly similar to any sequences in the databases used for the annotation and only 0.85% of the BLASTx top-hits matched S. maximus protein sequences. This relatively low level of annotation is possibly due to the limited information for this species and other flatfish in the database. These results suggest the identification of a large number of new genes in turbot and in fish in general. A more detailed analysis showed the presence of putative members of several innate and specific immune pathways. Conclusions/Significance To our knowledge, this study is the first transcriptome analysis using 454-pyrosequencing for turbot. Previously, there were only 12,471 ESTs and fewer than 1,500 nucleotide sequences for S. maximus in the NCBI database. Our results provide a rich source of data (55,404 contigs and 181,845 singletons) for discovering and identifying new genes, which will serve as a basis for microarray construction, gene expression characterization and for identification of genetic markers to be used in several applications. 
Immune stimulation in turbot was very effective, obtaining an enormous variety of sequences belonging to genes involved in the defense mechanisms. PMID:22629298
Evaluation of Sequencing Approaches for High-Throughput Transcriptomics - (BOSC)
Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. The generation of high-throughput global gene expression...
Droege, Marcus; Hill, Brendon
2008-08-31
The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven application categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research, having recently re-sequenced the genome of an individual human, currently re-sequencing the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and having detected low-frequency somatic mutations linked to cancer.
A computational genomics pipeline for prokaryotic sequencing projects.
Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King
2010-08-01
New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.
2011-01-01
Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484
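Quality filtering of the kind discussed above rests on the standard Phred relation Q = -10 log10(p), where p is the estimated probability that a base call is wrong. The sketch below shows one simple way to turn quality strings into a per-read filter. It is an illustration only: the thresholds, the function names, and the default ASCII offset of 33 are assumptions, not the filtering criteria used in the study (older Illumina pipelines also wrote qualities with an offset of 64).

```python
def phred_to_error_prob(q):
    """Standard Phred relation: Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Convert an ASCII-encoded quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def expected_errors(quality_string, offset=33):
    """Sum of per-base error probabilities across the read."""
    return sum(phred_to_error_prob(q)
               for q in decode_qualities(quality_string, offset))


def passes_filter(quality_string, max_expected_errors=1.0,
                  min_base_quality=3, offset=33):
    """Hypothetical filter: reject empty reads, reads containing any very
    low-quality base, and reads with too many expected errors."""
    quals = decode_qualities(quality_string, offset)
    if not quals or min(quals) < min_base_quality:
        return False
    return expected_errors(quality_string, offset) <= max_expected_errors


if __name__ == "__main__":
    for qual in ["IIIIIIII", "IIII5555", "!!!!IIII"]:
        print(qual, round(expected_errors(qual), 4), passes_filter(qual))
```

Any such filter trades retained bases against error rate, which is the trade-off the abstract quantifies (a 7-fold error reduction for 12.5% of alignable bases discarded).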
USDA-ARS's Scientific Manuscript database
Determinate cucumber is characterized by short vines, fewer nodes, and terminal flowers, which is a useful plant architecture for cucumbers in certain production systems. The genetic and molecular mechanisms of determinate growth habit are not well understood. In addition, environmental factors als...
Transcriptome analysis of Pseudomonas syringae identifies new genes, ncRNAs, and antisense activity
USDA-ARS's Scientific Manuscript database
To fully understand how bacteria respond to their environment, it is essential to assess genome-wide transcriptional activity. New high throughput sequencing technologies make it possible to query the transcriptome of an organism in an efficient unbiased manner. We applied a strand-specific method t...
USDA-ARS's Scientific Manuscript database
Antimicrobial resistance in pathogenic bacteria is a major concern in human and animal health. The National Antimicrobial Resistance Monitoring System (NARMS) was designed by the CDC, FDA, and USDA to monitor antimicrobial resistance in the U.S. The Bacterial Epidemiology and Antimicrobial Resistanc...
Yusuf, Noor Hydayaty Md; Ong, Wen Dee; Redwan, Raimi Mohamed; Latip, Mariam Abd; Kumar, S Vijay
2015-10-15
MicroRNAs (miRNAs) are a class of small, endogenous non-coding RNAs that negatively regulate gene expression, resulting in the silencing of target mRNA transcripts through mRNA cleavage or translational inhibition. MiRNAs play significant roles in various biological and physiological processes in plants. However, the miRNA-mediated gene regulatory network in pineapple, the model tropical non-climacteric fruit, remains largely unexplored. Here, we report a complete list of pineapple mature miRNAs obtained from high-throughput small RNA sequencing and precursor miRNAs (pre-miRNAs) obtained from ESTs. Two small RNA libraries were constructed from pineapple fruits and leaves, respectively, using Illumina's Solexa technology. Sequence similarity analysis using miRBase revealed 579,179 reads homologous to 153 miRNAs from 41 miRNA families. In addition, a pineapple fruit transcriptome library consisting of approximately 30,000 EST contigs constructed using Solexa sequencing was used for the discovery of pre-miRNAs. In all, four pre-miRNAs were identified (MIR156, MIR399, MIR444 and MIR2673). Furthermore, the same pineapple transcriptome was used to dissect the function of the miRNAs in pineapple by predicting their putative targets in conjunction with their regulatory networks. In total, 23 metabolic pathways were found to be regulated by miRNAs in pineapple. The use of high-throughput sequencing in pineapples to unveil the presence of miRNAs and their regulatory pathways provides insight into the repertoire of miRNA regulation used exclusively in this non-climacteric model plant. Copyright © 2015 Elsevier B.V. All rights reserved.
Gerrard, Gareth; Valgañón, Mikel; Foong, Hui En; Kasperaviciute, Dalia; Iskander, Deena; Game, Laurence; Müller, Michael; Aitman, Timothy J; Roberts, Irene; de la Fuente, Josu; Foroni, Letizia; Karadimitris, Anastasios
2013-08-01
Diamond-Blackfan anaemia (DBA) is caused by inactivating mutations in ribosomal protein (RP) genes, with mutations in 13 of the 80 RP genes accounting for 50-60% of cases. The remaining 40-50% of cases may harbour mutations in one of the remaining RP genes, but the very low frequencies make conventional genetic screening challenging. We, therefore, applied custom enrichment technology combined with high-throughput sequencing to screen all 80 RP genes. Using this approach, we identified and validated inactivating mutations in 15/17 (88%) DBA patients. Target enrichment combined with high-throughput sequencing is a robust and improved methodology for the genetic diagnosis of DBA. © 2013 John Wiley & Sons Ltd.
Wang, Xixian; Ren, Lihui; Su, Yetian; Ji, Yuetong; Liu, Yaoping; Li, Chunyu; Li, Xunrong; Zhang, Yi; Wang, Wei; Hu, Qiang; Han, Danxiang; Xu, Jian; Ma, Bo
2017-11-21
Raman-activated cell sorting (RACS) has attracted increasing interest, yet throughput remains one major factor limiting its broader application. Here we present an integrated Raman-activated droplet sorting (RADS) microfluidic system for functional screening of live cells in a label-free and high-throughput manner, by employing AXT-synthetic industrial microalga Haematococcus pluvialis (H. pluvialis) as a model. Raman microspectroscopy analysis of individual cells is carried out prior to their microdroplet encapsulation, which is then directly coupled to DEP-based droplet sorting. To validate the system, H. pluvialis cells containing different levels of AXT were mixed and underwent RADS. Those AXT-hyperproducing cells were sorted with an accuracy of 98.3%, an eight-fold enrichment ratio, and a throughput of ∼260 cells/min. Of the RADS-sorted cells, 92.7% remained alive and able to proliferate, which is equivalent to the unsorted cells. Thus, the RADS achieves a much higher throughput than existing RACS systems, preserves the vitality of cells, and facilitates seamless coupling with downstream manipulations such as single-cell sequencing and cultivation.
High-throughput physical mapping of chromosomes using automated in situ hybridization.
George, Phillip; Sharakhova, Maria V; Sharakhov, Igor V
2012-06-28
Projects to obtain whole-genome sequences for 10,000 vertebrate species and for 5,000 insect and related arthropod species are expected to take place over the next 5 years. For example, the sequencing of the genomes for 15 malaria mosquito species is currently being done using an Illumina platform. This Anopheles species cluster includes both vectors and non-vectors of malaria. When the genome assemblies become available, researchers will have the unique opportunity to perform comparative analysis for inferring evolutionary changes relevant to vector ability. However, it has proven difficult to use next-generation sequencing reads to generate high-quality de novo genome assemblies. Moreover, the existing genome assemblies for Anopheles gambiae, although obtained using the Sanger method, are gapped or fragmented. Success of comparative genomic analyses will be limited if researchers deal with numerous sequencing contigs, rather than with chromosome-based genome assemblies. Fragmented, unmapped sequences create problems for genomic analyses because: (i) unidentified gaps cause incorrect or incomplete annotation of genomic sequences; (ii) unmapped sequences lead to confusion between paralogous genes and genes from different haplotypes; and (iii) the lack of chromosome assignment and orientation of the sequencing contigs does not allow for reconstructing rearrangement phylogeny and studying chromosome evolution. Developing high-resolution physical maps for species with newly sequenced genomes is a timely and cost-effective investment that will facilitate genome annotation, evolutionary analysis, and re-sequencing of individual genomes from natural populations. Here, we present innovative approaches to chromosome preparation, fluorescent in situ hybridization (FISH), and imaging that facilitate rapid development of physical maps. Using An. gambiae as an example, we demonstrate that the development of physical chromosome maps can potentially improve genome assemblies and, thus, the quality of genomic analyses. First, we use a high-pressure method to prepare polytene chromosome spreads. This method, originally developed for Drosophila, allows the user to visualize more details on chromosomes than the regular squashing technique. Second, a fully automated, front-end system for FISH is used for high-throughput physical genome mapping. The automated slide staining system runs multiple assays simultaneously and dramatically reduces hands-on time. Third, an automatic fluorescent imaging system, which includes a motorized slide stage, automatically scans and photographs labeled chromosomes after FISH. This system is especially useful for identifying and visualizing multiple chromosomal plates on the same slide. In addition, the scanning process captures a more uniform FISH result. Overall, the automated high-throughput physical mapping protocol is more efficient than a standard manual protocol.
History, applications, and challenges of immune repertoire research.
Liu, Xiao; Wu, Jinghua
2018-02-27
The diversity of T and B cells in terms of their receptor sequences is huge in the vertebrate immune system and provides broad protection against the vast diversity of pathogens. The immune repertoire is defined as the sum of T cell receptors and B cell receptors (also named immunoglobulins) that make up the organism's adaptive immune system. Before the emergence of high-throughput sequencing, studies of the immune repertoire were limited by underdeveloped methodologies, since it was impossible to capture the whole picture with low-throughput tools. Massively parallel sequencing technology is well suited to research on the immune repertoire. In this article, we review the history of immune repertoire studies, in terms of technologies and research applications. Particularly, we discuss several aspects of challenges in this field and highlight the efforts to develop potential solutions, in the era of high-throughput sequencing of the immune repertoire.
Research progress of plant population genomics based on high-throughput sequencing.
Wang, Yun-sheng
2016-08-01
Population genomics, a new paradigm for population genetics, combines the concepts and techniques of genomics with the theoretical system of population genetics and improves our understanding of microevolution through identification of site-specific effects and genome-wide effects using genome-wide polymorphic site genotyping. With the appearance and improvement of next generation high-throughput sequencing technology, the number of plant species with complete genome sequences has increased rapidly and large scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linkage disequilibrium, selection effect, demographic history and molecular mechanism of complex traits of relevant plant populations at a genomic level. In this review, I briefly introduce the concept and research methods of population genomics and summarize the research progress of plant population genomics based on high-throughput sequencing. I also discuss the prospects as well as existing problems of plant population genomics in order to provide references for related studies.
Zhu, Qihui; Smith, Shavannor M; Ayele, Mulu; Yang, Lixing; Jogi, Ansuya; Chaluvadi, Srinivasa R; Bennetzen, Jeffrey L
2012-11-01
Tef (Eragrostis tef) is a major cereal crop in Ethiopia. Lodging is the primary constraint to increasing productivity in this allotetraploid species, accounting for losses of ∼15-45% in yield each year. As a first step toward identifying semi-dwarf varieties that might have improved lodging resistance, an ∼6× fosmid library was constructed and used to identify both homeologues of the dw3 semi-dwarfing gene of Sorghum bicolor. An EMS mutagenized population, consisting of ∼21,210 tef plants, was planted and leaf materials were collected into 23 superpools. Two dwarfing candidate genes, homeologues of dw3 of sorghum and rht1 of wheat, were sequenced directly from each superpool with 454 technology, and 120 candidate mutations were identified. Out of 10 candidates tested, six independent mutations were validated by Sanger sequencing, including two predicted detrimental mutations in both dw3 homeologues with a potential to improve lodging resistance in tef through further breeding. This study demonstrates that high-throughput sequencing can identify potentially valuable mutations in under-studied plant species like tef and has provided mutant lines that can now be combined and tested in breeding programs for improved lodging resistance.
Morelli, M; Chiumenti, M; De Stradis, A; La Notte, P; Minafra, A
2015-02-01
Through the application of next generation sequencing, in synergy with conventional cloning of DOP-PCR fragments, two double-stranded RNA (dsRNA) molecules of about 1.5 kbp in size were isolated from leaf tissue of a Japanese persimmon (accession SSPI) from Apulia (southern Italy) showing veinlet necrosis. High-throughput sequencing allowed whole genome sequence assembly, yielding contigs of 1,577 and 1,491 bp identified as dsRNA-1 and dsRNA-2 of a previously undescribed virus, provisionally named Persimmon cryptic virus (PeCV). In silico analysis showed that both dsRNA fragments were monocistronic and comprised the RNA-dependent RNA polymerase (RdRp) and the capsid protein (CP) genes, respectively. Phylogenetic reconstruction revealed a close relationship of these dsRNAs with those of cryptoviruses described in woody and herbaceous hosts, recently gathered in genus Deltapartitivirus. Virus-specific primers for RT-PCR, designed in the CP cistron, detected viral RNAs also in symptomless persimmon trees sampled from the same geographical area as SSPI, thus proving that PeCV infection may be fairly common and presumably latent.
Zhou, Jizhong; He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G; Alvarez-Cohen, Lisa
2015-01-27
Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied "open-format" and "closed-format" detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. Copyright © 2015 Zhou et al.
He, Zhili; Yang, Yunfeng; Deng, Ye; Tringe, Susannah G.; Alvarez-Cohen, Lisa
2015-01-01
ABSTRACT Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions. PMID:25626903
Zhou, Jizhong; He, Zhili; Yang, Yunfeng; ...
2015-01-27
Understanding the structure, functions, activities and dynamics of microbial communities in natural environments is one of the grand challenges of 21st century science. To address this challenge, over the past decade, numerous technologies have been developed for interrogating microbial communities, of which some are amenable to exploratory work (e.g., high-throughput sequencing and phenotypic screening) and others depend on reference genes or genomes (e.g., phylogenetic and functional gene arrays). Here, we provide a critical review and synthesis of the most commonly applied “open-format” and “closed-format” detection technologies. We discuss their characteristics, advantages, and disadvantages within the context of environmental applications and focus on analysis of complex microbial systems, such as those in soils, in which diversity is high and reference genomes are few. In addition, we discuss crucial issues and considerations associated with applying complementary high-throughput molecular technologies to address important ecological questions.
Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike
2016-07-01
High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine the Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially impact the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based demultiplexing pipeline, Error-Aware Demultiplexer, that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10^6 copies of specific leukemic MRD. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
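As a rough illustration of why dual indexing helps with multiplexing errors, the sketch below assigns a read to a sample only when both of its index reads match one sample-sheet entry. It is not the probability-based Error-Aware Demultiplexer described in the abstract: it substitutes a simple Hamming-distance rule, and the sample names, index sequences and the `max_mismatch` parameter are hypothetical.

```python
from typing import Dict, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length index sequences."""
    if len(a) != len(b):
        raise ValueError("index sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(i7: str, i5: str,
                  sheet: Dict[str, Tuple[str, str]],
                  max_mismatch: int = 1) -> Optional[str]:
    """Return the sample whose i7 AND i5 indexes both match within
    max_mismatch; return None when no sample or several samples match."""
    hits = [name for name, (s7, s5) in sheet.items()
            if hamming(i7, s7) <= max_mismatch
            and hamming(i5, s5) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


# Hypothetical sample sheet: sample name -> (i7 index, i5 index)
SHEET = {
    "patient_A": ("ACGTACGT", "TTGCAATG"),
    "patient_B": ("GGATCCAA", "CATGCATG"),
}
```

A read carrying the i7 index of one sample and the i5 index of another is left unassigned rather than credited to either sample, which is the property that makes dual indexing useful when a single misassigned read could register as a false MRD signal.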
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing
Tourlousse, Dieter M.; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro
2017-01-01
Abstract High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. PMID:27980100
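The idea of estimating absolute abundances from spike-in reads can be sketched with a single-point calibration. This is a simplification made for illustration: the abstract describes staggered spike-in mixtures, whereas the function below assumes one spike-in standard added at a known copy number, and every name and number in it is hypothetical.

```python
from typing import Dict


def absolute_abundance(taxon_reads: Dict[str, int],
                       spike_reads: int,
                       spike_copies_added: float) -> Dict[str, float]:
    """Scale per-taxon read counts to estimated 16S rRNA gene copies.

    Assumes reads are proportional to template copies, so each read
    represents spike_copies_added / spike_reads copies.
    """
    if spike_reads <= 0:
        raise ValueError("no spike-in reads recovered; cannot calibrate")
    copies_per_read = spike_copies_added / spike_reads
    return {taxon: n * copies_per_read for taxon, n in taxon_reads.items()}
```

For example, with 100 spike-in reads recovered from 10,000 added copies, a taxon with 500 reads would be estimated at 50,000 copies under this proportionality assumption.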
Metagenomic Assembly: Overview, Challenges and Applications
Ghurye, Jay S.; Cepeda-Espinoza, Victoria; Pop, Mihai
2016-01-01
Advances in sequencing technologies have led to the increased use of high throughput sequencing in characterizing the microbial communities associated with our bodies and our environment. Critical to the analysis of the resulting data are sequence assembly algorithms able to reconstruct genes and organisms from complex mixtures. Metagenomic assembly involves new computational challenges due to the specific characteristics of the metagenomic data. In this survey, we focus on major algorithmic approaches for genome and metagenome assembly, and discuss the new challenges and opportunities afforded by this new field. We also review several applications of metagenome assembly in addressing interesting biological problems. PMID:27698619
kpLogo: positional k-mer analysis reveals hidden specificity in biological sequences
2017-01-01
Abstract Motifs of only 1–4 letters can play important roles when present at key locations within macromolecules. Because existing motif-discovery tools typically miss these position-specific short motifs, we developed kpLogo, a probability-based logo tool for integrated detection and visualization of position-specific ultra-short motifs from a set of aligned sequences. kpLogo also overcomes the limitations of conventional motif-visualization tools in handling positional interdependencies and utilizing ranked or weighted sequences increasingly available from high-throughput assays. kpLogo can be found at http://kplogo.wi.mit.edu/. PMID:28460012
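The starting point for any position-specific short-motif analysis is a table of k-mer counts per alignment column. The sketch below builds only that table; it does not reproduce kpLogo's statistical scoring or its handling of ranked and weighted sequences, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def positional_kmer_counts(seqs: List[str], k: int) -> Dict[int, Counter]:
    """Count every k-mer at each start position of a set of aligned,
    equal-length sequences. Keys are 0-based start positions."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return {pos: Counter(s[pos:pos + k] for s in seqs)
            for pos in range(length - k + 1)}
```

A k-mer that dominates one position but is rare elsewhere is a candidate position-specific motif; deciding whether that enrichment is significant would need a statistical test on top of these counts.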
Kwon, Andrew T.; Arenillas, David J.; Hunt, Rebecca Worsley; Wasserman, Wyeth W.
2012-01-01
oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca. PMID:22973536
Kwon, Andrew T; Arenillas, David J; Worsley Hunt, Rebecca; Wasserman, Wyeth W
2012-09-01
oPOSSUM-3 is a web-accessible software system for identification of over-represented transcription factor binding sites (TFBS) and TFBS families in either DNA sequences of co-expressed genes or sequences generated from high-throughput methods, such as ChIP-Seq. Validation of the system with known sets of co-regulated genes and published ChIP-Seq data demonstrates the capacity for oPOSSUM-3 to identify mediating transcription factors (TF) for co-regulated genes or co-recovered sequences. oPOSSUM-3 is available at http://opossum.cisreg.ca.
CisSERS: Customizable in silico sequence evaluation for restriction sites
Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus; ...
2016-04-12
High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a Java-based graphical user interface built around a Perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
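Restriction-site detection and a predicted digest of the kind described above can be sketched in a few lines. This is not CisSERS code (the tool itself is Java and Perl): it is a minimal Python illustration that assumes a linear molecule, an uppercase recognition site without ambiguity codes, and a palindromic site so that searching one strand is enough. The function names are hypothetical; the EcoRI site GAATTC, cut after the first G, is used as the example.

```python
import re
from typing import List


def find_sites(seq: str, site: str) -> List[int]:
    """Return 0-based start positions of every occurrence of an uppercase
    recognition site. A lookahead is used so overlapping hits are reported."""
    pattern = f"(?={re.escape(site)})"
    return [m.start() for m in re.finditer(pattern, seq.upper())]


def fragment_lengths(seq: str, site: str, cut_offset: int) -> List[int]:
    """Predict fragment lengths after complete digestion of a linear
    molecule. cut_offset is the number of bases from the start of the
    site to the cut (1 for EcoRI, G^AATTC)."""
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Toy sequence with two EcoRI sites
EXAMPLE = "AAGAATTCTTTTGAATTCAA"
```

Sorting the fragment lengths gives the band order expected on a predicted gel; a polymorphism that creates or destroys a site changes that list, which is the basis of a CAPS marker. Non-palindromic sites would also need a search of the reverse complement.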
CisSERS: Customizable in silico sequence evaluation for restriction sites
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus
High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a Java-based graphical user interface built around a Perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
USDA-ARS's Scientific Manuscript database
The effect of refrigeration on bacterial communities within raw and pasteurized buffalo milk was studied using high-throughput sequencing. High quality samples of raw buffalo milk were obtained from five dairy farms in the Guangxi province of China. A sample of each milk was pasteurized, and both r...
D. Lee Taylor; Michael G. Booth; Jack W. McFarland; Ian C. Herriott; Niall J. Lennon; Chad Nusbaum; Thomas G. Marr
2008-01-01
High throughput sequencing methods are widely used in analyses of microbial diversity but are generally applied to small numbers of samples, which precludes characterization of patterns of microbial diversity across space and time. We have designed a primer-tagging approach that allows pooling and subsequent sorting of numerous samples, which is directed to...
Campanaro, Stefano; Treu, Laura; Kougias, Panagiotis G; De Francisci, Davide; Valle, Giorgio; Angelidaki, Irini
2016-01-01
Biogas production is an economically attractive technology that has gained momentum worldwide over the past years. Biogas is produced by a biologically mediated process, widely known as "anaerobic digestion." This process is performed by a specialized and complex microbial community, in which different members have distinct roles in the establishment of a collective organization. Deciphering the complex microbial community engaged in this process is interesting both for unraveling the network of bacterial interactions and for the potential applications of the derived knowledge. In this study, we dissect the microbiome involved in anaerobic digestion by means of high throughput Illumina sequencing (~51 gigabases of sequence data), disclosing nearly one million genes and extracting 106 microbial genomes by a novel strategy combining two binning processes. Microbial phylogeny and putative taxonomy performed using >400 proteins revealed that the biogas community is a trove of new species. A new approach based on functional properties as per network representation was developed to assign roles to the microbial species. The organization of the anaerobic digestion microbiome resembles a funnel, in which the microbial consortium presents a progressive functional specialization while reaching the final step of the process (i.e., methanogenesis). Key microbial genomes encoding enzymes involved in specific metabolic pathways, such as carbohydrate utilization, fatty acid degradation, amino acid fermentation, and syntrophic acetate oxidation, were identified. Additionally, the analysis identified a new uncultured archaeon that was putatively related to Methanomassiliicoccales but, surprisingly, had a methylotrophic methanogenic pathway. This study is pioneering research on the phylogenetic and functional characterization of the microbial community populating biogas reactors.
By applying for the first time high-throughput sequencing and a novel binning strategy, the identified genes were anchored to single genomes providing a clear understanding of their metabolic pathways and highlighting their involvement in anaerobic digestion. The overall research established a reference catalog of biogas microbial genomes that will greatly simplify future genomic studies.
Optimization and quality control of genome-wide Hi-C library preparation.
Zhang, Xiang-Yuan; He, Chao; Ye, Bing-Yu; Xie, De-Jian; Shi, Ming-Lei; Zhang, Yan; Shen, Wen-Long; Li, Ping; Zhao, Zhi-Hu
2017-09-20
High-throughput chromosome conformation capture (Hi-C) is one of the key assays for genome-wide chromatin interaction studies. It is a time-consuming process that involves many steps and many different kinds of reagents, consumables, and equipment. At present, the reproducibility is unsatisfactory. By optimizing the key steps of the Hi-C experiment, such as crosslinking, pretreatment of digestion, inactivation of restriction enzyme, and in situ ligation, we established a robust Hi-C procedure and prepared two biological replicates of Hi-C libraries from GM12878 cells. After preliminary quality control by Sanger sequencing, the two replicates were high-throughput sequenced. Bioinformatics analysis of the raw sequencing data revealed that the mapping-ability and pair-mate rate were around 90% and 72%, respectively. Additionally, after removal of self-circular ligations and dangling-end products, a valid-pair rate of more than 96% was reached. Genome-wide interactome profiling shows clear topologically associating domains (TADs), consistent with previous reports. Further correlation analysis showed that the two biological replicates strongly correlate with each other in terms of both bin coverage and all bin pairs. All these results indicate that the optimized Hi-C procedure is robust and stable, which will be very helpful for the wide application of the Hi-C assay.
Single Nucleobase Identification Using Biophysical Signatures from Nanoelectronic Quantum Tunneling.
Korshoj, Lee E; Afsari, Sepideh; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant
2017-03-01
Nanoelectronic DNA sequencing can provide an important alternative to sequencing-by-synthesis by reducing sample preparation time, cost, and complexity as a high-throughput next-generation technique with accurate single-molecule identification. However, sample noise and signature overlap continue to prevent high-resolution and accurate sequencing results. Probing the molecular orbitals of chemically distinct DNA nucleobases offers a path for facile sequence identification, but molecular entropy (from nucleotide conformations) makes such identification difficult when relying only on the energies of lowest-unoccupied and highest-occupied molecular orbitals (LUMO and HOMO). Here, nine biophysical parameters are developed to better characterize molecular orbitals of individual nucleobases, intended for single-molecule DNA sequencing using quantum tunneling of charges. For this analysis, theoretical models for quantum tunneling are combined with transition voltage spectroscopy to obtain measurable parameters unique to the molecule within an electronic junction. Scanning tunneling spectroscopy is then used to measure these nine biophysical parameters for DNA nucleotides, and a modified machine learning algorithm identified nucleobases. The new parameters significantly improve base calling over merely using LUMO and HOMO frontier orbital energies. Furthermore, high accuracies for identifying DNA nucleobases were observed at different pH conditions. These results have significant implications for developing a robust and accurate high-throughput nanoelectronic DNA sequencing technique. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.
Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R
2017-07-01
The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.
Application of resequencing to rice genomics, functional genomics and evolutionary analysis
2014-01-01
Rice is a model system used for crop genomics studies. The completion of the rice genome draft sequences in 2002 not only accelerated functional genome studies, but also initiated a new era of resequencing rice genomes. Based on the reference genome in rice, next-generation sequencing (NGS) using the high-throughput sequencing system can efficiently accomplish whole genome resequencing of various genetic populations and diverse germplasm resources. Resequencing technology has been effectively utilized in evolutionary analysis, rice genomics and functional genomics studies. This technique is beneficial for both bridging the knowledge gap between genotype and phenotype and facilitating molecular breeding via gene design in rice. Here, we also discuss the limitation, application and future prospects of rice resequencing. PMID:25006357
Salson, Mikaël; Giraud, Mathieu; Caillault, Aurélie; Grardel, Nathalie; Duployez, Nicolas; Ferret, Yann; Duez, Marc; Herbert, Ryan; Rocher, Tatiana; Sebda, Shéhérazade; Quief, Sabine; Villenet, Céline; Figeac, Martin; Preudhomme, Claude
2017-02-01
Minimal residual disease (MRD) is known to be an independent prognostic factor in patients with acute lymphoblastic leukemia (ALL). High-throughput sequencing (HTS) is currently used in routine practice for the diagnosis and follow-up of patients with hematological neoplasms. In this retrospective study, we examined the role of immunoglobulin/T-cell receptor-based MRD in patients with ALL by HTS analysis of immunoglobulin H and/or T-cell receptor gamma chain loci in bone marrow samples from 11 patients with ALL, at diagnosis and during follow-up. We assessed the clinical feasibility of using combined HTS and bioinformatics analysis with interactive visualization using Vidjil software. We discuss the advantages and drawbacks of HTS for monitoring MRD. HTS gives a more complete insight of the leukemic population than conventional real-time quantitative PCR (qPCR), and allows identification of new emerging clones at each time point of the monitoring. Thus, HTS monitoring of Ig/TR based MRD is expected to improve the management of patients with ALL. Copyright © 2016 Elsevier Ltd. All rights reserved.
Beckers, Matthew; Mohorianu, Irina; Stocks, Matthew; Applegate, Christopher; Dalmay, Tamas; Moulton, Vincent
2017-01-01
Recently, high-throughput sequencing (HTS) has revealed compelling details about the small RNA (sRNA) population in eukaryotes. These 20 to 25 nt noncoding RNAs can influence gene expression by acting as guides for the sequence-specific regulatory mechanism known as RNA silencing. The increase in sequencing depth and number of samples per project enables a better understanding of the role sRNAs play by facilitating the study of expression patterns. However, the intricacy of the biological hypotheses coupled with a lack of appropriate tools often leads to inadequate mining of the available data and thus, an incomplete description of the biological mechanisms involved. To enable a comprehensive study of differential expression in sRNA data sets, we present a new interactive pipeline that guides researchers through the various stages of data preprocessing and analysis. This includes various tools, some of which we specifically developed for sRNA analysis, for quality checking and normalization of sRNA samples as well as tools for the detection of differentially expressed sRNAs and identification of the resulting expression patterns. The pipeline is available within the UEA sRNA Workbench, a user-friendly software package for the processing of sRNA data sets. We demonstrate the use of the pipeline on a H. sapiens data set; additional examples on a B. terrestris data set and on an A. thaliana data set are described in the Supplemental Information. A comparison with existing approaches is also included, which exemplifies some of the issues that need to be addressed for sRNA analysis and how the new pipeline may be used to do this. PMID:28289155
Wei, Yu-Jie; Wu, Yun; Yan, Yin-Zhuo; Zou, Wan; Xue, Jie; Ma, Wen-Rui; Wang, Wei; Tian, Ge; Wang, Li-Ye
2018-01-01
In this study, Illumina MiSeq sequencing was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealing five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grape leaves, and Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grape leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to characterize the microbiome in soil, grape, grape leaves, grape juice and wine in Xinjiang through high-throughput sequencing and to identify microorganisms such as Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine.
Ancient pathogen DNA in archaeological samples detected with a Microbial Detection Array.
Devault, Alison M; McLoughlin, Kevin; Jaing, Crystal; Gardner, Shea; Porter, Teresita M; Enk, Jacob M; Thissen, James; Allen, Jonathan; Borucki, Monica; DeWitte, Sharon N; Dhody, Anna N; Poinar, Hendrik N
2014-03-06
Ancient human remains of paleopathological interest typically contain highly degraded DNA in which pathogenic taxa are often minority components, making sequence-based metagenomic characterization costly. Microarrays may hold a potential solution to these challenges, offering a rapid, affordable, and highly informative snapshot of microbial diversity in complex samples without the lengthy analysis and/or high cost associated with high-throughput sequencing. Their versatility is well established for modern clinical specimens, but they have yet to be applied to ancient remains. Here we report bacterial profiles of archaeological and historical human remains using the Lawrence Livermore Microbial Detection Array (LLMDA). The array successfully identified previously-verified bacterial human pathogens, including Vibrio cholerae (cholera) in a 19th century intestinal specimen and Yersinia pestis ("Black Death" plague) in a medieval tooth, which represented only minute fractions (0.03% and 0.08% alignable high-throughput shotgun sequencing reads) of their respective DNA content. This demonstrates that the LLMDA can identify primary and/or co-infecting bacterial pathogens in ancient samples, thereby serving as a rapid and inexpensive paleopathological screening tool to study health across both space and time.
Yan, Yin-zhuo; Zou, Wan; Ma, Wen-rui; Wang, Wei; Tian, Ge; Wang, Li-ye
2018-01-01
In this study, Illumina MiSeq sequencing was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealing five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grape leaves, and Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grape leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to characterize the microbiome in soil, grape, grape leaves, grape juice and wine in Xinjiang through high-throughput sequencing and to identify microorganisms such as Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine. PMID:29565999
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery
Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo
2012-01-01
Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases our tool was invariably able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data. PMID:22570408
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery.
Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo
2012-09-01
Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases our tool was invariably able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2-ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.
Translational bioinformatics in the cloud: an affordable alternative
2010-01-01
With the continued exponential expansion of publicly available genomic data and access to low-cost, high-throughput molecular technologies for profiling patient populations, computational technologies and informatics are becoming vital considerations in genomic medicine. Although cloud computing technology is being heralded as a key enabling technology for the future of genomic research, available case studies are limited to applications in the domain of high-throughput sequence data analysis. The goal of this study was to evaluate the computational and economic characteristics of cloud computing in performing a large-scale data integration and analysis representative of research problems in genomic medicine. We find that the cloud-based analysis compares favorably in both performance and cost in comparison to a local computational cluster, suggesting that cloud computing technologies might be a viable resource for facilitating large-scale translational research in genomic medicine. PMID:20691073
Stiffler, Michael A; Subramanian, Subu K; Salinas, Victor H; Ranganathan, Rama
2016-07-03
Site-directed mutagenesis has long been used as a method to interrogate protein structure, function and evolution. Recent advances in massively-parallel sequencing technology have opened up the possibility of assessing the functional or fitness effects of large numbers of mutations simultaneously. Here, we present a protocol for experimentally determining the effects of all possible single amino acid mutations in a protein of interest utilizing high-throughput sequencing technology, using the 263 amino acid antibiotic resistance enzyme TEM-1 β-lactamase as an example. In this approach, a whole-protein saturation mutagenesis library is constructed by site-directed mutagenic PCR, randomizing each position individually to all possible amino acids. The library is then transformed into bacteria, and selected for the ability to confer resistance to β-lactam antibiotics. The fitness effect of each mutation is then determined by deep sequencing of the library before and after selection. Importantly, this protocol introduces methods which maximize sequencing read depth and permit the simultaneous selection of the entire mutation library, by mixing adjacent positions into groups of length accommodated by high-throughput sequencing read length and utilizing orthogonal primers to barcode each group. Representative results using this protocol are provided by assessing the fitness effects of all single amino acid mutations in TEM-1 at a clinically relevant dosage of ampicillin. The method should be easily extendable to other proteins for which a high-throughput selection assay is in place.
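The step in which the fitness effect of each mutation is determined from sequencing before and after selection can be illustrated with a log enrichment ratio. The abstract does not give the formula, so the one below is an assumption: it is a commonly used form that normalises each mutant's change in read count to the wild type's, and the pseudocount default is hypothetical.

```python
import math


def fitness_effect(mut_before: int, mut_after: int,
                   wt_before: int, wt_after: int,
                   pseudocount: float = 0.5) -> float:
    """Log10 enrichment of a mutant relative to wild type.

    0 means the mutant behaved like wild type under selection; negative
    values mean it was depleted. The pseudocount avoids division by zero
    and log(0) when a variant is absent from one sample.
    """
    mut_ratio = (mut_after + pseudocount) / (mut_before + pseudocount)
    wt_ratio = (wt_after + pseudocount) / (wt_before + pseudocount)
    return math.log10(mut_ratio / wt_ratio)
```

Because the mutant ratio is divided by the wild-type ratio, differences in total read depth between the pre-selection and post-selection libraries cancel out, provided the wild-type allele is sequenced in the same group as the mutant.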
Chaitankar, Vijender; Karakülah, Gökhan; Ratnapriya, Rinki; Giuste, Felipe O.; Brooks, Matthew J.; Swaroop, Anand
2016-01-01
The advent of high throughput next generation sequencing (NGS) has accelerated the pace of discovery of disease-associated genetic variants and genomewide profiling of expressed sequences and epigenetic marks, thereby permitting systems-based analyses of ocular development and disease. Rapid evolution of NGS and associated methodologies presents significant challenges in acquisition, management, and analysis of large data sets and for extracting biologically or clinically relevant information. Here we illustrate the basic design of commonly used NGS-based methods, specifically whole exome sequencing, transcriptome, and epigenome profiling, and provide recommendations for data analyses. We briefly discuss systems biology approaches for integrating multiple data sets to elucidate gene regulatory or disease networks. While we provide examples from the retina, the NGS guidelines reviewed here are applicable to other tissues/cell types as well. PMID:27297499
Aschard, Hugues; Cattoir, Vincent; Yoder-Himes, Deborah; Lory, Stephen; Pier, Gerald B.
2013-01-01
High-throughput sequencing of transposon (Tn) libraries created within entire genomes identifies and quantifies the contribution of individual genes and operons to the fitness of organisms in different environments. We used insertion-sequencing (INSeq) to analyze the contribution to fitness of all non-essential genes in the chromosome of Pseudomonas aeruginosa strain PA14 based on a library of ∼300,000 individual Tn insertions. In vitro growth in LB provided a baseline for comparison with the survival of the Tn insertion strains following 6 days of colonization of the murine gastrointestinal tract as well as a comparison with Tn-inserts subsequently able to systemically disseminate to the spleen following induction of neutropenia. Sequencing was performed following DNA extraction from the recovered bacteria, digestion with the MmeI restriction enzyme that hydrolyzes DNA 16 bp away from the end of the Tn insert, and fractionation into oligonucleotides of 1,200–1,500 bp that were prepared for high-throughput sequencing. Changes in frequency of Tn inserts into the P. aeruginosa genome were used to quantify in vivo fitness resulting from loss of a gene. 636 genes had <10 sequencing reads in LB, thus defined as unable to grow in this medium. During in vivo infection there were major losses of strains with Tn inserts in almost all known virulence factors, as well as respiration, energy utilization, ion pumps, nutritional genes and prophages. Many new candidates for virulence factors were also identified. There were consistent changes in the recovery of Tn inserts in genes within most operons and Tn insertions into some genes enhanced in vivo fitness. Strikingly, 90% of the non-essential genes were required for in vivo survival following systemic dissemination during neutropenia. These experiments resulted in the identification of the P. aeruginosa strain PA14 genes necessary for optimal survival in the mucosal and systemic environments of a mammalian host. PMID:24039572
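The two quantities described above, the read-count cutoff in LB and the change in insertion frequency between pools, can be sketched as follows. The gene labels and counts are hypothetical, the function names are assumptions, and only the cutoff of 10 reads comes from the abstract; the normalization shown is a simple per-pool fraction, not necessarily the one used in the study.

```python
def below_threshold(lb_reads, threshold=10):
    """Genes whose insertion mutants have fewer than `threshold` reads in LB."""
    return sorted(gene for gene, n in lb_reads.items() if n < threshold)

def frequency_change(input_reads, output_reads):
    """Ratio of each gene's read fraction in the output pool to the input pool.

    Values below 1 suggest the insertion reduced fitness; values above 1
    suggest it enhanced fitness. Genes with no input reads are skipped.
    """
    total_in = sum(input_reads.values())
    total_out = sum(output_reads.values())
    change = {}
    for gene, n_in in input_reads.items():
        if n_in == 0:
            continue
        frac_in = n_in / total_in
        frac_out = output_reads.get(gene, 0) / total_out
        change[gene] = frac_out / frac_in
    return change

# Hypothetical counts, not data from the study.
lb = {"geneA": 100, "geneB": 100, "geneC": 5}
gut = {"geneA": 150, "geneB": 50, "geneC": 0}
low_in_lb = below_threshold(lb)
change = frequency_change(lb, gut)
```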
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael
2018-05-25
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
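The idea of counting k-mers over a combined sequence and structure alphabet can be illustrated with a toy encoding. The scheme below (uppercase for paired bases, lowercase for unpaired bases, taken from a dot-bracket string) is an assumption chosen for the sketch and is not claimed to be SMARTIV's actual alphabet or ranking procedure.

```python
from collections import Counter

def combine(seq, struct):
    """Merge a sequence and its dot-bracket structure into one string.

    Paired positions ('(' or ')') become uppercase letters and unpaired
    positions become lowercase letters (an illustrative encoding).
    """
    if len(seq) != len(struct):
        raise ValueError("sequence and structure must have equal length")
    return "".join(
        base.upper() if mark in "()" else base.lower()
        for base, mark in zip(seq, struct)
    )

def count_kmers(encoded, k):
    """Count every overlapping k-mer in the combined string."""
    return Counter(encoded[i:i + k] for i in range(len(encoded) - k + 1))

# Hypothetical hairpin: a three-base stem closing a three-base loop.
encoded = combine("GGGAAACCC", "(((...)))")
kmers = count_kmers(encoded, 3)
```

Here `encoded` is "GGGaaaCCC", so the loop k-mer "aaa" is counted separately from any paired "AAA" elsewhere in a dataset.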
Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.
Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre
2012-01-01
Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.
Bielaszewska, Martina; Karch, Helge; Toth, Ian K.
2012-01-01
Background: An Escherichia coli O104:H4 outbreak in Germany in summer 2011 caused 53 deaths, over 4000 individual infections across Europe, and considerable economic, social and political impact. This outbreak was the first in a position to exploit rapid, benchtop high-throughput sequencing (HTS) technologies and crowdsourced data analysis early in its investigation, establishing a new paradigm for rapid response to disease threats. We describe a novel strategy for design of diagnostic PCR primers that exploited this rapid draft bacterial genome sequencing to distinguish between E. coli O104:H4 outbreak isolates and other pathogenic E. coli isolates, including the historical hæmolytic uræmic syndrome (HUSEC) E. coli HUSEC041 O104:H4 strain, which possesses the same serotype as the outbreak isolates. Methodology/Principal Findings: Primers were designed using a novel alignment-free strategy against eleven draft whole genome assemblies of E. coli O104:H4 German outbreak isolates from the E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium website, and a negative sequence set containing 69 E. coli chromosome and plasmid sequences from public databases. Validation in vitro against 21 ‘positive’ E. coli O104:H4 outbreak and 32 ‘negative’ non-outbreak EHEC isolates indicated that individual primer sets exhibited 100% sensitivity for outbreak isolates, with false positive rates of between 9% and 22%. A minimal combination of two primers discriminated between outbreak and non-outbreak E. coli isolates with 100% sensitivity and 100% specificity. Conclusions/Significance: Draft genomes of isolates of disease outbreak bacteria enable high throughput primer design and enhanced diagnostic performance in comparison to traditional molecular assays. Future outbreak investigations will be able to harness HTS rapidly to generate draft genome sequences and diagnostic primer sets, greatly facilitating epidemiology and clinical diagnostics.
We expect that high throughput primer design strategies will enable faster, more precise responses to future disease outbreaks of bacterial origin, and help to mitigate their societal impact. PMID:22496820
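The validation figures reported above follow the usual definitions of sensitivity and specificity. The sketch below applies them to a panel of the stated size (21 outbreak and 32 non-outbreak isolates); the split of 3 false positives is a hypothetical value chosen only because it falls inside the reported 9% to 22% range, not a count taken from the paper.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true positives detected
    specificity = tn / (tn + fp)   # fraction of true negatives rejected
    return sensitivity, specificity

# Hypothetical single primer set: all 21 outbreak isolates detected,
# 3 of the 32 non-outbreak isolates wrongly amplified.
sens, spec = sensitivity_specificity(tp=21, fn=0, tn=29, fp=3)
false_positive_rate = 1 - spec
```

For these assumed counts the sensitivity is 1.0 and the false positive rate is 3/32, about 9.4%.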
A computational genomics pipeline for prokaryotic sequencing projects
Kislyuk, Andrey O.; Katz, Lee S.; Agrawal, Sonia; Hagen, Matthew S.; Conley, Andrew B.; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C.; Sammons, Scott A.; Govil, Dhwani; Mair, Raydel D.; Tatti, Kathleen M.; Tondella, Maria L.; Harcourt, Brian H.; Mayer, Leonard W.; Jordan, I. King
2010-01-01
Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. Results: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. Availability and implementation: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems. Contact: king.jordan@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20519285
Patel, Rajesh; Tsan, Alison; Sumiyoshi, Teiko; Fu, Ling; Desai, Rupal; Schoenbrunner, Nancy; Myers, Thomas W.; Bauer, Keith; Smith, Edward; Raja, Rajiv
2014-01-01
Molecular profiling of tumor tissue to detect alterations, such as oncogenic mutations, plays a vital role in determining treatment options in oncology. Hence, there is an increasing need for a robust and high-throughput technology to detect oncogenic hotspot mutations. Although commercial assays are available to detect genetic alterations in single genes, only a limited amount of tissue is often available from patients, requiring multiplexing to allow for simultaneous detection of mutations in many genes using low DNA input. Even though next-generation sequencing (NGS) platforms provide powerful tools for this purpose, they face challenges such as high cost, large DNA input requirement, complex data analysis, and long turnaround times, limiting their use in clinical settings. We report the development of the next generation mutation multi-analyte panel (MUT-MAP), a high-throughput microfluidic panel for detecting 120 somatic mutations across eleven genes of therapeutic interest (AKT1, BRAF, EGFR, FGFR3, FLT3, HRAS, KIT, KRAS, MET, NRAS, and PIK3CA) using allele-specific PCR (AS-PCR) and Taqman technology. This mutation panel requires as little as 2 ng of high quality DNA from fresh frozen or 100 ng of DNA from formalin-fixed paraffin-embedded (FFPE) tissues. Mutation calls, including an automated data analysis process, have been implemented to run 88 samples per day. Validation of this platform using plasmids showed robust signal and low cross-reactivity in all of the newly added assays, and mutation calls in cell line samples were found to be consistent with the Catalogue of Somatic Mutations in Cancer (COSMIC) database, allowing for direct comparison of our platform to Sanger sequencing. High correlation with NGS when compared to the SuraSeq500 panel run on the Ion Torrent platform in a FFPE dilution experiment showed assay sensitivity down to 0.45%.
This multiplexed mutation panel is a valuable tool for high-throughput biomarker discovery in personalized medicine and cancer drug development. PMID:24658394
Moreira, Rebeca; Balseiro, Pablo; Planas, Josep V.; Fuste, Berta; Beltran, Sergi; Novoa, Beatriz; Figueras, Antonio
2012-01-01
Background: The Manila clam (Ruditapes philippinarum) is a bivalve species cultured worldwide with important commercial value. Diseases affecting this species can result in large economic losses. Because knowledge of the molecular mechanisms of the immune response in bivalves, especially clams, is scarce and fragmentary, we sequenced RNA from immune-stimulated R. philippinarum hemocytes by 454-pyrosequencing to identify genes involved in their immune defense against infectious diseases. Methodology and Principal Findings: High-throughput deep sequencing of R. philippinarum using 454 pyrosequencing technology yielded 974,976 high-quality reads with an average read length of 250 bp. The reads were assembled into 51,265 contigs, and 44.7% of the translated sequences were successfully annotated. The 35 most frequently found contigs included a large number of immune-related genes, and a more detailed analysis showed the presence of putative members of several immune pathways and processes such as apoptosis, the Toll-like signaling pathway and the complement cascade. We found sequences from molecules never described in bivalves before, especially in the complement pathway, where almost all the components are present. Conclusions: This study represents the first transcriptome analysis using 454-pyrosequencing conducted on R. philippinarum focused on its immune system. Our results will provide a rich source of data to discover and identify new genes, which will serve as a basis for microarray construction and the study of gene expression as well as for the identification of genetic markers. The discovery of new immune sequences was very productive and resulted in a large variety of contigs that may play a role in the defense mechanisms of Ruditapes philippinarum. PMID:22536348
Mapping of disease-associated variants in admixed populations
2011-01-01
Recent developments in high-throughput genotyping and whole-genome sequencing will enhance the identification of disease loci in admixed populations. We discuss how a more refined estimation of ancestry benefits both admixture mapping and association mapping, making disease loci identification in admixed populations more powerful. High-throughput genotyping and sequencing will enable refined estimation of ancestry, thus enhancing disease loci identification in admixed populations. PMID:21635713
A new fungal large subunit ribosomal RNA primer for high throughput sequencing surveys
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mueller, Rebecca C.; Gallegos-Graves, La Verne; Kuske, Cheryl R.
2015-12-09
The inclusion of phylogenetic metrics in community ecology has provided insights into important ecological processes, particularly when combined with high-throughput sequencing methods; however, these approaches have not been widely used in studies of fungal communities relative to other microbial groups. Two obstacles have been considered: (1) the internal transcribed spacer (ITS) region has limited utility for constructing phylogenies and (2) most PCR primers that target the large subunit (LSU) ribosomal unit generate amplicons that exceed current limits of high-throughput sequencing platforms. We designed and tested a PCR primer (LR22R) to target an approximately 300–400 bp region of the D2 hypervariable region of the fungal LSU for use with the Illumina MiSeq platform. Both in silico and empirical analyses showed that the LR22R–LR3 pair captured a broad range of fungal taxonomic groups with a small fraction of non-fungal groups. Phylogenetic placement of publicly available LSU D2 sequences showed broad agreement with taxonomic classification. Comparisons of the LSU D2 and the ITS2 ribosomal regions from environmental samples and known communities showed similar discriminatory abilities of the two primer sets. Altogether, these findings show that the LR22R–LR3 primer pair has utility for phylogenetic analyses of fungal communities using high-throughput sequencing methods.
NGSANE: a lightweight production informatics framework for high-throughput data analysis.
Buske, Fabian A; French, Hugh J; Smith, Martin A; Clark, Susan J; Bauer, Denis C
2014-05-15
The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot swappable modular components as opposed to the more rigid program call wrapping by higher level languages, as implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for set up and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. NGSANE is implemented in bash and publicly available under BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane. Contact: Denis.Bauer@csiro.au. Supplementary data are available at Bioinformatics online.
MetaDP: a comprehensive web server for disease prediction of 16S rRNA metagenomic datasets.
Xu, Xilin; Wu, Aiping; Zhang, Xinlei; Su, Mingming; Jiang, Taijiao; Yuan, Zhe-Ming
2016-01-01
High-throughput sequencing-based metagenomics has garnered considerable interest in recent years. Numerous methods and tools have been developed for the analysis of metagenomic data. However, it is still a daunting task to install a large number of tools and complete a complicated analysis, especially for researchers with minimal bioinformatics backgrounds. To address this problem, we constructed an automated software named MetaDP for 16S rRNA sequencing data analysis, including data quality control, operational taxonomic unit clustering, diversity analysis, and disease risk prediction modeling. Furthermore, a support vector machine-based prediction model for intestinal bowel syndrome (IBS) was built by applying MetaDP to microbial 16S sequencing data from 108 children. The success of the IBS prediction model suggests that the platform may also be applied to other diseases related to gut microbes, such as obesity, metabolic syndrome, or intestinal cancer, among others (http://metadp.cn:7001/).
Biggar, Kyle K; Wu, Cheng-Wei; Storey, Kenneth B
2014-10-01
This study makes a significant advancement on a microRNA amplification technique previously used for expression analysis and sequencing in animal models without annotated mature microRNA sequences. As research progresses into the post-genomic era of microRNA prediction and analysis, the need for a rapid and cost-effective method for microRNA amplification is critical to facilitate wide-scale analysis of microRNA expression. To facilitate this requirement, we have reoptimized the design of amplification primers and introduced a polyadenylation step to allow amplification of all mature microRNAs from a single RNA sample. Importantly, this method retains the ability to sequence reverse transcription polymerase chain reaction (RT-PCR) products, validating microRNA-specific amplification. Copyright © 2014 Elsevier Inc. All rights reserved.
Chabbert, Christophe D; Adjalley, Sophie H; Steinmetz, Lars M; Pelechano, Vicent
2018-01-01
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) or microarray hybridization (ChIP-on-chip) are standard methods for the study of transcription factor binding sites and histone chemical modifications. However, these approaches only allow profiling of a single factor or protein modification at a time. In this chapter, we present Bar-ChIP, a higher throughput version of ChIP-Seq that relies on the direct ligation of molecular barcodes to chromatin fragments. Bar-ChIP enables the concurrent profiling of multiple DNA-protein interactions and is therefore amenable to experimental scale-up, without the need for any robotic instrumentation.
Validation of high throughput sequencing and microbial forensics applications
Budowle, Bruce; Connell, Nancy D; Bielecka-Oder, Anna; Colwell, Rita R; Corbett, Cindi R; Fletcher, Jacqueline; Forsman, Mats; Kadavy, Dana R; Markotic, Alemka; Morse, Stephen A; Murch, Randall S; Sajantila, Antti; Schmedes, Sarah E; Ternus, Krista L; Turner, Stephen D; Minot, Samuel
2014-01-01
High throughput sequencing (HTS) generates large amounts of high quality sequence data for microbial genomics. The value of HTS for microbial forensics is the speed at which evidence can be collected and the power to characterize microbial-related evidence to solve biocrimes and bioterrorist events. As HTS technologies continue to improve, they provide increasingly powerful sets of tools to support the entire field of microbial forensics. Accurate, credible results allow analysis and interpretation, significantly influencing the course and/or focus of an investigation, and can impact the response of the government to an attack having individual, political, economic or military consequences. Interpretation of the results of microbial forensic analyses relies on understanding the performance and limitations of HTS methods, including analytical processes, assays and data interpretation. The utility of HTS must be defined carefully within established operating conditions and tolerances. Validation is essential in the development and implementation of microbial forensics methods used for formulating investigative leads attribution. HTS strategies vary, requiring guiding principles for HTS system validation. Three initial aspects of HTS, irrespective of chemistry, instrumentation or software are: 1) sample preparation, 2) sequencing, and 3) data analysis. Criteria that should be considered for HTS validation for microbial forensics are presented here. Validation should be defined in terms of specific application and the criteria described here comprise a foundation for investigators to establish, validate and implement HTS as a tool in microbial forensics, enhancing public safety and national security. PMID:25101166
High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.
Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus; Morling, Niels
2016-01-01
Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates automation of DNA sequencing.
Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang
2018-01-01
Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.
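The comparison of operational taxonomic units (OTUs) between deep and shallow sequencing amounts to asking what fraction of the shallow-run OTUs also appear in the deep run. A minimal sketch follows; the OTU labels are hypothetical and the function name is an assumption made for illustration.

```python
def otu_coverage(shallow_otus, deep_otus):
    """Fraction of OTUs from the shallow run that the deep run also recovered."""
    shallow = set(shallow_otus)
    deep = set(deep_otus)
    if not shallow:
        return 0.0
    return len(shallow & deep) / len(shallow)

# Hypothetical OTU labels, not data from the study.
shallow_run = ["otu1", "otu2", "otu3", "otu4"]
deep_run = ["otu1", "otu2", "otu3", "otu5", "otu6"]
coverage = otu_coverage(shallow_run, deep_run)
```

A coverage near 1 would correspond to the situation described above, in which deep sequencing recovers most of what shallow sequencing finds.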
DNA-encoded chemistry: enabling the deeper sampling of chemical space.
Goodnow, Robert A; Dumelin, Christoph E; Keefe, Anthony D
2017-02-01
DNA-encoded chemical library technologies are increasingly being adopted in drug discovery for hit and lead generation. DNA-encoded chemistry enables the exploration of chemical spaces four to five orders of magnitude more deeply than is achievable by traditional high-throughput screening methods. Operation of this technology requires developing a range of capabilities including aqueous synthetic chemistry, building block acquisition, oligonucleotide conjugation, large-scale molecular biological transformations, selection methodologies, PCR, sequencing, sequence data analysis and the analysis of large chemistry spaces. This Review provides an overview of the development and applications of DNA-encoded chemistry, highlighting the challenges and future directions for the use of this technology.
Mapping the miRNA interactome by crosslinking ligation and sequencing of hybrids (CLASH)
Helwak, Aleksandra; Tollervey, David
2014-01-01
RNA-RNA interactions play critical roles in many cellular processes but studying them is difficult and laborious. Here, we describe an experimental procedure, termed crosslinking ligation and sequencing of hybrids (CLASH), which allows high-throughput identification of sites of RNA-RNA interaction. During CLASH, a tagged bait protein is UV crosslinked in vivo to stabilise RNA interactions and purified under denaturing conditions. RNAs associated with the bait protein are partially truncated, and the ends of RNA-duplexes are ligated together. Following linker addition, cDNA library preparation and high-throughput sequencing, the ligated duplexes give rise to chimeric cDNAs, which unambiguously identify RNA-RNA interaction sites independent of bioinformatic predictions. This protocol is optimized for studying miRNA targets bound by Argonaute proteins, but should be easily adapted for other RNA-binding proteins and classes of RNA. The protocol requires around 5 days to complete, excluding the time required for high-throughput sequencing and bioinformatic analyses. PMID:24577361
Novel method for high-throughput colony PCR screening in nanoliter-reactors
Walser, Marcel; Pellaux, Rene; Meyer, Andreas; Bechtold, Matthias; Vanderschuren, Herve; Reinhardt, Richard; Magyar, Joseph; Panke, Sven; Held, Martin
2009-01-01
We introduce a technology for the rapid identification and sequencing of conserved DNA elements employing a novel suspension array based on nanoliter (nl)-reactors made from alginate. The reactors have a volume of 35 nl and serve as reaction compartments during monoseptic growth of microbial library clones, colony lysis, thermocycling and screening for sequence motifs via semi-quantitative fluorescence analyses. nl-Reactors were kept in suspension during all high-throughput steps which allowed performing the protocol in a highly space-effective fashion and at negligible expenses of consumables and reagents. As a first application, 11 high-quality microsatellites for polymorphism studies in cassava were isolated and sequenced out of a library of 20 000 clones in 2 days. The technology is widely scalable and we envision that throughputs for nl-reactor based screenings can be increased up to 100 000 and more samples per day thereby efficiently complementing protocols based on established deep-sequencing technologies. PMID:19282448
A high-throughput multiplex method adapted for GMO detection.
Chaouachi, Maher; Chupeau, Gaëlle; Berard, Aurélie; McKhann, Heather; Romaniuk, Marcel; Giancola, Sandra; Laval, Valérie; Bertheau, Yves; Brunel, Dominique
2008-12-24
A high-throughput multiplex assay for the detection of genetically modified organisms (GMO) was developed on the basis of the existing SNPlex method designed for SNP genotyping. This SNPlex assay allows the simultaneous detection of up to 48 short DNA sequences (approximately 70 bp; "signature sequences") from taxa endogenous reference genes, from GMO constructions, screening targets, construct-specific, and event-specific targets, and finally from donor organisms. This assay avoids certain shortcomings of multiplex PCR-based methods already in widespread use for GMO detection. The assay demonstrated high specificity and sensitivity. The results suggest that this assay is reliable, flexible, and cost- and time-effective for high-throughput GMO detection.
Detection of somatic mutations by high-resolution DNA melting (HRM) analysis in multiple cancers.
Gonzalez-Bosquet, Jesus; Calcei, Jacob; Wei, Jun S; Garcia-Closas, Montserrat; Sherman, Mark E; Hewitt, Stephen; Vockley, Joseph; Lissowska, Jolanta; Yang, Hannah P; Khan, Javed; Chanock, Stephen
2011-01-17
Identification of somatic mutations in cancer is a major goal for understanding and monitoring the events related to cancer initiation and progression. High resolution melting (HRM) curve analysis represents a fast, post-PCR high-throughput method for scanning somatic sequence alterations in target genes. The aim of this study was to assess the sensitivity and specificity of HRM analysis for tumor mutation screening in a range of tumor samples, which included 216 frozen pediatric small rounded blue-cell tumors as well as 180 paraffin-embedded tumors from breast, endometrial and ovarian cancers (60 of each). HRM analysis was performed in exons of the following candidate genes known to harbor established commonly observed mutations: PIK3CA, ERBB2, KRAS, TP53, EGFR, BRAF, GATA3, and FGFR3. Bi-directional sequencing analysis was used to determine the accuracy of the HRM analysis. For the 39 mutations observed in frozen samples, the sensitivity and specificity of HRM analysis were 97% and 87%, respectively. There were 67 mutation/variants in the paraffin-embedded samples, and the sensitivity and specificity for the HRM analysis were 88% and 80%, respectively. Paraffin-embedded samples require higher quantity of purified DNA for high performance. In summary, HRM analysis is a promising moderate-throughput screening test for mutations among known candidate genomic regions. Although the overall accuracy appears to be better in frozen specimens, somatic alterations were detected in DNA extracted from paraffin-embedded samples.
Detection of Somatic Mutations by High-Resolution DNA Melting (HRM) Analysis in Multiple Cancers
Gonzalez-Bosquet, Jesus; Calcei, Jacob; Wei, Jun S.; Garcia-Closas, Montserrat; Sherman, Mark E.; Hewitt, Stephen; Vockley, Joseph; Lissowska, Jolanta; Yang, Hannah P.; Khan, Javed; Chanock, Stephen
2011-01-01
Identification of somatic mutations in cancer is a major goal for understanding and monitoring the events related to cancer initiation and progression. High resolution melting (HRM) curve analysis represents a fast, post-PCR high-throughput method for scanning somatic sequence alterations in target genes. The aim of this study was to assess the sensitivity and specificity of HRM analysis for tumor mutation screening in a range of tumor samples, which included 216 frozen pediatric small rounded blue-cell tumors as well as 180 paraffin-embedded tumors from breast, endometrial and ovarian cancers (60 of each). HRM analysis was performed in exons of the following candidate genes known to harbor established commonly observed mutations: PIK3CA, ERBB2, KRAS, TP53, EGFR, BRAF, GATA3, and FGFR3. Bi-directional sequencing analysis was used to determine the accuracy of the HRM analysis. For the 39 mutations observed in frozen samples, the sensitivity and specificity of HRM analysis were 97% and 87%, respectively. There were 67 mutation/variants in the paraffin-embedded samples, and the sensitivity and specificity for the HRM analysis were 88% and 80%, respectively. Paraffin-embedded samples require higher quantity of purified DNA for high performance. In summary, HRM analysis is a promising moderate-throughput screening test for mutations among known candidate genomic regions. Although the overall accuracy appears to be better in frozen specimens, somatic alterations were detected in DNA extracted from paraffin-embedded samples. PMID:21264207
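The sensitivity and specificity figures quoted above follow from a confusion table that compares HRM calls with bi-directional sequencing as the reference. A minimal sketch of that arithmetic is given here; the function name and the counts in the usage note are illustrative assumptions, not values from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-table counts.

    tp: variants found by both the screen and the reference method
    fn: variants found by the reference method only
    tn: amplicons called wild type by both methods
    fp: amplicons flagged by the screen but wild type by the reference
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative")
    sensitivity = tp / (tp + fn)   # share of true variants that were detected
    specificity = tn / (tn + fp)   # share of wild-type amplicons called correctly
    return sensitivity, specificity
```

With made-up counts, `sensitivity_specificity(9, 1, 8, 2)` gives a sensitivity of 0.9 and a specificity of 0.8.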
False positives complicate ancient pathogen identifications using high-throughput shotgun sequencing
2014-01-01
Background Identification of historic pathogens is challenging since false positives and negatives are a serious risk. Environmental non-pathogenic contaminants are ubiquitous. Furthermore, public genetic databases contain limited information regarding these species. High-throughput sequencing may help reliably detect and identify historic pathogens. Results We shotgun-sequenced 8 16th-century Mixtec individuals from the site of Teposcolula Yucundaa (Oaxaca, Mexico) who are reported to have died from the huey cocoliztli (‘Great Pestilence’ in Nahautl), an unknown disease that decimated native Mexican populations during the Spanish colonial period, in order to identify the pathogen. Comparison of these sequences with those deriving from the surrounding soil and from 4 precontact individuals from the site found a wide variety of contaminant organisms that confounded analyses. Without the comparative sequence data from the precontact individuals and soil, false positives for Yersinia pestis and rickettsiosis could have been reported. Conclusions False positives and negatives remain problematic in ancient DNA analyses despite the application of high-throughput sequencing. Our results suggest that several studies claiming the discovery of ancient pathogens may need further verification. Additionally, true single molecule sequencing’s short read lengths, inability to sequence through DNA lesions, and limited ancient-DNA-specific technical development hinder its application to palaeopathology. PMID:24568097
Role of APOE Isoforms in the Pathogenesis of TBI induced Alzheimer’s Disease
2016-10-01
deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel sequencing...demonstrate that the lack of Abca1 increases amyloid plaques and decreased APOE protein levels in AD-model mice. In this proposal we will test the hypothesis...injury, inflammatory reaction, transcriptome, high throughput massive parallel sequencing, mRNA-seq., behavioral testing, memory impairment, recovery 3
Wang, Zhu; Zhang, Xu-Xiang; Lu, Xin; Liu, Bo; Li, Yan; Long, Chao; Li, Aimin
2014-01-01
Biological nitrification/denitrification is frequently used to remove nitrogen from tannery wastewater containing high concentrations of ammonia. However, information is limited about the bacterial nitrifiers and denitrifiers and their functional genes in tannery wastewater treatment plants (WWTPs) due to the low-throughput of the previously used methods. In this study, 454 pyrosequencing and Illumina high-throughput sequencing, combined with molecular methods, were used to comprehensively characterize structures and functions of nitrification and denitrification bacterial communities in aerobic and anaerobic sludge of two full-scale tannery WWTPs. Pyrosequencing of 16S rRNA genes showed that Proteobacteria and Synergistetes dominated in the aerobic and anaerobic sludge, respectively. Ammonia-oxidizing bacteria (AOB) amoA gene cloning revealed that Nitrosomonas europaea dominated the ammonia-oxidizing community in the WWTPs. Metagenomic analysis showed that the denitrifiers mainly included the genera of Thauera, Paracoccus, Hyphomicrobium, Comamonas and Azoarcus, which may greatly contribute to the nitrogen removal in the two WWTPs. It is interesting that AOB and ammonia-oxidizing archaea had low abundance although both WWTPs demonstrated high ammonium removal efficiency. Good correlation between the qPCR and metagenomic analysis is observed for the quantification of functional genes amoA, nirK, nirS and nosZ, indicating that the metagenomic approach may be a promising method used to comprehensively investigate the abundance of functional genes of nitrifiers and denitrifiers in the environment. PMID:25420093
Hess, Jon E; Campbell, Nathan R; Docker, Margaret F; Baker, Cyndi; Jackson, Aaron; Lampman, Ralph; McIlraith, Brian; Moser, Mary L; Statler, David P; Young, William P; Wildbill, Andrew J; Narum, Shawn R
2015-01-01
Next-generation sequencing data can be mined for highly informative single nucleotide polymorphisms (SNPs) to develop high-throughput genomic assays for nonmodel organisms. However, choosing a set of SNPs to address a variety of objectives can be difficult because SNPs are often not equally informative. We developed an optimal combination of 96 high-throughput SNP assays from a total of 4439 SNPs identified in a previous study of Pacific lamprey (Entosphenus tridentatus) and used them to address four disparate objectives: parentage analysis, species identification and characterization of neutral and adaptive variation. Nine of these SNPs are FST outliers, and five of these outliers are localized within genes and significantly associated with geography, run-timing and dwarf life history. Two of the 96 SNPs were diagnostic for two other lamprey species that were morphologically indistinguishable at early larval stages and were sympatric in the Pacific Northwest. The majority (85) of SNPs in the panel were highly informative for parentage analysis, that is, putatively neutral with high minor allele frequency across the species' range. Results from three case studies are presented to demonstrate the broad utility of this panel of SNP markers in this species. As Pacific lamprey populations are undergoing rapid decline, these SNPs provide an important resource to address critical uncertainties associated with the conservation and recovery of this imperiled species. © 2014 John Wiley & Sons Ltd.
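Minor allele frequency, the criterion mentioned above for parentage-informative SNPs, can be computed from diploid genotype calls as sketched here. The genotype encoding (two-letter strings such as "AG") and the function name are assumptions made for illustration; the abstract does not describe the data format used.

```python
from collections import Counter

def minor_allele_frequency(genotypes):
    """Minor allele frequency of one biallelic SNP.

    genotypes: iterable of two-character strings, one per individual,
    e.g. "AA", "AG", "GG" (a hypothetical encoding).
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(counts) > 2:
        raise ValueError("more than two alleles observed; SNP is not biallelic")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    if len(counts) < 2:
        return 0.0  # monomorphic in this sample
    return min(counts.values()) / total
```

For the genotypes `["AA", "AG", "GG", "AA"]` there are five A and three G alleles, so the function returns 0.375.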
Caboche, Ségolène; Even, Gaël; Loywick, Alexandre; Audebert, Christophe; Hot, David
2017-12-19
The increase in available sequence data has advanced the field of microbiology; however, making sense of these data without bioinformatics skills is still problematic. We describe MICRA, an automatic pipeline, available as a web interface, for microbial identification and characterization through reads analysis. MICRA uses iterative mapping against reference genomes to identify genes and variations. Additional modules allow prediction of antibiotic susceptibility and resistance and comparing the results of several samples. MICRA is fast, producing few false-positive annotations and variant calls compared to current methods, making it a tool of great interest for fully exploiting sequencing data.
Liu, Yang; Chiaromonte, Francesca; Ross, Howard; Malhotra, Raunaq; Elleder, Daniel; Poss, Mary
2015-06-30
Infection with feline immunodeficiency virus (FIV) causes an immunosuppressive disease whose consequences are less severe if cats are co-infected with an attenuated FIV strain (PLV). We use virus diversity measurements, which reflect replication ability and the virus response to various conditions, to test whether diversity of virulent FIV in lymphoid tissues is altered in the presence of PLV. Our data consisted of the 3' half of the FIV genome from three tissues of animals infected with FIV alone, or with FIV and PLV, sequenced by 454 technology. Since rare variants dominate virus populations, we had to carefully distinguish sequence variation from errors due to experimental protocols and sequencing. We considered an exponential-normal convolution model used for background correction of microarray data, and modified it to formulate an error correction approach for minor allele frequencies derived from high-throughput sequencing. Similar to accounting for over-dispersion in counts, this accounts for error-inflated variability in frequencies - and quite effectively reproduces empirically observed distributions. After obtaining error-corrected minor allele frequencies, we applied ANalysis Of VAriance (ANOVA) based on a linear mixed model and found that conserved sites and transition frequencies in FIV genes differ among tissues of dual and single infected cats. Furthermore, analysis of minor allele frequencies at individual FIV genome sites revealed 242 sites significantly affected by infection status (dual vs. single) or infection status by tissue interaction. All together, our results demonstrated a decrease in FIV diversity in bone marrow in the presence of PLV. Importantly, these effects were weakened or undetectable when error correction was performed with other approaches (thresholding of minor allele frequencies; probabilistic clustering of reads). 
We also queried the data for cytidine deaminase activity on the viral genome, which causes an asymmetric increase in G to A substitutions, but found no evidence for this host defense strategy. Our error correction approach for minor allele frequencies (more sensitive and computationally efficient than other algorithms) and our statistical treatment of variation (ANOVA) were critical for effective use of high-throughput sequencing data in understanding viral diversity. We found that co-infection with PLV shifts FIV diversity from bone marrow to lymph node and spleen.
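For orientation, the unmodified exponential-normal convolution model from microarray background correction treats an observed value X as the sum of a true signal S, exponentially distributed with rate alpha, and Gaussian noise B with mean mu and standard deviation sigma. Conditional on X = x, S is then normal with mean a = x - mu - sigma^2 * alpha and standard deviation sigma, truncated at zero, so E[S | X = x] = a + sigma * phi(a/sigma) / Phi(a/sigma). The sketch below implements only this standard form; the authors' modification for minor allele frequencies is not described in the abstract and is not reproduced, and all names are assumptions.

```python
import math

def _norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _norm_cdf(z):
    """Standard normal cumulative distribution, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def normexp_signal(x, mu, sigma, alpha):
    """Posterior mean E[S | X = x] for X = S + B,
    with S ~ Exponential(rate=alpha) and B ~ Normal(mu, sigma**2).

    Note: for observations far below mu (a/sigma below roughly -38)
    the cdf underflows to zero and this simple form would divide by zero.
    """
    a = x - mu - sigma * sigma * alpha
    z = a / sigma
    return a + sigma * _norm_pdf(z) / _norm_cdf(z)
```

The corrected value is always positive and increases with x; for observations far above the noise level it approaches x - mu - sigma^2 * alpha.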
Ning, Yi; Li, Yan-Ling; Zhou, Guo-Ying; Yang, Lu-Cun; Xu, Wen-Hua
2016-04-01
High-throughput sequencing technology, also called next-generation sequencing (NGS), can sequence hundreds and thousands of sequences from different samples at the same time. In the present study, culture-independent high-throughput sequencing was applied to sequence the fungal internal transcribed spacer 1 (ITS1) region from fungal metagenomic DNA in the root of Sinopodophyllum hexandrum. After quality control, 22 565 reads remained. Clustering at 97% sequence similarity yielded 517 OTUs for the three samples (LD1, LD2 and LD3). The fungi identified from the OTU reads with the RDP classifier at a classification threshold of 0.8 were assigned to 13 classes, 35 orders, 44 families and 55 genera. Among these genera, Tetracladium was dominant in all samples (35.49%, 68.55% and 12.96%). The Shannon diversity indices and the Simpson indices of the endophytic fungi in the samples ranged from 1.75 to 2.92 and from 0.11 to 0.32, respectively. This is the first application of high-throughput sequencing technology to analyze the community composition and diversity of endophytic fungi in the medicinal plant, and the results showed hyperdiversity and high community composition complexity of endophytic fungi in the root of S. hexandrum. The results also show that high-throughput sequencing technology has a great advantage for analyzing the community composition and diversity of endophytes in plants. Copyright© by the Chinese Pharmaceutical Association.
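The Shannon and Simpson indices reported above are computed from per-OTU read counts. A minimal sketch follows, assuming the natural-log Shannon index H = -sum(p_i ln p_i) and the Simpson dominance form D = sum(p_i^2); the abstract does not state which Simpson variant was used, so that choice is an assumption.

```python
import math

def _proportions(counts):
    """Convert OTU read counts to relative abundances, skipping empty OTUs."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts if c > 0]

def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over OTUs."""
    return -sum(p * math.log(p) for p in _proportions(counts))

def simpson_index(counts):
    """Simpson dominance D = sum(p ** 2); lower values mean higher diversity."""
    return sum(p * p for p in _proportions(counts))
```

For four equally abundant OTUs, for example `[10, 10, 10, 10]`, H equals ln 4 (about 1.386) and D equals 0.25.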
Alkahest NuclearBLAST : a user-friendly BLAST management and analysis system
Diener, Stephen E; Houfek, Thomas D; Kalat, Sam E; Windham, DE; Burke, Mark; Opperman, Charles; Dean, Ralph A
2005-01-01
Background - Sequencing of EST and BAC end datasets is no longer limited to large research groups. Drops in per-base pricing have made high throughput sequencing accessible to individual investigators. However, there are few options available which provide a free and user-friendly solution to the BLAST result storage and data mining needs of biologists. Results - Here we describe NuclearBLAST, a batch BLAST analysis, storage and management system designed for the biologist. It is a wrapper for NCBI BLAST which provides a user-friendly web interface which includes a request wizard and the ability to view and mine the results. All BLAST results are stored in a MySQL database which allows for more advanced data-mining through supplied command-line utilities or direct database access. NuclearBLAST can be installed on a single machine or clustered amongst a number of machines to improve analysis throughput. NuclearBLAST provides a platform which eases data-mining of multiple BLAST results. With the supplied scripts, the program can export data into a spreadsheet-friendly format, automatically assign Gene Ontology terms to sequences and provide bi-directional best hits between two datasets. Users with SQL experience can use the database to ask even more complex questions and extract any subset of data they require. Conclusion - This tool provides a user-friendly interface for requesting, viewing and mining of BLAST results which makes the management and data-mining of large sets of BLAST analyses tractable to biologists. PMID:15958161
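Bi-directional best hits between two datasets can be derived from two BLAST hit tables, one for each search direction. The sketch below is a generic illustration and not NuclearBLAST's supplied script; the input layout (tuples of query, subject, bit score) and the function names are assumptions.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    Ties keep the hit seen first.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a_id, b_id) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

A pair is reported only when the best hit of sequence a in dataset B is b and the best hit of b in dataset A is a again; one-directional best hits are dropped.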
Chao, Tianle; Wang, Guizhi; Wang, Jianmin; Liu, Zhaohua; Ji, Zhibin; Hou, Lei; Zhang, Chunlan
2016-01-01
High-throughput mRNA sequencing enables the discovery of new transcripts and additional parts of incompletely annotated transcripts. Compared with the human and cow genomes, the reference annotation level of the sheep genome is still low. An investigation of new transcripts in sheep skeletal muscle will improve our understanding of muscle development. Therefore, applying high-throughput sequencing, two cDNA libraries from the biceps brachii of small-tailed Han sheep and Dorper sheep were constructed, and whole-transcriptome analysis was performed to determine the unknown transcript catalogue of this tissue. In this study, 40,129 transcripts were finally mapped to the sheep genome. Among them, 3,467 transcripts were determined to be unannotated in the current reference sheep genome and were defined as new transcripts. Based on protein-coding capacity prediction and comparative analysis of sequence similarity, 246 transcripts were classified as portions of unannotated genes or incompletely annotated genes. Another 1,520 transcripts were predicted with high confidence to be long non-coding RNAs. Our analysis also revealed 334 new transcripts that displayed specific expression in ruminants and uncovered a number of new transcripts without intergenus homology but with specific expression in sheep skeletal muscle. The results confirmed a complex transcript pattern of coding and non-coding RNA in sheep skeletal muscle. This study provided important information concerning the sheep genome and transcriptome annotation, which could provide a basis for further study.
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing.
Tourlousse, Dieter M; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro; Sekiguchi, Yuji
2017-02-28
High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
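The idea behind absolute quantification with spike-ins can be sketched as a scaling step: the known number of spike-in copies added to a sample, divided by the spike-in reads recovered, gives a copies-per-read factor that is applied to each taxon's read count. The code below is a simplified illustration that pools all spike-ins into one factor; it is not the authors' procedure, and the names and data layout are assumptions.

```python
def copies_per_read(spike_reads, spike_copies):
    """Scaling factor from pooled spike-ins.

    spike_reads: dict of spike-in name -> reads observed in the sample
    spike_copies: dict of spike-in name -> gene copies added to the sample
    """
    total_reads = sum(spike_reads.get(name, 0) for name in spike_copies)
    if total_reads == 0:
        raise ValueError("no spike-in reads recovered; cannot scale")
    return sum(spike_copies.values()) / total_reads

def absolute_abundance(taxon_reads, spike_reads, spike_copies):
    """Estimate 16S gene copies per taxon from read counts."""
    factor = copies_per_read(spike_reads, spike_copies)
    return {taxon: reads * factor for taxon, reads in taxon_reads.items()}
```

If two spike-ins added at 1000 and 10000 copies return 10 and 100 reads, the factor is 100 copies per read, so a taxon with 50 reads is estimated at 5000 copies. This ignores amplification bias and differences in 16S copy number between taxa.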
Ovaskainen, Otso; Schigel, Dmitry; Ali-Kovero, Heini; Auvinen, Petri; Paulin, Lars; Nordén, Björn; Nordén, Jenni
2013-01-01
Before the recent revolution in molecular biology, field studies on fungal communities were mostly confined to fruit bodies, whereas mycelial interactions were studied in the laboratory. Here we combine high-throughput sequencing with a fruit body inventory to study simultaneously mycelial and fruit body occurrences in a community of fungi inhabiting dead wood of Norway spruce. We studied mycelial occurrence by extracting DNA from wood samples followed by 454-sequencing of the ITS1 and ITS2 regions and an automated procedure for species identification. In total, we detected 198 species as mycelia and 137 species as fruit bodies. The correlation between mycelial and fruit body occurrences was high for the majority of the species, suggesting that high-throughput sequencing can successfully characterize the dominating fungal communities, despite possible biases related to sampling, PCR, sequencing and molecular identification. We used the fruit body and molecular data to test hypothesized links between life history and population dynamic parameters. We show that the species that have on average a high mycelial abundance also have a high fruiting rate and produce large fruit bodies, leading to a positive feedback loop in their population dynamics. Earlier studies have shown that species with specialized resource requirements are rarely seen fruiting, for which reason they are often classified as red-listed. We show with the help of high-throughput sequencing that some of these species are more abundant as mycelium in wood than what could be expected from their occurrence as fruit bodies. PMID:23575372
Protein-RNA specificity by high-throughput principal component analysis of NMR spectra.
Collins, Katherine M; Oregioni, Alain; Robertson, Laura E; Kelly, Geoff; Ramos, Andres
2015-03-31
Defining the RNA target selectivity of the proteins regulating mRNA metabolism is a key issue in RNA biology. Here we present a novel use of principal component analysis (PCA) to extract the RNA sequence preference of RNA binding proteins. We show that PCA can be used to compare the changes in the nuclear magnetic resonance (NMR) spectrum of a protein upon binding a set of quasi-degenerate RNAs and define the nucleobase specificity. We couple this application of PCA to an automated NMR spectra recording and processing protocol and obtain an unbiased and high-throughput NMR method for the analysis of nucleobase preference in protein-RNA interactions. We test the method on the RNA binding domains of three important regulators of RNA metabolism. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
High Throughput Biological Analysis Using Multi-bit Magnetic Digital Planar Tags
NASA Astrophysics Data System (ADS)
Hong, B.; Jeong, J.-R.; Llandro, J.; Hayward, T. J.; Ionescu, A.; Trypiniotis, T.; Mitrelias, T.; Kopper, K. P.; Steinmuller, S. J.; Bland, J. A. C.
2008-06-01
We report a new magnetic labelling technology for high-throughput biomolecular identification and DNA sequencing. Planar multi-bit magnetic tags have been designed and fabricated, which comprise a magnetic barcode formed by an ensemble of micron-sized thin film Ni80Fe20 bars encapsulated in SU8. We show that by using a globally applied magnetic field and magneto-optical Kerr microscopy the magnetic elements in the multi-bit magnetic tags can be addressed individually and encoded/decoded remotely. The critical steps needed to show the feasibility of this technology are demonstrated, including fabrication, flow transport, remote writing and reading, and successful functionalization of the tags as verified by fluorescence detection. This approach is ideal for encoding information on tags in microfluidic flow or suspension, for such applications as labelling of chemical precursors during drug synthesis and combinatorial library-based high-throughput multiplexed bioassays.
High throughput ion-channel pharmacology: planar-array-based voltage clamp.
Kiss, Laszlo; Bennett, Paul B; Uebele, Victor N; Koblan, Kenneth S; Kane, Stefanie A; Neagle, Brad; Schroeder, Kirk
2003-02-01
Technological advances often drive major breakthroughs in biology. Examples include PCR, automated DNA sequencing, confocal/single photon microscopy, AFM, and voltage/patch-clamp methods. The patch-clamp method, first described nearly 30 years ago, was a major technical achievement that permitted voltage-clamp analysis (membrane potential control) of ion channels in most cells and revealed a role for channels in unimagined areas. Because of the high information content, voltage clamp is the best way to study ion-channel function; however, throughput is too low for drug screening. Here we describe a novel breakthrough planar-array-based HT patch-clamp technology developed by Essen Instruments capable of voltage-clamping thousands of cells per day. This technology provides greater than two orders of magnitude increase in throughput compared with the traditional voltage-clamp techniques. We have applied this method to study the hERG K(+) channel and to determine the pharmacological profile of QT prolonging drugs.
Multiplexed Fragaria chloroplast genome sequencing
W. Njuguna; A. Liston; R. Cronn; N.V. Bassil
2010-01-01
A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...
Issues with RNA-seq analysis in non-model organisms: A salmonid example.
Sundaram, Arvind; Tengs, Torstein; Grimholt, Unni
2017-10-01
High throughput sequencing (HTS) is useful for many purposes as exemplified by the other topics included in this special issue. The purpose of this paper is to look into the unique challenges of using this technology in non-model organisms where resources such as genomes, functional genome annotations or genome complexity provide obstacles not met in model organisms. To describe these challenges, we narrow our scope to RNA sequencing used to study differential gene expression in response to pathogen challenge. As a demonstration species we chose Atlantic salmon, which has a sequenced genome with poor annotation and an added complexity due to many duplicated genes. We find that our RNA-seq analysis pipeline deciphers between duplicates despite high sequence identity. However, annotation issues provide problems in linking differentially expressed genes to pathways. Also, comparing results between approaches and species are complicated due to lack of standardized annotation. Copyright © 2017 Elsevier Ltd. All rights reserved.
Xia, Fei; Chen, Xin; Guo, Meng-Yuan; Bai, Xiao-Hui; Liu, Yan; Shen, Guang-Rong; Li, Yu-Ling; Lin, Juan; Zhou, Xuan-Wei
2016-01-01
Chinese Cordyceps, known in Chinese as “DongChong XiaCao”, is a parasitic complex of a fungus (Ophiocordyceps sinensis) and a caterpillar. The current study explored the endogenetic fungal communities inhabiting Chinese Cordyceps. Samples were collected from five different geographical regions of Qinghai and Tibet, and the nuclear ribosomal internal transcribed spacer-1 sequences from each sample were obtained using Illumina high-throughput sequencing. The results showed that Ascomycota was the dominant fungal phylum in Chinese Cordyceps and its soil microhabitat from different sampling regions. Among the Ascomycota, 65 genera were identified, and the abundant operational taxonomic units showed the strongest sequence similarity to Ophiocordyceps, Verticillium, Pseudallescheria, Candida and Ilyonectria. Not surprisingly, the genus Ophiocordyceps was the largest among the fungal communities identified in the fruiting bodies and external mycelial cortices of Chinese Cordyceps. In addition, fungal communities in the soil microhabitats were clustered separately from the external mycelial cortices and fruiting bodies of Chinese Cordyceps from different sampling regions. There was no significant structural difference in the fungal communities between the fruiting bodies and external mycelial cortices of Chinese Cordyceps. This study revealed an unexpectedly high diversity of fungal communities inhabiting the Chinese Cordyceps and its microhabitats. PMID:27625176
Chen, Tingtao; Shi, Yan; Wang, Xiaolei; Wang, Xin; Meng, Fanjing; Yang, Shaoguo; Yang, Jian; Xin, Hongbo
2017-07-01
Recurrence of oral diseases caused by antibiotics has created an urgent need to explore microbial diversity in the human oral cavity. In the present study, high-throughput sequencing was used to compare the microbial diversity of healthy people and patients with oral diseases, and sequence analysis was performed with the UPARSE software package. The Venn results indicated that a mean of 315 operational taxonomic units (OTUs) was obtained, and 73, 64, 53, 19 and 18 common OTUs belonging to Firmicutes, Bacteroidetes, Proteobacteria, Actinobacteria and Fusobacteria, respectively, were identified in healthy people. Moreover, the reduction of Firmicutes and the increase of Proteobacteria in the children group, and the increase of Firmicutes and the reduction of Proteobacteria in the youth and adult groups, indicated that age bracket and oral disease had largely influenced tooth development and microbial development in the oral cavity. In addition, the traditional 'pathogenic bacteria' of Firmicutes, Proteobacteria and Bacteroidetes (accounting for >95% of the total sequencing number in each group) indicated that the 'harmful' bacteria may exert beneficial effects on oral health. Therefore, the data provide clues for treating some oral diseases by restoring the disturbed microbial composition to a healthy level.
Yang, Seung Hak; Lim, Joung Soo; Khan, Modabber Ahmed; Kim, Bong Soo; Choi, Dong Yoon; Lee, Eun Young; Ahn, Hee Kwon
2015-01-01
The leachate generated by the decomposition of animal carcass has been implicated as an environmental contaminant surrounding the burial site. High-throughput nucleotide sequencing was conducted to investigate the bacterial communities in leachates from the decomposition of pig carcasses. We acquired 51,230 reads from six different samples (1, 2, 3, 4, 6 and 14 week-old carcasses) and found that sequences representing the phylum Firmicutes predominated. The diversity of bacterial 16S rRNA gene sequences in the leachate was the highest at 6 weeks, in contrast to those at 2 and 14 weeks. The relative abundance of Firmicutes was reduced, while the proportion of Bacteroidetes and Proteobacteria increased from 3–6 weeks. The representation of phyla was restored after 14 weeks. However, the community structures between the samples taken at 1–2 and 14 weeks differed at the bacterial classification level. The trend in pH was similar to the changes seen in bacterial communities, indicating that the pH of the leachate could be related to the shift in the microbial community. The results indicate that the composition of bacterial communities in leachates of decomposing pig carcasses shifted continuously during the study period and might be influenced by the burial site. PMID:26500442
GlycoExtractor: a web-based interface for high throughput processing of HPLC-glycan data.
Artemenko, Natalia V; Campbell, Matthew P; Rudd, Pauline M
2010-04-05
Recently, an automated high-throughput HPLC platform has been developed that can be used to fully sequence and quantify low concentrations of N-linked sugars released from glycoproteins, supported by an experimental database (GlycoBase) and analytical tools (autoGU). However, commercial packages that support the operation of HPLC instruments and data storage lack platforms for the extraction of large volumes of data. The lack of resources and agreed formats in glycomics is now a major limiting factor that restricts the development of bioinformatic tools and automated workflows for high-throughput HPLC data analysis. GlycoExtractor is a web-based tool that interfaces with a commercial HPLC database/software solution to facilitate the extraction of large volumes of processed glycan profile data (peak number, peak areas, and glucose unit values). The tool allows the user to export a series of sample sets to a set of file formats (XML, JSON, and CSV) rather than a collection of disconnected files. This approach not only reduces the amount of manual refinement required to export data into a suitable format for data analysis but also opens the field to new approaches for high-throughput data interpretation and storage, including biomarker discovery and validation and monitoring of online bioprocessing conditions for next generation biotherapeutics.
Big Data, Big Opportunities, and Big Challenges.
Frelinger, Jeffrey A
2015-11-01
High-throughput assays have begun to revolutionize modern biology and medicine. The advent of cheap next-generation sequencing (NGS) has made it possible to interrogate cells and human populations as never before. Although this has allowed us to investigate the genetics, gene expression, and impacts of the microbiome, there remain both practical and conceptual challenges. These include data handling, storage, and statistical analysis, as well as an inherent problem of the analysis of heterogeneous cell populations.
Winnowing DNA for Rare Sequences: Highly Specific Sequence and Methylation Based Enrichment
Thompson, Jason D.; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre
2012-01-01
Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue. PMID:22355378
Miyazaki, Nobuo; Kiyose, Norihiko; Akazawa, Yoko; Takashima, Mizuki; Hagihara, Yosihisa; Inoue, Naokazu; Matsuda, Tomonari; Ogawa, Ryu; Inoue, Seiya; Ito, Yuji
2015-09-01
The antigen-binding domain of camelid dimeric heavy chain antibodies, known as VHH or Nanobody, has much potential in pharmaceutical and industrial applications. To establish the isolation process of antigen-specific VHH, a VHH phage library was constructed with a diversity of 8.4 × 10^7 from cDNA of peripheral blood mononuclear cells of an alpaca (Lama pacos) immunized with a fragment of IZUMO1 (IZUMO1PFF) as a model antigen. By conventional biopanning, 13 antigen-specific VHHs were isolated. The amino acid sequences of these VHHs, designated as N-group VHHs, were very similar to each other (>93% identity). To find more diverse antibodies, we performed high-throughput sequencing (HTS) of VHH genes. By comparing the frequency of each sequence before and after biopanning, we found the sequences whose frequencies were increased by biopanning. The top 100 of these sequences were subjected to phylogenetic tree analysis. In total, 75% of them belonged to N-group VHHs, but the others were phylogenetically distant from N-group VHHs (non-N-group). Two of three VHHs selected from the non-N-group VHHs showed sufficient antigen-binding ability. These results suggested that biopanning followed by HTS provided a useful method for finding minor and diverse antigen-specific clones that could not be identified by conventional biopanning. © The Authors 2015. Published by Oxford University Press on behalf of the Japanese Biochemical Society. All rights reserved.
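The before/after frequency comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the pseudocount smoothing, and the toy input are all assumptions made for illustration.

```python
from collections import Counter


def enrichment(before, after, pseudocount=1.0):
    """Fold change in relative frequency (after / before) for each sequence.

    `before` and `after` are lists of sequence strings, one entry per read.
    The pseudocount is an illustrative choice so that sequences missing from
    one library do not cause a division by zero.
    """
    b, a = Counter(before), Counter(after)
    seqs = set(b) | set(a)
    k = len(seqs)
    nb = sum(b.values()) + pseudocount * k
    na = sum(a.values()) + pseudocount * k
    return {
        s: ((a[s] + pseudocount) / na) / ((b[s] + pseudocount) / nb)
        for s in seqs
    }


def top_enriched(before, after, n=100):
    """Return the n sequences with the largest fold change."""
    scores = enrichment(before, after)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Ranking by fold change rather than by raw count after panning is what lets a rare clone surface, provided it was enriched more strongly than the dominant group.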
Precision Medicine: Functional Advancements.
Caskey, Thomas
2018-01-29
Precision medicine was conceptualized on the strength of genomic sequence analysis. High-throughput functional metrics have enhanced sequence interpretation and clinical precision. These technologies include metabolomics, magnetic resonance imaging, and I rhythm (cardiac monitoring), among others. These technologies are discussed and placed in clinical context for the medical specialties of internal medicine, pediatrics, obstetrics, and gynecology. Publications in these fields support the concept of a higher level of precision in identifying disease risk. Precise disease risk identification has the potential to enable intervention with greater specificity, resulting in disease prevention, an important goal of precision medicine.
Gene expression profiling of flax (Linum usitatissimum L.) under edaphic stress.
Dmitriev, Alexey A; Kudryavtseva, Anna V; Krasnov, George S; Koroban, Nadezhda V; Speranskaya, Anna S; Krinitsina, Anastasia A; Belenikin, Maxim S; Snezhkina, Anastasiya V; Sadritdinova, Asiya F; Kishlyan, Natalya V; Rozhmina, Tatiana A; Yurkevich, Olga Yu; Muravenko, Olga V; Bolsheva, Nadezhda L; Melnikova, Nataliya V
2016-11-16
Cultivated flax (Linum usitatissimum L.) is widely used for production of textile, food, chemical and pharmaceutical products. However, various stresses decrease flax production. The search for genes involved in stress response is necessary for breeding adaptive cultivars. An imbalanced concentration of nutrient elements in soil decreases flax yields and also results in heritable changes in some flax lines. The appearance of Linum Insertion Sequence 1 (LIS-1) is the most studied modification. However, LIS-1 function is still unclear. High-throughput sequencing of the transcriptome of flax plants grown under normal (N), phosphate deficient (P), and nutrient excess (NPK) conditions was carried out using the Illumina platform. The transcriptome was assembled, and a total of 34924, 33797, and 33698 unique transcripts were identified for the N, P, and NPK sequencing libraries, respectively. We did not detect any LIS-1 derived mRNA in our sequencing data. The analysis of high-throughput sequencing data allowed us to identify genes with potentially differential expression under imbalanced nutrition. For further investigation with qPCR, 15 genes were chosen and their expression levels were evaluated in an extended sample of 31 flax plants. Significant expression alterations were revealed for genes encoding WRKY and JAZ protein families under P and NPK conditions. Moreover, the alterations of WRKY family genes differed depending on LIS-1 presence in the flax plant genome. In addition, we revealed slight and LIS-1 independent mRNA level changes of the KRP2 and ING1 genes, which are adjacent to LIS-1, under nutrition stress. Differentially expressed genes were identified in flax plants grown under phosphate deficiency and excess nutrition on the basis of high-throughput sequencing and qPCR data. We showed that the WRKY and JAZ gene families participate in the flax response to imbalanced nutrient content in soil. 
In addition, we did not identify any mRNA that could be derived from LIS-1 in our transcriptome sequencing data. Expression of the LIS-1 flanking genes, ING1 and KRP2, was suggested not to be induced by nutrient stress. The results provide new insights into the edaphic stress response in flax and the role of LIS-1 in this process.
Chen, Mianmian; Xu, Juntian; Yao, Huochun; Lu, Chengping; Zhang, Wei
2016-05-10
Avian pathogenic Escherichia coli (APEC) causes colibacillosis, which results in significant economic losses to the poultry industry worldwide. Due to the drug residues and increased antibiotic resistance caused by antibiotic use, bacteriophages and other alternative therapeutic agents are expected to control APEC infection in poultry. Two APEC phages, named P483 and P694, were isolated from the feces from the farmers market in China. We then studied their biological properties, and carried out high-throughput genome sequencing and homology analyses of these phages. Assembly results of high-throughput sequencing showed that the structures of both P483 and P694 genomes consist of linear and double-stranded DNA. Results of the electron microscopy and homology analysis revealed that both P483 and P694 belong to T7-like virus which is a member of the Podoviridae family of the Caudovirales order. Comparative genomic analysis showed that most of the predicted proteins of these two phages showed strongest sequence similarity to the Enterobacteria phages BA14 and 285P, Erwinia phage FE44, and Kluyvera phage Kvp1; however, some proteins such as gp0.6a, gp1.7 and gp17 showed lower similarity (<85%) with the homologs of other phages in the T7 subgroup. We also found some unique characteristics of P483 and P694, such as the two types of the genes of P694 and no lytic activity of P694 against its host bacteria in liquid medium. Our results serve to further our understanding of phage evolution of T7-like coliphages and provide the potential application of the phages as therapeutic agents for the treatment of diseases. Copyright © 2016 Elsevier B.V. All rights reserved.
Methods, Tools and Current Perspectives in Proteogenomics
Ruggles, Kelly V.; Krug, Karsten; Wang, Xiaojing; Clauser, Karl R.; Wang, Jing; Payne, Samuel H.; Fenyö, David; Zhang, Bing; Mani, D. R.
2017-01-01
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches, and in this article we systematically classify published methods and tools into four major categories: (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications. PMID:28456751
Lijavetzky, Diego; Cabezas, José Antonio; Ibáñez, Ana; Rodríguez, Virginia; Martínez-Zapater, José M
2007-01-01
Background: Single-nucleotide polymorphisms (SNPs) are the most abundant type of DNA sequence polymorphisms. Their higher availability and stability when compared to simple sequence repeats (SSRs) provide enhanced possibilities for genetic and breeding applications such as cultivar identification, construction of genetic maps, the assessment of genetic diversity, the detection of genotype/phenotype associations, or marker-assisted breeding. In addition, the efficiency of these activities can be improved thanks to the ease with which SNP genotyping can be automated. Expressed sequence tag (EST) sequencing projects in grapevine are allowing for the in silico detection of multiple putative sequence polymorphisms within and among a reduced number of cultivars. In parallel, the sequence of the grapevine cultivar Pinot Noir is also providing thousands of polymorphisms present in this highly heterozygous genome. Still, the general application of those SNPs requires further validation since their use could be restricted to those specific genotypes. Results: In order to develop a large SNP set of wide application in grapevine, we followed a systematic re-sequencing approach in a group of 11 grape genotypes corresponding to ancient unrelated cultivars as well as wild plants. Using this approach, we have sequenced 230 gene fragments, which represents the analysis of over 1 Mb of grape DNA sequence. This analysis has allowed the discovery of 1573 SNPs with an average of one SNP every 64 bp (one SNP every 47 bp in non-coding regions and every 69 bp in coding regions). Nucleotide diversity in grape (π = 0.0051) was found to be similar to values observed in highly polymorphic plant species such as maize. The average number of haplotypes per gene sequence was estimated as six, with three haplotypes representing over 83% of the analyzed sequences. 
Short-range linkage disequilibrium (LD) studies within the analyzed sequences indicate the existence of a rapid decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies, we have used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies. Conclusion: These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping we have shown the application of a set of discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination of a highly efficient re-sequencing approach and the SNPlex™ high throughput genotyping technology provides a powerful tool for grapevine genetic analysis. PMID:18021442
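Nucleotide diversity (π) is commonly defined as the average number of nucleotide differences per site between pairs of sequences. The sketch below computes it under that definition for a set of equal-length, already aligned sequences. The abstract does not say how the authors estimated π, so the function name and the gap-free toy alignment are assumptions for illustration only.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Mean pairwise differences per site over all pairs of aligned sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(aligned_seqs, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_diffs / (len(pairs) * length)
```

A real analysis would also need a policy for gaps, ambiguous bases, and heterozygous calls, none of which this sketch handles.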
Tian, Di; Ma, Xin; Li, Yu-E; Zha, Liang-Song; Wu, Yang; Zou, Xiao-Xia; Liu, Shuang
2013-10-01
Carbon dioxide capture and storage has provided a new option for mitigating global anthropogenic CO2 emissions, with unique advantages. However, there is a risk that the sealed CO2 will leak, posing a serious threat to the ecosystem. It is widely known that soil microorganisms are closely related to soil health, yet studies on the impact of sequestered CO2 leakage on soil microorganisms are scarce. In this study, leakage scenarios for sealed CO2 were constructed, the 16S rRNA genes of soil bacteria were sequenced by Illumina high-throughput sequencing on the MiSeq platform, and related biological analyses were conducted to explore changes in soil bacterial abundance, diversity and structure. There were 486,645 reads for 43,017 OTUs from 15 soil samples. The analysis showed differences in the abundance, diversity and community structure of the soil bacterial community under different CO2 leakage scenarios: the abundance and diversity of the bacterial community declined as the quantity and duration of CO2 leakage increased, and some bacterial species became dominant in the community. The increase in Acidobacteria species could therefore serve as a biological indicator of the impact of sealed CO2 leakage on the soil ecosystem.
Hu, Xiaolong; Liu, Gang; Shafer, Aaron B. A.; Wei, Yuting; Zhou, Juntong; Lin, Shaobi; Wu, Haibin; Zhou, Mi; Hu, Defu; Liu, Shuqiang
2017-01-01
The gut ecosystem is characterized by dynamic and reciprocal interactions between the host and bacteria. Although characterizing microbiota for herbivores has become recognized as important tool for gauging species health, no study to date has investigated the bacterial communities and evaluated the age-related bacterial dynamics of musk deer. Moreover, gastrointestinal diseases have been hypothesized to be a limiting factor of population growth in captive musk deer. Here, high-throughput sequencing of the bacterial 16S rRNA gene was used to profile the fecal bacterial communities in juvenile and adult alpine and forest musk deer. The two musk deer species harbored similar bacterial communities at the phylum level, whereas the key genera for the two species were distinct. The bacterial communities were dominated by Firmicutes and Bacteroidetes, with the bacterial diversity being higher in forest musk deer. The Firmicutes to Bacteroidetes ratio also increased from juvenile to adult, while the bacterial diversity, within-group and between-group similarity, all increased with age. This work serves as the first sequence-based analysis of variation in bacterial communities within and between musk deer species, and demonstrates how the gut microbial community dynamics vary among closely related species and shift with age. As gastrointestinal diseases have been observed in captive populations, this study provides valuable data that might benefit captive management and future reintroduction programs. PMID:28421061
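The two summary statistics mentioned in this abstract, the Firmicutes to Bacteroidetes ratio and bacterial diversity, can be sketched from a table of phylum-level read counts. The Shannon index is used here only as one common diversity measure; the abstract does not state which index the authors used, and the counts in the usage note are invented.

```python
import math


def firmicutes_bacteroidetes_ratio(counts):
    """Ratio of Firmicutes reads to Bacteroidetes reads in one sample."""
    return counts["Firmicutes"] / counts["Bacteroidetes"]


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total)
        for n in counts.values()
        if n > 0
    )
```

For a hypothetical sample such as `{"Firmicutes": 60, "Bacteroidetes": 30, "Proteobacteria": 10}`, the ratio is 2.0, and the Shannon index grows as reads are spread more evenly across taxa.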
Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L
2010-07-01
Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.
Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.
2010-01-01
Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087
Buxbaum, Joseph D; Daly, Mark J; Devlin, Bernie; Lehner, Thomas; Roeder, Kathryn; State, Matthew W
2012-12-20
Research during the past decade has seen significant progress in the understanding of the genetic architecture of autism spectrum disorders (ASDs), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time, this research has highlighted ongoing challenges. Here we address the enormous impact of high-throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multisite collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly. Copyright © 2012 Elsevier Inc. All rights reserved.
Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.
Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao
2018-05-01
STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since it can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance was evaluated by sequencing read analysis, concordance study and sensitivity testing. High-coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS, respectively, at the 34 STRs, indicating that MPS provides higher discrimination than CE detection. Overall, the study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
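The gap between the CE and MPS allele counts follows from how alleles are told apart: CE sizing separates alleles by length, whereas sequencing also separates alleles of equal length that differ internally. The sketch below counts alleles both ways. The data layout, function name, and toy sequences are hypothetical and are not taken from the study.

```python
def count_alleles(alleles_by_locus):
    """Return (length_based, sequence_based) allele counts summed over loci.

    `alleles_by_locus` maps a locus name to an iterable of observed allele
    sequences. Length-based counting mimics what sizing alone can resolve;
    sequence-based counting also separates same-length variants.
    """
    by_length = 0
    by_sequence = 0
    for seqs in alleles_by_locus.values():
        seqs = list(seqs)
        by_length += len({len(s) for s in seqs})
        by_sequence += len(set(seqs))
    return by_length, by_sequence
```

With three observed alleles at one locus, two of which share a length but differ at one base, the function returns `(2, 3)`.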
Kamel, Katarzyna A; Kroc, Magdalena; Święcicki, Wojciech
2015-01-01
Sequence tagged site (STS) markers are valuable tools for genetic and physical mapping that can be successfully used in comparative analyses among related species. Current challenges for molecular marker genotyping in plants include the lack of fast, sensitive and inexpensive methods suitable for sequence variant detection. In contrast, high resolution melting (HRM) is a simple and high-throughput assay, which has been widely applied in sequence polymorphism identification as well as in studies of genetic variability and genotyping. The present study is the first attempt to use HRM analysis to genotype STS markers in narrow-leafed lupin (Lupinus angustifolius L.). The sensitivity and utility of this method were confirmed by sequence polymorphism detection based on melting curve profiles in the parental genotypes and progeny of the narrow-leafed lupin mapping population. Application of different approaches, including amplicon size and a simulated heterozygote analysis, has allowed for successful genetic mapping of 16 new STS markers in the narrow-leafed lupin genome.
Camilo, Cesar M; Lima, Gustavo M A; Maluf, Fernando V; Guido, Rafael V C; Polikarpov, Igor
2016-01-01
Following burgeoning genomic and transcriptomic sequencing data, biochemical and molecular biology groups worldwide are implementing high-throughput cloning and mutagenesis facilities in order to obtain a large number of soluble proteins for structural and functional characterization. Since manual primer design can be a time-consuming and error-generating step, particularly when working with hundreds of targets, the automation of primer design process becomes highly desirable. HTP-OligoDesigner was created to provide the scientific community with a simple and intuitive online primer design tool for both laboratory-scale and high-throughput projects of sequence-independent gene cloning and site-directed mutagenesis and a Tm calculator for quick queries.
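As an illustration of what a quick Tm query involves, the sketch below applies the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), a rough estimate intended for short oligonucleotides. The abstract does not say which formula HTP-OligoDesigner uses, so this is a stand-in chosen for simplicity, and the function name is hypothetical.

```python
def wallace_tm(primer):
    """Estimate melting temperature in degrees Celsius with the Wallace rule.

    Only unambiguous DNA bases (A, C, G, T) are accepted. The rule ignores
    salt and primer concentration, so treat the value as a rough guide.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G and T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Nearest-neighbor thermodynamic models are generally preferred for longer primers or when buffer conditions matter.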
Miyai, Manami; Eikawa, Shingo; Hosoi, Akihiro; Iino, Tamaki; Matsushita, Hirokazu; Isobe, Midori; Uenaka, Akiko; Udono, Heiichiro; Nakajima, Jun; Nakayama, Eiichi; Kakimi, Kazuhiro
2015-01-01
Comprehensive immunological evaluation is crucial for monitoring patients undergoing antigen-specific cancer immunotherapy. The identification and quantification of T cell responses is most important for the further development of such therapies. Using well-characterized clinical samples from a high responder patient (TK-f01) in an NY-ESO-1f peptide vaccine study, we performed high-throughput T cell receptor β-chain (TCRB) gene next generation sequencing (NGS) to monitor the frequency of NY-ESO-1-specific CD8+ T cells. We compared these results with those of conventional immunological assays, such as IFN-γ capture, tetramer binding and limiting dilution clonality assays. We sequenced human TCRB complementarity-determining region 3 (CDR3) rearrangements of two NY-ESO-1f-specific CD8+ T cell clones, 6-8L and 2F6, as well as PBMCs over the course of peptide vaccination. Clone 6-8L possessed the TCRB CDR3 gene TCRBV11-03*01 and BJ02-01*01 with amino acid sequence CASSLRGNEQFF, whereas 2F6 possessed TCRBV05-08*01 and BJ02-04*01 (CASSLVGTNIQYF). Using these two sequences as models, we evaluated the frequency of NY-ESO-1-specific CD8+ T cells in PBMCs ex vivo. The 6-8L CDR3 sequence was the second most frequent in PBMC and was present at high frequency (0.7133%) even prior to vaccination, and sustained over the course of vaccination. Despite a marked expansion of NY-ESO-1-specific CD8+ T cells detected from the first through 6th vaccination by tetramer staining and IFN-γ capture assays, as evaluated by CDR3 sequencing the frequency did not increase with increasing rounds of peptide vaccination. By clonal analysis using 12 day in vitro stimulation, the frequency of B*52:01-restricted NY-ESO-1f peptide-specific CD8+ T cells in PBMCs was estimated as only 0.0023%, far below the 0.7133% by NGS sequencing. Thus, assays requiring in vitro stimulation might be underestimating the frequency of clones with lower proliferation potential. 
High-throughput TCRB sequencing using NGS can potentially better estimate the actual frequency of antigen-specific T cells and thus provide more accurate patient monitoring. PMID:26291626
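The percentage figures quoted in this abstract are clonotype frequencies, i.e. the share of all sequenced TCRB reads that carry a given CDR3 sequence. A minimal sketch of that calculation follows. The function names and the read list in the usage note are invented for illustration and do not reproduce the study's data or pipeline.

```python
from collections import Counter


def clonotype_frequencies(cdr3_reads):
    """Map each CDR3 amino acid sequence to its percentage of all reads."""
    counts = Counter(cdr3_reads)
    total = sum(counts.values())
    return {seq: 100.0 * n / total for seq, n in counts.items()}


def rank_of(cdr3, cdr3_reads):
    """1-based rank of a CDR3 by read count, or None if it is absent."""
    ordered = [seq for seq, _ in Counter(cdr3_reads).most_common()]
    return ordered.index(cdr3) + 1 if cdr3 in ordered else None
```

For a toy repertoire in which one CDR3 accounts for 3 of 10 reads, `clonotype_frequencies` reports 30.0 for it, and `rank_of` gives its position among all clonotypes.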
Microarray-Based Gene Expression Analysis for Veterinary Pathologists: A Review.
Raddatz, Barbara B; Spitzbarth, Ingo; Matheis, Katja A; Kalkuhl, Arno; Deschl, Ulrich; Baumgärtner, Wolfgang; Ulrich, Reiner
2017-09-01
High-throughput, genome-wide transcriptome analysis is now commonly used in all fields of life science research and is on the cusp of medical and veterinary diagnostic application. Transcriptomic methods such as microarrays and next-generation sequencing generate enormous amounts of data. The pathogenetic expertise acquired from understanding of general pathology provides veterinary pathologists with a profound background, which is essential in translating transcriptomic data into meaningful biological knowledge, thereby leading to a better understanding of underlying disease mechanisms. The scientific literature concerning high-throughput data-mining techniques usually addresses mathematicians or computer scientists as the target audience. In contrast, the present review provides the reader with a clear and systematic basis from a veterinary pathologist's perspective. Therefore, the aims are (1) to introduce the reader to the necessary methodological background; (2) to introduce the sequential steps commonly performed in a microarray analysis including quality control, annotation, normalization, selection of differentially expressed genes, clustering, gene ontology and pathway analysis, analysis of manually selected genes, and biomarker discovery; and (3) to provide references to publicly available and user-friendly software suites. In summary, the data analysis methods presented within this review will enable veterinary pathologists to analyze high-throughput transcriptome data obtained from their own experiments, supplemental data that accompany scientific publications, or public repositories in order to obtain a more in-depth insight into underlying disease mechanisms.
Haplotag: Software for Haplotype-Based Genotyping-by-Sequencing Analysis
Tinker, Nicholas A.; Bekele, Wubishet A.; Hattori, Jiro
2016-01-01
Genotyping-by-sequencing (GBS), and related methods, are based on high-throughput short-read sequencing of genomic complexity reductions followed by discovery of single nucleotide polymorphisms (SNPs) within sequence tags. This provides a powerful and economical approach to whole-genome genotyping, facilitating applications in genomics, diversity analysis, and molecular breeding. However, due to the complexity of analyzing large data sets, applications of GBS may require substantial time, expertise, and computational resources. Haplotag, the novel GBS software described here, is freely available, and operates with minimal user-investment on widely available computer platforms. Haplotag is unique in fulfilling the following set of criteria: (1) operates without a reference genome; (2) can be used in a polyploid species; (3) provides a discovery mode, and a production mode; (4) discovers polymorphisms based on a model of tag-level haplotypes within sequenced tags; (5) reports SNPs as well as haplotype-based genotypes; and (6) provides an intuitive visual “passport” for each inferred locus. Haplotag is optimized for use in a self-pollinating plant species. PMID:26818073
A high-throughput approach to profile RNA structure.
Delli Ponti, Riccardo; Marti, Stefanie; Armaos, Alexandros; Tartaglia, Gian Gaetano
2017-03-17
Here we introduce the Computational Recognition of Secondary Structure (CROSS) method to calculate the structural profile of an RNA sequence (single- or double-stranded state) at single-nucleotide resolution and without sequence length restrictions. We trained CROSS using data from high-throughput experiments such as Selective 2΄-Hydroxyl Acylation analyzed by Primer Extension (SHAPE; Mouse and HIV transcriptomes) and Parallel Analysis of RNA Structure (PARS; Human and Yeast transcriptomes) as well as high-quality NMR/X-ray structures (PDB database). The algorithm uses primary structure information alone to predict experimental structural profiles with >80% accuracy, showing high performances on large RNAs such as Xist (17 900 nucleotides; Area Under the ROC Curve AUC of 0.75 on dimethyl sulfate (DMS) experiments). We integrated CROSS in thermodynamics-based methods to predict secondary structure and observed an increase in their predictive power by up to 30%. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
A tag-based approach for high-throughput analysis of CCWGG methylation.
Denisova, Oksana V; Chernov, Andrei V; Koledachkina, Tatyana Y; Matvienko, Nicholas I
2007-10-15
Non-CpG methylation occurring in the context of CNG sequences is found in plants at a large number of genomic loci. However, there is still little information available about non-CpG methylation in mammals. Efficient methods that would allow detection of scarcely localized methylated sites in small quantities of DNA are required to elucidate the biological role of non-CpG methylation in both plants and animals. In this study, we tested a new whole genome approach to identify sites of CCWGG methylation (W is A or T), a particular case of CNG methylation, in genomic DNA. This technique is based on digestion of DNAs with methylation-sensitive restriction endonucleases EcoRII-C and AjnI. Short DNAs flanking methylated CCWGG sites (tags) are selectively purified and assembled in tandem arrays of up to nine tags. This allows high-throughput sequencing of tags, identification of flanking regions, and their exact positions in the genome. In this study, we tested specificity and efficiency of the approach.
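The final step described above, locating CCWGG sites and their exact positions in a genome, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `find_ccwgg_sites` and the example sequence are hypothetical, and the code only scans a plain DNA string for the motif (W = A or T) without modelling methylation or restriction digestion.

```python
import re

# W is the IUPAC code for A or T, so CCWGG matches CCAGG and CCTGG.
# The lookahead form reports every start position; it is kept so the
# scan stays correct if the motif is swapped for one that can overlap itself.
CCWGG = re.compile(r"(?=(CC[AT]GG))")


def find_ccwgg_sites(sequence):
    """Return (0-based start, matched site) for every CCWGG occurrence.

    Hypothetical helper for illustration; input is a plain DNA string.
    """
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in CCWGG.finditer(seq)]


if __name__ == "__main__":
    # Made-up sequence: contains CCAGG, CCTGG and a non-matching CCGGG.
    for pos, site in find_ccwgg_sites("AACCAGGTTCCTGGCCGGG"):
        print(pos, site)
```

Only the forward strand needs to be scanned here, because the reverse complement of CCWGG is again CCWGG.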
MicroRNA-21 promotes proliferation of rat hepatocyte BRL-3A by targeting FASLG.
Li, J J; Chan, W H; Leung, W Y; Wang, Y; Xu, C S
2015-04-27
Rat liver regeneration (RLR) induced by partial hepatectomy involves cell proliferation regulated by numerous factors, including microRNAs (miRNAs). miRNA high-throughput sequencing has been established and used to analyze miRNA expression profiles. Analysis of miRNA high-throughput sequencing data in this study showed that 39 miRNAs were related to RLR. Their roles in the normal rat hepatocyte line BRL-3A were studied by gain- and loss-of-function analyses, and one of them, microRNA-21 (miR-21), was markedly upregulated and promoted BRL-3A cell proliferation. A bioinformatic search for miR-21 targets revealed that Fas ligand (FASLG) is one of miR-21's target genes. A dual-luciferase reporter assay and Western blot assay showed that miR-21 directly targeted the 3'-untranslated region of FASLG and inhibited the expression of FASLG, which suggests that miR-21 promoted BRL-3A cell proliferation by reducing FASLG expression.
Ma, Qiao; Qu, Yuanyuan; Shen, Wenli; Zhang, Zhaojing; Wang, Jingwei; Liu, Ziyan; Li, Duanxing; Li, Huijie; Zhou, Jiti
2015-03-01
In this study, Illumina high-throughput sequencing was used to reveal the community structures of nine coking wastewater treatment plants (CWWTPs) in China for the first time. The sludge systems exhibited a similar community composition at each taxonomic level. Compared to previous studies, some of the core genera in municipal wastewater treatment plants such as Zoogloea, Prosthecobacter and Gp6 were detected as minor species. Thiobacillus (20.83%), Comamonas (6.58%), Thauera (4.02%), Azoarcus (7.78%) and Rhodoplanes (1.42%) were the dominant genera shared by at least six CWWTPs. The percentages of autotrophic ammonia-oxidizing bacteria and nitrite-oxidizing bacteria were unexpectedly low, which were verified by both real-time PCR and fluorescence in situ hybridization analyses. Hierarchical clustering and canonical correspondence analysis indicated that operation mode, flow rate and temperature might be the key factors in community formation. This study provides new insights into our understanding of microbial community compositions and structures of CWWTPs. Copyright © 2014 Elsevier Ltd. All rights reserved.
Foliar fungi of Betula pendula: impact of tree species mixtures and assessment methods
Nguyen, Diem; Boberg, Johanna; Cleary, Michelle; Bruelheide, Helge; Hönig, Lydia; Koricheva, Julia; Stenlid, Jan
2017-01-01
Foliar fungi of silver birch (Betula pendula) in an experimental Finnish forest were investigated across a gradient of tree species richness using molecular high-throughput sequencing and visual macroscopic assessment. We hypothesized that the molecular approach detects more fungal taxa than visual assessment, and that there is a relationship among the most common fungal taxa detected by both techniques. Furthermore, we hypothesized that the fungal community composition, diversity, and distribution patterns are affected by changes in tree diversity. Sequencing revealed greater diversity of fungi on birch leaves than the visual assessment method. One species showed a linear relationship between the methods. Species-specific variation in fungal community composition could be partially explained by tree diversity, though overall fungal diversity was not affected by tree diversity. Analysis of specific fungal taxa indicated tree diversity effects at the local neighbourhood scale, where the proportion of birch among neighbouring trees varied, but not at the plot scale. In conclusion, both methods may be used to determine tree diversity effects on the foliar fungal community. However, high-throughput sequencing provided higher resolution of the fungal community, while the visual macroscopic assessment detected functionally active fungal species. PMID:28150710
GenomicTools: a computational platform for developing high-throughput analytics in genomics.
Tsirigos, Aristotelis; Haiminen, Niina; Bilal, Erhan; Utro, Filippo
2012-01-15
Recent advances in sequencing technology have resulted in the dramatic increase of sequencing data, which, in turn, requires efficient management of computational resources, such as computing time, memory requirements as well as prototyping of computational pipelines. We present GenomicTools, a flexible computational platform, comprising both a command-line set of tools and a C++ API, for the analysis and manipulation of high-throughput sequencing data such as DNA-seq, RNA-seq, ChIP-seq and MethylC-seq. GenomicTools implements a variety of mathematical operations between sets of genomic regions thereby enabling the prototyping of computational pipelines that can address a wide spectrum of tasks ranging from pre-processing and quality control to meta-analyses. Additionally, the GenomicTools platform is designed to analyze large datasets of any size by minimizing memory requirements. In practical applications, where comparable, GenomicTools outperforms existing tools in terms of both time and memory usage. The GenomicTools platform (version 2.0.0) was implemented in C++. The source code, documentation, user manual, example datasets and scripts are available online at http://code.google.com/p/ibm-cbc-genomic-tools.
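One of the operations "between sets of genomic regions" mentioned above is intersection. The sketch below illustrates the idea in Python; it does not use or reproduce the GenomicTools command-line tools or C++ API. The function name `intersect_regions`, the (chromosome, start, end) tuple layout and the half-open coordinate convention are all assumptions made for the example.

```python
def intersect_regions(a, b):
    """Intersect two lists of (chrom, start, end) half-open regions.

    Assumptions for this sketch: both lists are sorted by (chrom, start)
    using the same string ordering for chromosome names, and regions
    within one list do not overlap each other.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in chromosome order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start = max(start_a, start_b)
        end = min(end_a, end_b)
        if start < end:
            out.append((chrom_a, start, end))
        # Drop the region that ends first; it cannot overlap anything later.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out
```

Because each step advances one of the two cursors, the scan is linear in the total number of regions and keeps only the current pair in memory, which is the kind of property that matters when the inputs are large.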
de Muinck, Eric J; Trosvik, Pål; Gilfillan, Gregor D; Hov, Johannes R; Sundaram, Arvind Y M
2017-07-06
Advances in sequencing technologies and bioinformatics have made the analysis of microbial communities almost routine. Nonetheless, the need remains to improve on the techniques used for gathering such data, including increasing throughput while lowering cost and benchmarking the techniques so that potential sources of bias can be better characterized. We present a triple-index amplicon sequencing strategy to sequence large numbers of samples at significantly lower cost and in a shorter timeframe compared to existing methods. The design employs a two-stage PCR protocol, incorporating three barcodes to each sample, with the possibility to add a fourth index. It also includes heterogeneity spacers to overcome low complexity issues faced when sequencing amplicons on Illumina platforms. The library preparation method was extensively benchmarked through analysis of a mock community in order to assess biases introduced by sample indexing, number of PCR cycles, and template concentration. We further evaluated the method through re-sequencing of a standardized environmental sample. Finally, we evaluated our protocol on a set of fecal samples from a small cohort of healthy adults, demonstrating good performance in a realistic experimental setting. Between-sample variation was mainly related to batch effects, such as DNA extraction, while sample indexing was also a significant source of bias. PCR cycle number strongly influenced chimera formation and affected relative abundance estimates of species with high GC content. Libraries were sequenced using the Illumina HiSeq and MiSeq platforms to demonstrate that this protocol is highly scalable to sequence thousands of samples at a very low cost. Here, we provide the most comprehensive study of performance and bias inherent to a 16S rRNA gene amplicon sequencing method to date.
Triple-indexing greatly reduces the number of long custom DNA oligos required for library preparation, while the inclusion of variable length heterogeneity spacers minimizes the need for PhiX spike-in. This design results in a significant cost reduction of highly multiplexed amplicon sequencing. The biases we characterize highlight the need for highly standardized protocols. Reassuringly, we find that the biological signal is a far stronger structuring factor than the various sources of bias.
A transmission imaging spectrograph and microfabricated channel system for DNA analysis.
Simpson, J W; Ruiz-Martinez, M C; Mulhern, G T; Berka, J; Latimer, D R; Ball, J A; Rothberg, J M; Went, G T
2000-01-01
In this paper we present the development of a DNA analysis system using a microfabricated channel device and a novel transmission imaging spectrograph which can be efficiently incorporated into a high throughput genomics facility for both sizing and sequencing of DNA fragments. The device contains 48 channels etched on a glass substrate. The channels are sealed with a flat glass plate which also provides a series of apertures for sample loading and contact with buffer reservoirs. Samples can be easily loaded in volumes up to 640 nL without band broadening because of an efficient electrokinetic stacking at the electrophoresis channel entrance. The system uses a dual laser excitation source and a highly sensitive charge-coupled device (CCD) detector allowing for simultaneous detection of many fluorescent dyes. The sieving matrices for the separation of single-stranded DNA fragments are polymerized in situ in denaturing buffer systems. Examples of separation of single-stranded DNA fragments up to 500 bases in length are shown, including accurate sizing of GeneCalling fragments, and sequencing samples prepared with a reduced amount of dye terminators. An increase in sample throughput has been achieved by color multiplexing.
Han, Xiaoping; Chen, Haide; Huang, Daosheng; Chen, Huidong; Fei, Lijiang; Cheng, Chen; Huang, He; Yuan, Guo-Cheng; Guo, Guoji
2018-04-05
Human pluripotent stem cells (hPSCs) provide powerful models for studying cellular differentiations and unlimited sources of cells for regenerative medicine. However, a comprehensive single-cell level differentiation roadmap for hPSCs has not been achieved. We use high throughput single-cell RNA-sequencing (scRNA-seq), based on optimized microfluidic circuits, to profile early differentiation lineages in the human embryoid body system. We present a cellular-state landscape for hPSC early differentiation that covers multiple cellular lineages, including neural, muscle, endothelial, stromal, liver, and epithelial cells. Through pseudotime analysis, we construct the developmental trajectories of these progenitor cells and reveal the gene expression dynamics in the process of cell differentiation. We further reprogram primed H9 cells into naïve-like H9 cells to study the cellular-state transition process. We find that genes related to hemogenic endothelium development are enriched in naïve-like H9. Functionally, naïve-like H9 show higher potency for differentiation into hematopoietic lineages than primed cells. Our single-cell analysis reveals the cellular-state landscape of hPSC early differentiation, offering new insights that can be harnessed for optimization of differentiation protocols.
Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.
Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil
2015-07-17
In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.
Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.
2016-01-01
Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175
Dereeper, Alexis; Nicolas, Stéphane; Le Cunff, Loïc; Bacilieri, Roberto; Doligez, Agnès; Peros, Jean-Pierre; Ruiz, Manuel; This, Patrice
2011-05-05
High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data. In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates: 1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices. 2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects.
It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats. Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies. SNiPlay is available at: http://sniplay.cirad.fr/.
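Two of the diversity indices named in the SNiPlay abstract, Pi (nucleotide diversity) and Watterson's Theta, can be illustrated on a toy alignment. The sketch below uses the standard textbook definitions and is not SNiPlay's implementation; the function names are hypothetical, values are reported per locus rather than per site, and gaps or missing data are not handled.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns where at least two sequences differ."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Pi: mean number of pairwise differences between sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs)


def watterson_theta(seqs):
    """Watterson's theta: S / a_n, with a_n = sum of 1/k for k = 1..n-1."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two aligned sequences")
    a_n = sum(1.0 / k for k in range(1, n))
    return segregating_sites(seqs) / a_n


if __name__ == "__main__":
    # Made-up alignment of three equal-length sequences.
    alignment = ["AAAA", "AAAT", "AATT"]
    print(nucleotide_diversity(alignment), watterson_theta(alignment))
```

Tajima's D, also listed in the abstract, compares these two estimates after scaling their difference by its expected standard deviation; that normalisation is omitted here to keep the sketch short.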
2012-01-01
The increasing size and complexity of exome/genome sequencing data requires new tools for clinical geneticists to discover disease-causing variants. Bottlenecks in identifying the causative variation include poor cross-sample querying, constantly changing functional annotation and not considering existing knowledge concerning the phenotype. We describe a methodology that facilitates exploration of patient sequencing data towards identification of causal variants under different genetic hypotheses. Annotate-it facilitates handling, analysis and interpretation of high-throughput single nucleotide variant data. We demonstrate our strategy using three case studies. Annotate-it is freely available and test data are accessible to all users at http://www.annotate-it.org. PMID:23013645
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore
2017-01-01
Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has presented a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of a Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amounts of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides a high-throughput technique for identifying tRNA profiles and their regulation in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659
Mallott, E K; Malhi, R S; Garber, P A
2015-03-01
The genus Saguinus represents a successful radiation of over 20 species of small-bodied New World monkeys. Studies of the tamarin diet indicate that insects and small vertebrates account for ∼16-45% of total feeding and foraging time, and represent an important source of lipids, protein, and metabolizable energy. Although tamarins are reported to commonly consume large-bodied insects such as grasshoppers and walking sticks (Orthoptera), little is known concerning the degree to which smaller or less easily identifiable arthropod prey comprises an important component of their diet. To better understand tamarin arthropod feeding behavior, fecal samples from 20 wild Bolivian saddleback tamarins (members of five groups) were collected over a 3 week period in June 2012, and analyzed for the presence of arthropod DNA. DNA was extracted using a Qiagen stool extraction kit, and universal insect primers were created and used to amplify a ∼280 bp section of the COI mitochondrial gene. Amplicons were sequenced on the Roche 454 sequencing platform using high-throughput sequencing techniques. An analysis of these samples indicated the presence of 43 taxa of arthropods including 10 orders, 15 families, and 12 identified genera. Many of these taxa had not been previously identified in the tamarin diet. These results highlight molecular analysis of fecal DNA as an important research tool for identifying arthropod feeding patterns in primates, and reveal broad diversity in the taxa, foraging microhabitats, and size of arthropods consumed by tamarin monkeys. © 2014 Wiley Periodicals, Inc.
Shinozuka, Hiroshi; Forster, John W
2016-01-01
Background. Multiplexed sequencing is commonly performed on massively parallel short-read sequencing platforms such as Illumina, and the efficiency of library normalisation can affect the quality of the output dataset. Although several library normalisation approaches have been established, none are ideal for highly multiplexed sequencing due to issues of cost and/or processing time. Methods. An inexpensive and high-throughput library quantification method has been developed, based on an adaptation of the melting curve assay. Sequencing libraries were subjected to the assay using the Bio-Rad Laboratories CFX Connect™ Real-Time PCR Detection System. The library quantity was calculated through summation of reduction of relative fluorescence units between 86 and 95 °C. Results. PCR-enriched sequencing libraries are suitable for this quantification without pre-purification of DNA. Short DNA molecules, which ideally should be eliminated from the library for subsequent processing, were differentiated from the target DNA in a mixture on the basis of differences in melting temperature. Quantification results for long sequences targeted using the melting curve assay were correlated with those from existing methods (R² > 0.77), and that observed from MiSeq sequencing (R² = 0.82). Discussion. The results of multiplexed sequencing suggested that the normalisation performance of the described method is equivalent to that of another recently reported high-throughput bead-based method, BeNUS. However, costs for the melting curve assay are considerably lower and processing times shorter than those of other existing methods, suggesting greater suitability for highly multiplexed sequencing applications.
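The quantity calculation described in the Methods ("summation of reduction of relative fluorescence units between 86 and 95 °C") can be sketched as follows. This is one possible reading of that sentence, not the authors' code: the function name `melt_curve_quantity`, the input layout (parallel lists of temperatures and RFU readings in ascending temperature order) and the choice to ignore steps where fluorescence rises are all assumptions made for the example.

```python
def melt_curve_quantity(temps_c, rfu, low_c=86.0, high_c=95.0):
    """Sum the step-wise drops in RFU across a melting-temperature window.

    temps_c and rfu are parallel lists of readings taken at increasing
    temperature. A step is counted only when both of its readings lie
    inside [low_c, high_c]. Steps where fluorescence rises are treated
    as noise and skipped (an assumption of this sketch).
    """
    if len(temps_c) != len(rfu):
        raise ValueError("temps_c and rfu must have the same length")
    total = 0.0
    for k in range(len(temps_c) - 1):
        if low_c <= temps_c[k] and temps_c[k + 1] <= high_c:
            drop = rfu[k] - rfu[k + 1]
            if drop > 0:
                total += drop
    return total


if __name__ == "__main__":
    # Made-up readings: only the 86->87 and 87->88 steps fall in the window.
    temps = [85, 86, 87, 88, 96]
    signal = [100, 90, 70, 65, 10]
    print(melt_curve_quantity(temps, signal))
```

Restricting the sum to the upper temperature window is what lets short molecules, which melt at lower temperatures, be excluded from the estimate, as the Results describe.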
High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.
Wang, Wenqin; Messing, Joachim
2011-01-01
Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.
High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA
Wang, Wenqin; Messing, Joachim
2011-01-01
Background Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. Methods We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804
TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data.
Lim, Jae Hyun; Lee, Soo Youn; Kim, Ju Han
2017-03-01
High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline, and are difficult to appropriately integrate with one another due to their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.
The technology and biology of single-cell RNA sequencing.
Kolodziejczyk, Aleksandra A; Kim, Jong Kyoung; Svensson, Valentine; Marioni, John C; Teichmann, Sarah A
2015-05-21
The differences between individual cells can have profound functional consequences, in both unicellular and multicellular organisms. Recently developed single-cell mRNA-sequencing methods enable unbiased, high-throughput, and high-resolution transcriptomic analysis of individual cells. This provides an additional dimension to transcriptomic information relative to traditional methods that profile bulk populations of cells. Already, single-cell RNA-sequencing methods have revealed new biology in terms of the composition of tissues, the dynamics of transcription, and the regulatory relationships between genes. Rapid technological developments at the level of cell capture, phenotyping, molecular biology, and bioinformatics promise an exciting future with numerous biological and medical applications. Copyright © 2015 Elsevier Inc. All rights reserved.
BarraCUDA - a fast short read sequence aligner using graphics processing units
2012-01-01
Background: With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings: Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions: BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net. PMID:22244497
Ion channel drug discovery and research: the automated Nano-Patch-Clamp technology.
Brueggemann, A; George, M; Klau, M; Beckler, M; Steindl, J; Behrends, J C; Fertig, N
2004-01-01
Unlike the genomics revolution, which was largely enabled by a single technological advance (high throughput sequencing), rapid advancement in proteomics will require a broader effort to increase the throughput of a number of key tools for functional analysis of different types of proteins. In the case of ion channels, a class of (membrane) proteins of great physiological importance and potential as drug targets, the lack of adequate assay technologies is felt particularly strongly. The available indirect high throughput screening methods for ion channels clearly generate insufficient information. The best technology to study ion channel function and screen for compound interaction is the patch clamp technique, but patch clamping suffers from low throughput, which is not acceptable for drug screening. A first step towards a solution is presented here. The nano patch clamp technology, which is based on a planar, microstructured glass chip, enables automatic whole cell patch clamp measurements. The Port-a-Patch is an automated electrophysiology workstation, which uses planar patch clamp chips. This approach enables high quality and high content ion channel and compound evaluation on a one-cell-at-a-time basis. The presented automation of the patch process and its scalability to an array format are the prerequisites for any higher throughput electrophysiology instruments.
Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana
2016-07-01
The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.
TOPICAL REVIEW: Integrated genetic analysis microsystems
NASA Astrophysics Data System (ADS)
Lagally, Eric T.; Mathies, Richard A.
2004-12-01
With the completion of the Human Genome Project and the ongoing DNA sequencing of the genomes of other animals, bacteria, plants and others, a wealth of new information about the genetic composition of organisms has become available. However, as the demand for sequence information grows, so does the workload required both to generate this sequence and to use it for targeted genetic analysis. Microfabricated genetic analysis systems are well poised to assist in the collection and use of these data through increased analysis speed, lower analysis cost and higher parallelism leading to increased assay throughput. In addition, such integrated microsystems may point the way to targeted genetic experiments on single cells and in other areas that are otherwise very difficult. Concomitant with these advantages, such systems, when fully integrated, should be capable of forming portable systems for high-speed in situ analyses, enabling a new standard in disciplines such as clinical chemistry, forensics, biowarfare detection and epidemiology. This review will discuss the various technologies available for genetic analysis on the microscale, and efforts to integrate them to form fully functional robust analysis devices.
Lucero, Mary E.; Unc, Adrian; Cooke, Peter; Dowd, Scot; Sun, Shulei
2011-01-01
Microbial diversity associated with micropropagated Atriplex species was assessed using microscopy, isolate culturing, and sequencing. Light, electron, and confocal microscopy revealed microbial cells in aseptically regenerated leaves and roots. Clone libraries and tag-encoded FLX amplicon pyrosequencing (TEFAP) analysis amplified sequences from callus homologous to diverse fungal and bacterial taxa. Culturing isolated some seed borne endophyte taxa which could be readily propagated apart from the host. Microbial cells were observed within biofilm-like residues associated with plant cell surfaces and intercellular spaces. Various universal primers amplified both plant and microbial sequences, with different primers revealing different patterns of fungal diversity. Bacterial and fungal TEFAP followed by alignment with sequences from curated databases revealed 7 bacterial and 17 ascomycete taxa in A. canescens, and 5 bacterial taxa in A. torreyi. Additional diversity was observed among isolates and clone libraries. Micropropagated Atriplex retains a complex, intimately associated microbiome which includes diverse strains well poised to interact in manners that influence host physiology. Microbiome analysis was facilitated by high throughput sequencing methods, but primer biases continue to limit recovery of diverse sequences from even moderately complex communities. PMID:21437280
Gulati, Ashima; Somlo, Stefan
2018-05-01
The genesis of whole exome sequencing as a powerful tool for detailing the protein coding sequence of the human genome was conceptualized based on the availability of next-generation sequencing technology and knowledge of the human reference genome. The field of pediatric nephrology enriched with molecularly unsolved phenotypes is allowing the clinical and research application of whole exome sequencing to enable novel gene discovery and provide amendment of phenotypic misclassification. Recent studies in the field have informed us that newer high-throughput sequencing techniques are likely to be of high yield when applied in conjunction with conventional genomic approaches such as linkage analysis and other strategies used to focus subsequent analysis. They have also emphasized the need for the validation of novel genetic findings in large collaborative cohorts and the production of robust corroborative biological data. The well-structured application of comprehensive genomic testing in clinical and research arenas will hopefully continue to advance patient care and precision medicine, but does call for attention to be paid to its integrated challenges.
Pirrò, Stefano; Zanella, Letizia; Kenzo, Maurice; Montesano, Carla; Minutolo, Antonella; Potestà, Marina; Sobze, Martin Sanou; Canini, Antonella; Cirilli, Marco; Muleo, Rosario; Colizzi, Vittorio; Galgani, Andrea
2016-01-01
Moringa oleifera is a widespread plant with substantial nutritional and medicinal value. We postulated that microRNAs (miRNAs), which are endogenous, noncoding small RNAs regulating gene expression at the post-transcriptional level, might contribute to the medicinal properties of plants of this species after ingestion into the human body by regulating human gene expression. However, knowledge about miRNA in Moringa is scarce. To test the hypothesis on the potential pharmacological properties of miRNA, we conducted a high-throughput sequencing analysis using the Illumina platform. A total of 31,290,964 raw reads were produced from a library of small RNA isolated from M. oleifera seeds. We identified 94 conserved and two novel miRNAs that were validated by qRT-PCR assays. Results from qRT-PCR trials conducted on the expression of 20 Moringa miRNAs showed that they are conserved across multiple plant species, as determined by their detection in tissue of other common crop plants. In silico analyses predicted target genes for the conserved miRNAs, which in turn allowed us to relate the miRNAs to the regulation of physiological processes. Some of the predicted plant miRNAs have functional homology to their mammalian counterparts and regulated human genes when they were transfected into cell lines. To our knowledge, this is the first report of the discovery of M. oleifera miRNAs based on high-throughput sequencing and bioinformatics analysis, and it provides new insight into a potential cross-species control of human gene expression. The widespread cultivation and consumption of M. oleifera, for nutritional and medicinal purposes, brings humans into close contact with products and extracts of this plant species. The potential for miRNA transfer should be evaluated as one possible mechanism of action to account for beneficial properties of this valuable species.
Searching for resistance genes to Bursaphelenchus xylophilus using high throughput screening.
Santos, Carla S; Pinheiro, Miguel; Silva, Ana I; Egas, Conceição; Vasconcelos, Marta W
2012-11-07
Pine wilt disease (PWD), caused by the pinewood nematode (PWN; Bursaphelenchus xylophilus), damages and kills pine trees and is causing serious economic damage worldwide. Although the ecological mechanism of infestation is well described, the plant's molecular response to the pathogen is not well known. This is due mainly to the lack of genomic information and the complexity of the disease. High throughput sequencing is now an efficient approach for detecting the expression of genes in non-model organisms, thus providing valuable information in spite of the lack of the genome sequence. In an attempt to unravel genes potentially involved in the pine defense against the pathogen, we hereby report the high throughput comparative sequence analysis of infested and non-infested stems of Pinus pinaster (very susceptible to PWN) and Pinus pinea (less susceptible to PWN). Four cDNA libraries from infested and non-infested stems of P. pinaster and P. pinea were sequenced in a full 454 GS FLX run, producing a total of 2,083,698 reads. The putative amino acid sequences encoded by the assembled transcripts were annotated according to Gene Ontology, to assign Pinus contigs into Biological Processes, Cellular Components and Molecular Functions categories. Most of the annotated transcripts corresponded to Picea genes (25.4-39.7%), whereas a smaller percentage matched Pinus genes (1.8-12.8%), probably a consequence of more public genomic information being available for Picea than for Pinus. The comparative transcriptome analysis showed that when P. pinaster was infested with PWN, the genes malate dehydrogenase, ABA, water deficit stress related genes and PAR1 were highly expressed, while in PWN-infested P. pinea, the highly expressed genes were ricin B-related lectin, and genes belonging to the SNARE and high mobility group families. Quantitative PCR experiments confirmed the differential gene expression between the two pine species.
Defense-related genes triggered by nematode infestation were detected in both P. pinaster and P. pinea transcriptomes utilizing 454 pyrosequencing technology. P. pinaster showed higher abundance of genes related to transcriptional regulation, terpenoid secondary metabolism (including some with nematicidal activity) and pathogen attack. P. pinea showed higher abundance of genes related to oxidative stress and higher levels of expression in general of stress responsive genes. This study provides essential information about the molecular defense mechanisms utilized by P. pinaster and P. pinea against PWN infestation and contributes to a better understanding of PWD.
Mendoza, Lucía M; Neef, Alexander; Vignolo, Graciela; Belloch, Carmela
2017-10-01
The diversity and dynamics of yeasts associated with the fermentation of the Argentinian maize-based beverage chicha were investigated. Samples taken at different stages from two chicha productions were analyzed by culture-dependent and culture-independent methods. Five hundred and ninety-six yeasts were isolated by classical microbiological methods and 16 species identified by RFLPs and sequencing of the D1/D2 26S rRNA gene. Genetic typing of isolates from the dominant species, Saccharomyces cerevisiae, by PCR of delta elements revealed up to 42 different patterns. High-throughput sequencing (HTS) of D1/D2 26S rRNA gene amplicons from chicha samples detected more than one hundred yeast species and almost fifty filamentous fungi taxa. Analysis of the data revealed that yeasts dominated the fermentation, although a significant percentage of filamentous fungi appeared in the first step of the process. Statistical analysis of results showed that very few taxa were represented by more than 1% of the reads per sample at any step of the process. S. cerevisiae represented more than 90% of the reads in the fermentative samples. Other yeast species dominated the pre-fermentative steps and abounded in fermented samples when S. cerevisiae was in percentages below 90%. Most yeast species detected by pyrosequencing were not recovered by cultivation. In contrast, the cultivation-based methodology detected very few yeast taxa, and most of them corresponded with very few reads in the pyrosequencing analysis. Copyright © 2017 Elsevier Ltd. All rights reserved.
Genometa--a fast and accurate classifier for short metagenomic shotgun reads.
Davenport, Colin F; Neugebauer, Jens; Beckmann, Nils; Friedrich, Benedikt; Kameri, Burim; Kokott, Svea; Paetow, Malte; Siekmann, Björn; Wieding-Drewes, Matthias; Wienhöfer, Markus; Wolf, Stefan; Tümmler, Burkhard; Ahlers, Volker; Sprengel, Frauke
2012-01-01
Metagenomic studies use high-throughput sequence data to investigate microbial communities in situ. However, considerable challenges remain in the analysis of these data, particularly with regard to speed and reliable analysis of microbial species as opposed to higher level taxa such as phyla. We here present Genometa, a computationally undemanding graphical user interface program that enables identification of bacterial species and gene content from datasets generated by inexpensive high-throughput short read sequencing technologies. Our approach was first verified on two simulated metagenomic short read datasets, detecting 100% and 94% of the bacterial species included with few false positives or false negatives. Subsequent comparative benchmarking analysis against three popular metagenomic algorithms on an Illumina human gut dataset revealed Genometa to attribute the most reads to bacteria at species level (i.e. including all strains of that species) and demonstrate similar or better accuracy than the other programs. Lastly, speed was demonstrated to be many times that of BLAST due to the use of modern short read aligners. Our method is highly accurate if bacteria in the sample are represented by genomes in the reference sequence but cannot find species absent from the reference. This method is one of the most user-friendly and resource efficient approaches and is thus feasible for rapidly analysing millions of short reads on a personal computer. The Genometa program, a step by step tutorial and Java source code are freely available from http://genomics1.mh-hannover.de/genometa/ and on http://code.google.com/p/genometa/. This program has been tested on Ubuntu Linux and Windows XP/7.
REDItools: high-throughput RNA editing detection made easy.
Picardi, Ernesto; Pesole, Graziano
2013-07-15
The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools, a suite of Python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data. REDItools are written in the Python programming language and are freely available at http://code.google.com/p/reditools/. Contact: ernesto.picardi@uniba.it or graziano.pesole@uniba.it. Supplementary data are available at Bioinformatics online.
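The abstract does not describe how REDItools calls editing sites, so the following is only a generic illustration of the underlying idea: an RNA editing candidate is a position where the RNA-seq reads disagree with the reference base often enough to be unlikely to be noise. The function name, the pileup tuple layout and both thresholds are assumptions made for this sketch and are not taken from REDItools.

```python
def call_editing_candidates(pileup, min_coverage=10, min_freq=0.1):
    """Flag positions whose RNA reads differ from the reference base.

    pileup is an iterable of (chrom, pos, ref_base, {base: read_count})
    tuples; this layout is hypothetical, chosen only for the example.
    """
    candidates = []
    for chrom, pos, ref, counts in pileup:
        coverage = sum(counts.values())
        if coverage < min_coverage:
            continue  # too few reads to trust any frequency estimate
        for alt, n in counts.items():
            if alt == ref or n == 0:
                continue
            freq = n / coverage
            if freq >= min_freq:
                # Record the substitution type, e.g. "A>G" for A-to-I editing.
                candidates.append((chrom, pos, f"{ref}>{alt}", round(freq, 3)))
    return candidates


if __name__ == "__main__":
    # Made-up counts for illustration; not data from any study.
    toy = [
        ("chr1", 100, "A", {"A": 15, "G": 5}),
        ("chr1", 200, "A", {"A": 4, "G": 4}),
        ("chr1", 300, "C", {"C": 30, "T": 1}),
    ]
    print(call_editing_candidates(toy))
```

A real pipeline would additionally need to separate editing from genomic SNPs and from sequencing or mapping errors, which this sketch does not attempt.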
High-Throughput Sequencing: A Roadmap Toward Community Ecology
Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique
2013-01-01
High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines. PMID:23610649
High-throughput sequencing in veterinary infection biology and diagnostics.
Belák, S; Karlsson, O E; Leijon, M; Granberg, F
2013-12-01
Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine.
Zhou, Rong; Wang, Qian; Jiang, Fangling; Cao, Xue; Sun, Mintao; Liu, Min; Wu, Zhen
2016-01-01
MicroRNAs (miRNAs) are 19–24 nucleotide (nt) noncoding RNAs that play important roles in abiotic stress responses in plants. High temperatures have been the subject of considerable attention due to their negative effects on plant growth and development. Heat-responsive miRNAs have been identified in some plants. However, there have been no reports on the global identification of miRNAs and their targets in tomato at high temperatures, especially at different elevated temperatures. Here, three small-RNA libraries and three degradome libraries were constructed from the leaves of the heat-tolerant tomato at normal, moderately and acutely elevated temperatures (26/18 °C, 33/33 °C and 40/40 °C, respectively). Following high-throughput sequencing, 662 conserved and 97 novel miRNAs were identified in total with 469 conserved and 91 novel miRNAs shared in the three small-RNA libraries. Of these miRNAs, 96 and 150 miRNAs were responsive to the moderately and acutely elevated temperature, respectively. Following degradome sequencing, 349 sequences were identified as targets of 138 conserved miRNAs, and 13 sequences were identified as targets of eight novel miRNAs. The expression levels of seven miRNAs and six target genes obtained by quantitative real-time PCR (qRT-PCR) were largely consistent with the sequencing results. This study enriches the number of heat-responsive miRNAs and lays a foundation for the elucidation of the miRNA-mediated regulatory mechanism in tomatoes at elevated temperatures. PMID:27653374
NASA Astrophysics Data System (ADS)
Nicolardi, Simone; Giera, Martin; Kooijman, Pieter; Kraj, Agnieszka; Chervet, Jean-Pierre; Deelder, André M.; van der Burgt, Yuri E. M.
2013-12-01
Particularly in the field of middle- and top-down peptide and protein analysis, disulfide bridges can severely hinder fragmentation and thus impede sequence analysis (coverage). Here we present an on-line/electrochemistry/ESI-FTICR-MS approach, which was applied to the analysis of the primary structure of oxytocin, containing one disulfide bridge, and of hepcidin, containing four disulfide bridges. The presented workflow provided up to 80 % (on-line) conversion of disulfide bonds in both peptides. With minimal sample preparation, such reduction resulted in a higher number of peptide backbone cleavages upon CID or ETD fragmentation, and thus yielded improved sequence coverage. The cycle times, including electrode recovery, were rapid and, therefore, might very well be coupled with liquid chromatography for protein or peptide separation, which has great potential for high-throughput analysis.
Evaluation of sequencing approaches for high-throughput toxicogenomics (SOT)
Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platfo...
High-throughput sequencing reveals unprecedented diversities of Aspergillus species in outdoor air.
Lee, S; An, C; Xu, S; Lee, S; Yamamoto, N
2016-09-01
This study used the Illumina MiSeq to analyse compositions and diversities of Aspergillus species in outdoor air. The seasonal air samplings were performed at two locations in Seoul, South Korea. The results showed the relative abundances of all Aspergillus species combined ranging from 0·20 to 18% and from 0·19 to 21% based on the number of the internal transcribed spacer 1 (ITS1) and β-tubulin (BenA) gene sequences respectively. Aspergillus fumigatus was the most dominant species with the mean relative abundances of 1·2 and 5·5% based on the number of the ITS1 and BenA sequences respectively. A total of 29 Aspergillus species were detected and identified down to the species rank, among which nine species were known opportunistic pathogens. Remarkably, eight of the nine pathogenic species were detected by either one of the two markers, suggesting the need to use multiple markers and/or primer pairs when the assessments are made based on high-throughput sequencing. Due to the diversity of species within the genus Aspergillus, high-throughput sequencing was useful to characterize their compositions and diversities in outdoor air, which are thought to be difficult to characterize accurately by conventional culture and/or Sanger sequencing-based techniques. Aspergillus is a diverse genus of fungi with more than 300 species reported in the literature. Aspergillus is important since some species are known allergens and opportunistic human pathogens. Traditionally, growth-dependent methods have been used to detect Aspergillus species in air. However, these methods are limited in the number of isolates that can be analysed for their identities, resulting in inaccurate characterizations of Aspergillus diversities. This study used high-throughput sequencing to explore Aspergillus diversities in outdoor air, which are thought to be difficult to characterize accurately by traditional growth-dependent techniques. © 2016 The Society for Applied Microbiology.
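The relative abundances quoted above are shares of marker-gene reads assigned to each taxon. As a minimal sketch of that calculation, assuming reads have already been assigned to taxa, the hypothetical helper below converts per-taxon read counts into percentages; the counts in the usage example are made up for illustration and are not the study's data.

```python
def relative_abundance(read_counts):
    """Convert per-taxon read counts into percentages of all reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to normalise")
    return {taxon: 100.0 * n / total for taxon, n in read_counts.items()}


if __name__ == "__main__":
    # Illustrative counts only (hypothetical sample of 1000 ITS1 reads).
    counts = {
        "Aspergillus fumigatus": 12,
        "other Aspergillus": 8,
        "other fungi": 980,
    }
    for taxon, pct in relative_abundance(counts).items():
        print(f"{taxon}: {pct:.1f}%")
```

Because the percentages depend on which reads each marker amplifies, the same sample can yield different values for ITS1 and BenA, which is consistent with the marker-dependent results reported in the abstract.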
Grunert, Steffen; Labudde, Dirk
2015-01-01
The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540
MinION Analysis and Reference Consortium: Phase 1 data release and analysis
Eccles, David A.; Zalunin, Vadim; Urban, John M.; Piazza, Paolo; Bowden, Rory J.; Paten, Benedict; Mwaigwisya, Solomon; Batty, Elizabeth M.; Simpson, Jared T.; Snutch, Terrance P.
2015-01-01
The advent of a miniaturized DNA sequencing device with a high-throughput contextual sequencing capability embodies the next generation of large scale sequencing tools. The MinION™ Access Programme (MAP) was initiated by Oxford Nanopore Technologies™ in April 2014, giving public access to their USB-attached miniature sequencing device. The MinION Analysis and Reference Consortium (MARC) was formed by a subset of MAP participants, with the aim of evaluating and providing standard protocols and reference data to the community. Envisaged as a multi-phased project, this study provides the global community with the Phase 1 data from MARC, where the reproducibility of the performance of the MinION was evaluated at multiple sites. Five laboratories on two continents generated data using a control strain of Escherichia coli K-12, preparing and sequencing samples according to a revised ONT protocol. Here, we provide the details of the protocol used, along with a preliminary analysis of the characteristics of typical runs including the consistency, rate, volume and quality of data produced. Further analysis of the Phase 1 data presented here, and additional experiments in Phase 2 of E. coli from MARC are already underway to identify ways to improve and enhance MinION performance. PMID:26834992
Azab, Marwa Mohamed; Fayyad, Dalia Mukhtar
2018-01-01
The use of high throughput next generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections using Illumina MiSeq sequencing platform in Egyptian patients. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using sterile # 15K file and paper points. DNA was extracted using Mo Bio power soil DNA isolation extraction kit followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable region of the 16S rRNA gene by using paired-end sequencing on Illumina MiSeq device. MOTHUR software was used in sequence filtration and analysis of sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes were predominant in all samples. At genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials. PMID:29849646
Development of Low-cost, High Energy-per-unit-area Solar Cell Modules
NASA Technical Reports Server (NTRS)
Jones, G. T.; Chitre, S.; Rhee, S. S.
1978-01-01
The development of two hexagonal solar cell process sequences, a laserscribing process technique for scribing hexagonal and modified hexagonal solar cells, a large through-put diffusion process, and two surface macrostructure processes suitable for large scale production is reported. Experimental analysis was made on automated spin-on anti-reflective coating equipment and high pressure wafer cleaning equipment. Six hexagonal solar cell modules were fabricated. Also covered is a detailed theoretical analysis on the optimum silicon utilization by modified hexagonal solar cells.
Analysis and Testing of Mobile Wireless Networks
NASA Technical Reports Server (NTRS)
Alena, Richard; Evenson, Darin; Rundquist, Victor; Clancy, Daniel (Technical Monitor)
2002-01-01
Wireless networks are being used to connect mobile computing elements in more applications as the technology matures. There are now many products (such as 802.11 and 802.11b) which run in the ISM frequency band and comply with wireless network standards. They are being used increasingly to link mobile Intranet into wired networks. Standard methods of analyzing and testing their performance and compatibility are needed to determine the limits of the technology. This paper presents analytical and experimental methods of determining network throughput, range and coverage, and interference sources. Both radio frequency (RF) domain and network domain analysis have been applied to determine wireless network throughput and range in the outdoor environment. Comparison of field test data taken under optimal conditions with performance predicted from RF analysis yielded quantitative results applicable to future designs. Layering multiple wireless network subnets can increase performance. Wireless network components can be set to different radio frequency-hopping sequences or spreading functions, allowing more than one subnet to coexist. Therefore, we ran multiple 802.11-compliant systems concurrently in the same geographical area to determine interference effects and scalability. The results can be used to design more robust networks which have multiple layers of wireless data communication paths and provide increased throughput overall.
Słomka, Marcin; Sobalska-Kwapis, Marta; Wachulec, Monika; Bartosz, Grzegorz; Strapagiel, Dominik
2017-11-03
High resolution melting (HRM) is a convenient method for gene scanning as well as genotyping of individual and multiple single nucleotide polymorphisms (SNPs). This rapid, simple, closed-tube, homogenous, and cost-efficient approach has the capacity for high specificity and sensitivity, while allowing easy transition to high-throughput scale. In this paper, we provide examples from our laboratory practice of some problematic issues which can affect the performance and data analysis of HRM results, especially with regard to reference curve-based targeted genotyping. We present those examples in order of the typical experimental workflow, and discuss the crucial significance of the respective experimental errors and limitations for the quality and analysis of results. The experimental details which have a decisive impact on correct execution of a HRM genotyping experiment include type and quality of DNA source material, reproducibility of isolation method and template DNA preparation, primer and amplicon design, automation-derived preparation and pipetting inconsistencies, as well as physical limitations in melting curve distinction for alternative variants and careful selection of samples for validation by sequencing. We provide a case-by-case analysis and discussion of actual problems we encountered and solutions that should be taken into account by researchers newly attempting HRM genotyping, especially in a high-throughput setup. PMID:29099791
High-throughput gene mapping in Caenorhabditis elegans.
Swan, Kathryn A; Curtis, Damian E; McKusick, Kathleen B; Voinov, Alexander V; Mapa, Felipa A; Cancilla, Michael R
2002-07-01
Positional cloning of mutations in model genetic systems is a powerful method for the identification of targets of medical and agricultural importance. To facilitate the high-throughput mapping of mutations in Caenorhabditis elegans, we have identified a further 9602 putative new single nucleotide polymorphisms (SNPs) between two C. elegans strains, Bristol N2 and the Hawaiian mapping strain CB4856, by sequencing inserts from a CB4856 genomic DNA library and using an informatics pipeline to compare sequences with the canonical N2 genomic sequence. When combined with data from other laboratories, our marker set of 17,189 SNPs provides even coverage of the complete worm genome. To date, we have confirmed >1099 evenly spaced SNPs (one every 91 +/- 56 kb) across the six chromosomes and validated the utility of our SNP marker set and new fluorescence polarization-based genotyping methods for systematic and high-throughput identification of genes in C. elegans by cloning several proprietary genes. We illustrate our approach by recombination mapping and confirmation of the mutation in the cloned gene, dpy-18.
Bokulich, Nicholas A.
2013-01-01
Ultra-high-throughput sequencing (HTS) of fungal communities has been restricted by short read lengths and primer amplification bias, slowing the adoption of newer sequencing technologies to fungal community profiling. To address these issues, we evaluated the performance of several common internal transcribed spacer (ITS) primers and designed a novel primer set and work flow for simultaneous quantification and species-level interrogation of fungal consortia. Primer comparison and validation were predicted in silico and by sequencing a “mock community” of mixed yeast species to explore the challenges of amplicon length and amplification bias for reconstructing defined yeast community structures. The amplicon size and distribution of this primer set are smaller than for all preexisting ITS primer sets, maximizing sequencing coverage of hypervariable ITS domains by very-short-amplicon, high-throughput sequencing platforms. This feature also enables the optional integration of quantitative PCR (qPCR) directly into the HTS preparatory work flow by substituting qPCR with these primers for standard PCR, yielding quantification of individual community members. The complete work flow described here, utilizing any of the qualified primer sets evaluated, can rapidly profile mixed fungal communities and capably reconstructed well-characterized beer and wine fermentation fungal communities. PMID:23377949
FMLRC: Hybrid long read error correction using an FM-index.
Wang, Jeremy R; Holt, James; McMillan, Leonard; Jones, Corbin D
2018-02-09
Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limit their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging "hybrid" assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency relative to existing methods will help make more economical use of emerging long read sequencing technologies.
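To make the role of the FM-index more concrete, the sketch below builds a Burrows-Wheeler Transform for a single short string and counts exact occurrences of a k-mer by backward search. It is a toy illustration, not the FMLRC implementation: it indexes one string rather than a multi-string collection of short reads, uses a naive rotation sort, and all function names are hypothetical.

```python
from collections import Counter


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (only for short strings)."""
    text += "$"  # sentinel; assumed absent from text and smaller than any base
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_fm_index(bwt_str: str):
    """Return (C, occ).

    C[c]      = number of symbols in bwt_str that sort before c
    occ[c][i] = number of occurrences of c in bwt_str[:i]
    """
    counts = Counter(bwt_str)
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt_str) + 1) for ch in counts}
    for i, ch in enumerate(bwt_str):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (sym == ch)
    return C, occ


def count_occurrences(pattern: str, C, occ, n: int) -> int:
    """Count exact matches of pattern using backward search over [lo, hi)."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("ACGTACGT")
    C, occ = build_fm_index(transformed)
    print(count_occurrences("ACG", C, occ, len(transformed)))
```

In a hybrid correction setting, k-mer counts of this kind obtained from the short reads could indicate which long-read k-mers are well supported, although the actual correction procedure is described only at a high level in the abstract.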
Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul
2016-01-01
Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988
Dissecting enzyme function with microfluidic-based deep mutational scanning.
Romero, Philip A; Tran, Tuan M; Abate, Adam R
2015-06-09
Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.
Mapping DNA polymerase errors by single-molecule sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, David F.; Lu, Jenny; Chang, Seungwoo
2016-05-16
Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replication product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.
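The barcoding idea can be sketched as follows: reads that share a barcode derive from the same replication product, so a per-position majority vote suppresses sequencing errors, and any remaining difference from the template is a candidate polymerase error. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes reads are already aligned and of equal length within a barcode group, and the function names and the `min_reads` threshold are hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_reads=3):
    """Build one consensus sequence per barcode by per-position majority vote.

    reads: iterable of (barcode, sequence) pairs; sequences sharing a barcode
    are assumed to be aligned and of equal length.
    Barcodes with fewer than min_reads reads are skipped (assumed threshold).
    """
    groups = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)

    consensus = {}
    for barcode, seqs in groups.items():
        if len(seqs) < min_reads:
            continue
        consensus[barcode] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


def mismatches(consensus_seq, template):
    """List (position, template_base, observed_base) where consensus differs."""
    return [
        (i, t, c)
        for i, (t, c) in enumerate(zip(template, consensus_seq))
        if t != c
    ]


if __name__ == "__main__":
    reads = [
        ("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"),  # one sequencing error
        ("GGC", "ACGT"),                                     # too few reads
    ]
    print(consensus_by_barcode(reads))
    print(mismatches("ACTT", "ACGT"))
```

A difference seen in only one read of a barcode group is voted out as a sequencing error, whereas a difference shared by the whole group survives into the consensus and is reported against the template.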
High-Throughput Next-Generation Sequencing of Polioviruses
Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.
2016-01-01
The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim
2007-01-01
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G; Rigoutsos, Isidore; Kirino, Yohei
2017-05-19
Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
USDA-ARS's Scientific Manuscript database
Next-generation sequencing technologies are able to produce high-throughput short sequence reads in a cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. Here I survey their major applications ranging...
Recent Applications of DNA Sequencing Technologies in Food, Nutrition and Agriculture
USDA-ARS's Scientific Manuscript database
Next-generation DNA sequencing technologies are able to produce millions of short sequence reads in a high-throughput, cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. This review surveys their rec...
Next generation sequencers: methods and applications in food-borne pathogens
USDA-ARS's Scientific Manuscript database
Next generation sequencers are able to produce millions of short sequence reads in a high-throughput, low-cost way. The emergence of these technologies has not only facilitated genome sequencing but also started to change the landscape of life sciences. This chapter will survey their methods and app...
Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan
2016-01-01
Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.
Speiser, Daniel I; Pankey, M Sabrina; Zaharoff, Alexander K; Battelle, Barbara A; Bracken-Grissom, Heather D; Breinholt, Jesse W; Bybee, Seth M; Cronin, Thomas W; Garm, Anders; Lindgren, Annie R; Patel, Nipam H; Porter, Megan L; Protas, Meredith E; Rivera, Ajna S; Serb, Jeanne M; Zigler, Kirk S; Crandall, Keith A; Oakley, Todd H
2014-11-19
Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to produce a computationally efficient, tree-based approach for annotating transcriptomes or new genomes that we term Phylogenetically-Informed Annotation (PIA), which places uncharacterized genes into pre-calculated phylogenies of gene families. We generated maximum likelihood trees for 109 genes from a Light Interaction Toolkit (LIT), a collection of genes that underlie the function or development of light-interacting structures in metazoans. To do so, we searched protein sequences predicted from 29 fully-sequenced genomes and built trees using tools for phylogenetic analysis in the Osiris package of Galaxy (an open-source workflow management system). Next, to rapidly annotate transcriptomes from organisms that lack sequenced genomes, we repurposed a maximum likelihood-based Evolutionary Placement Algorithm (implemented in RAxML) to place sequences of potential LIT genes on to our pre-calculated gene trees. Finally, we implemented PIA in Galaxy and used it to search for LIT genes in 28 newly-sequenced transcriptomes from the light-interacting tissues of a range of cephalopod mollusks, arthropods, and cubozoan cnidarians. Our new trees for LIT genes are available on the Bitbucket public repository ( http://bitbucket.org/osiris_phylogenetics/pia/ ) and we demonstrate PIA on a publicly-accessible web server ( http://galaxy-dev.cnsi.ucsb.edu/pia/ ). 
Our new trees for LIT genes will be a valuable resource for researchers studying the evolution of eyes or other light-interacting structures. We also introduce PIA, a high throughput method for using phylogenetic relationships to identify LIT genes in transcriptomes from non-model organisms. With simple modifications, our methods may be used to search for different sets of genes or to annotate data sets from taxa outside of Metazoa.
Uniform, optimal signal processing of mapped deep-sequencing data.
Kumar, Vibhor; Muratani, Masafumi; Rayan, Nirmala Arul; Kraus, Petra; Lufkin, Thomas; Ng, Huck Hui; Prabhakar, Shyam
2013-07-01
Despite their apparent diversity, many problems in the analysis of high-throughput sequencing data are merely special cases of two general problems, signal detection and signal estimation. Here we adapt formally optimal solutions from signal processing theory to analyze signals of DNA sequence reads mapped to a genome. We describe DFilter, a detection algorithm that identifies regulatory features in ChIP-seq, DNase-seq and FAIRE-seq data more accurately than assay-specific algorithms. We also describe EFilter, an estimation algorithm that accurately predicts mRNA levels from as few as 1-2 histone profiles (R ∼0.9). Notably, the presence of regulatory motifs in promoters correlates more with histone modifications than with mRNA levels, suggesting that histone profiles are more predictive of cis-regulatory mechanisms. We show by applying DFilter and EFilter to embryonic forebrain ChIP-seq data that regulatory protein identification and functional annotation are feasible despite tissue heterogeneity. The mathematical formalism underlying our tools facilitates integrative analysis of data from virtually any sequencing-based functional profile.
Fan, Xiaoguang; Wu, Heyun; Li, Guoliang; Yuan, Hui; Zhang, Hongchao; Li, Yanjun; Xie, Xixian; Chen, Ning
2017-01-01
In the present study, a novel breeding strategy of atmospheric and room temperature plasma (ARTP) mutagenesis was used to improve the uridine production of engineered Bacillus subtilis TD12np. A high-throughput screening method was established using both resistant plates and 96-well microplates to select the ideal mutants with diverse phenotypes. Mutant F126 accumulated 5.7 and 30.3 g/L uridine after 30 h in shake-flask and 48 h in fed-batch fermentation, respectively, which represented a 4.4- and 8.7-fold increase over the parent strain. Sequence analysis of the pyrimidine nucleotide biosynthetic operon in the representative mutants showed that proline 1016 and glutamate 949 in the large subunit of B. subtilis carbamoyl phosphate synthetase were of importance for the allosteric regulation caused by uridine 5′-monophosphate. The proposed mutation method with efficient high-throughput screening assay was proved to be an appropriate strategy to obtain uridine-overproducing strain. PMID:28472077
[The future of forensic DNA analysis for criminal justice].
Laurent, François-Xavier; Vibrac, Geoffrey; Rubio, Aurélien; Thévenot, Marie-Thérèse; Pène, Laurent
2017-11-01
In the criminal framework, the analysis of approximately 20 DNA microsatellites enables the establishment of a genetic profile with a high statistical power of discrimination. This technique makes it possible to establish or exclude a match between a biological trace detected at a crime scene and a suspect whose DNA was collected via an oral swab. However, conventional techniques tend to complicate the interpretation of complex DNA samples, such as degraded DNA and DNA mixtures. The aim of this review is to highlight the power of new forensic DNA methods (including high-throughput sequencing or single-cell sequencing) to facilitate the expert's interpretation in full compliance with existing French legislation. © 2017 médecine/sciences – Inserm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Yonggang; Hui, Cong; Liu, Chong; Xu, Chao
2016-04-01
The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.
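The scaling claim can be checked with simple arithmetic. The sketch below assumes that the 64 circuit units contribute equally to the 7.68 Gbps total, which the abstract does not state explicitly; the function names and the 10 Gbps target in the usage example are hypothetical.

```python
import math


def per_unit_rate_bps(total_bps: float, units: int) -> float:
    """Average output rate of one generation channel, assuming equal shares."""
    return total_bps / units


def units_needed(target_bps: float, unit_bps: float) -> int:
    """Number of parallel channels required to reach a target throughput."""
    return math.ceil(target_bps / unit_bps)


if __name__ == "__main__":
    unit_rate = per_unit_rate_bps(7.68e9, 64)
    print(f"Per-unit rate: {unit_rate / 1e6:.0f} Mbps")
    print(f"Units for a 10 Gbps target: {units_needed(10e9, unit_rate)}")
```

Under the equal-share assumption each unit delivers 120 Mbps, so throughput scales linearly with the number of channels until FPGA resources run out.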
Implementation of Cloud based next generation sequencing data analysis in a clinical laboratory.
Onsongo, Getiria; Erdmann, Jesse; Spears, Michael D; Chilton, John; Beckman, Kenneth B; Hauge, Adam; Yohe, Sophia; Schomaker, Matthew; Bower, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat
2014-05-23
The introduction of next generation sequencing (NGS) has revolutionized molecular diagnostics, though several challenges remain that limit the widespread adoption of NGS testing into clinical practice. One such difficulty is the development of a robust bioinformatics pipeline that can handle the volume of data generated by high-throughput sequencing in a cost-effective manner. Analysis of sequencing data typically requires a substantial level of computing power that is often cost-prohibitive to most clinical diagnostics laboratories. To address this challenge, our institution has developed a Galaxy-based data analysis pipeline which relies on a web-based, cloud-computing infrastructure to process NGS data and identify genetic variants. It provides the additional flexibility needed to control storage costs, resulting in a pipeline that is cost-effective on a per-sample basis. It does not require the usage of EBS disk to run a sample. We demonstrate the validation and feasibility of implementing this bioinformatics pipeline in a molecular diagnostics laboratory. Four samples were analyzed in duplicate pairs and showed 100% concordance in mutations identified. This pipeline is currently being used in the clinic, and all identified pathogenic variants were confirmed using Sanger sequencing, further validating the software.
Sridharan, Vinod; Heimiller, Joseph; Robida, Mark D; Singh, Ravinder
2016-01-01
The Drosophila polypyrimidine tract-binding protein (dmPTB or hephaestus) plays an important role during spermatogenesis. The heph2 mutation in this gene results in a specific defect in spermatogenesis, causing aberrant spermatid individualization and male sterility. However, the array of molecular defects in the mutant remains uncharacterized. Using an unbiased high throughput sequencing approach, we have identified transcripts that are misregulated in this mutant. Aberrant transcripts show altered expression levels, exon skipping, and alternative 5' ends. We independently verified these findings by reverse-transcription and polymerase chain reaction (RT-PCR) analysis. Our analysis shows misregulation of transcripts that have been connected to spermatogenesis, including components of the actomyosin cytoskeletal apparatus. We show, for example, that the Myosin light chain 1 (Mlc1) transcript is aberrantly spliced. Furthermore, bioinformatics analysis reveals that Mlc1 contains a high affinity binding site(s) for dmPTB and that the site is conserved in many Drosophila species. We discuss that Mlc1 and other components of the actomyosin cytoskeletal apparatus offer important molecular links between the loss of dmPTB function and the observed developmental defect in spermatogenesis. This study provides the first comprehensive list of genes misregulated in vivo in the heph2 mutant in Drosophila and offers insight into the role of dmPTB during spermatogenesis.
Systems biology of cancer biomarker detection.
Mitra, Sanga; Das, Smarajit; Chakrabarti, Jayprokas
2013-01-01
Cancer systems-biology is an ever-growing area of research due to explosion of data; how to mine these data and extract useful information is the problem. To gain insight into carcinogenesis one needs to systematically mine several resources, such as databases, microarray and next-generation sequences. This review encompasses management and analysis of cancer data, database construction and data deposition, whole transcriptome and genome comparison, analysing results from high throughput experiments to uncover cellular pathways and molecular interactions, and the design of effective algorithms to identify potential biomarkers. Recent technical advances such as ChIP-on-chip, ChIP-seq and RNA-seq can be applied to get epigenetic information transformed into a high-throughput endeavour to which systems biology and bioinformatics are making significant inroads. The data from ENCODE and GENCODE projects available through UCSC genome browser can be considered as benchmark for comparison and meta-analysis. A pipeline for integrating next generation sequencing data, microarray data, and putting them together with the existing database is discussed. The understanding of cancer genomics is changing the way we approach cancer diagnosis and treatment. To give a better understanding of how to utilize available resources, we have chosen oral cancer to show how and what kind of analysis can be done. This review is a computational genomic primer that provides a bird's eye view of computational and bioinformatics tools currently available to perform integrated genomic and systems biology analyses of several carcinomas.
Zhao, Zhehao; Yu, Siran; Li, Min; Gui, Xin; Li, Ping
2018-03-21
In this study, the presence of microRNAs in coconut water was identified by real-time polymerase chain reaction (PCR) based on the results of high-throughput small RNA sequencing. In addition, the differences in microRNA content between immature and mature coconut water were compared. A total of 47 known microRNAs belonging to 25 families and 14 new microRNAs were identified in coconut endosperm. Through analysis using a target gene prediction software, potential microRNA target genes were identified in the human genome. Real-time PCR showed that the level of most microRNAs was higher in mature coconut water than in immature coconut water. Then, exosome-like nanoparticles were isolated from coconut water. After ultracentrifugation, some particle structures were seen in coconut water samples using 1,1'-dioctadecyl-3,3,3',3'-tetramethylindocarbocyanine perchlorate fluorescence staining. Subsequent scanning electron microscopy observation and dynamic light scattering analysis also revealed some exosome-like nanoparticles in coconut water, and the mean diameters of the particles detected by the two methods were 13.16 and 59.72 nm, respectively. In conclusion, there are extracellular microRNAs in coconut water, and their levels are higher in mature coconut water than in immature coconut water. Some exosome-like nanoparticles were isolated from coconut water, and the diameter of these particles was smaller than that of animal-derived exosomes.
The challenges of sequencing by synthesis.
Fuller, Carl W; Middendorf, Lyle R; Benner, Steven A; Church, George M; Harris, Timothy; Huang, Xiaohua; Jovanovich, Stevan B; Nelson, John R; Schloss, Jeffery A; Schwartz, David C; Vezenov, Dmitri V
2009-11-01
DNA sequencing-by-synthesis (SBS) technology, using a polymerase or ligase enzyme as its core biochemistry, has already been incorporated in several second-generation DNA sequencing systems with significant performance. Notwithstanding the substantial success of these SBS platforms, challenges continue to limit the ability to reduce the cost of sequencing a human genome to $100,000 or less. Achieving dramatically reduced cost with enhanced throughput and quality will require the seamless integration of scientific and technological effort across disciplines within biochemistry, chemistry, physics and engineering. The challenges include sample preparation, surface chemistry, fluorescent labels, optimizing the enzyme-substrate system, optics, instrumentation, understanding tradeoffs of throughput versus accuracy, and read-length/phasing limitations. By framing these challenges in a manner accessible to a broad community of scientists and engineers, we hope to solicit input from the broader research community on means of accelerating the advancement of genome sequencing technology.
Deep sequencing in library selection projects: what insight does it bring?
Glanville, J; D'Angelo, S; Khan, T A; Reddy, S T; Naranjo, L; Ferrara, F; Bradbury, A R M
2015-08-01
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology.
Deep sequencing in library selection projects: what insight does it bring?
Glanville, J; D’Angelo, S; Khan, T.A.; Reddy, S. T.; Naranjo, L.; Ferrara, F.; Bradbury, A.R.M.
2015-01-01
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. PMID:26451649
Measuring Sister Chromatid Cohesion Protein Genome Occupancy in Drosophila melanogaster by ChIP-seq.
Dorsett, Dale; Misulovin, Ziva
2017-01-01
This chapter presents methods to conduct and analyze genome-wide chromatin immunoprecipitation of the cohesin complex and the Nipped-B cohesin loading factor in Drosophila cells using high-throughput DNA sequencing (ChIP-seq). Procedures for isolation of chromatin, immunoprecipitation, and construction of sequencing libraries for the Ion Torrent Proton high throughput sequencer are detailed, and computational methods to calculate occupancy as input-normalized fold-enrichment are described. The results obtained by ChIP-seq are compared to those obtained by ChIP-chip (genomic ChIP using tiling microarrays), and the effects of sequencing depth on the accuracy are analyzed. ChIP-seq provides similar sensitivity and reproducibility as ChIP-chip, and identifies the same broad regions of occupancy. The locations of enrichment peaks, however, can differ between ChIP-chip and ChIP-seq, and low sequencing depth can splinter broad regions of occupancy into distinct peaks.
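Input-normalized fold-enrichment, as mentioned above, can be sketched as follows. This is a generic illustration rather than the chapter's actual procedure: the fixed-bin layout, the library-size scaling, and the `pseudocount` parameter are all assumptions.

```python
def fold_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Per-bin ChIP/input ratio after scaling the ChIP library to the input depth.

    ip_counts and input_counts are read counts over the same genomic bins.
    The pseudocount (an assumption) avoids division by zero in empty bins.
    """
    if len(ip_counts) != len(input_counts):
        raise ValueError("ChIP and input must cover the same bins")
    ip_total = sum(ip_counts)
    input_total = sum(input_counts)
    if ip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")
    scale = input_total / ip_total  # normalize for sequencing depth
    return [
        (ip * scale + pseudocount) / (inp + pseudocount)
        for ip, inp in zip(ip_counts, input_counts)
    ]


# Hypothetical read counts over four bins
chip = [30, 10, 5, 55]
control = [20, 20, 20, 40]
print([round(x, 2) for x in fold_enrichment(chip, control)])
```

Scaling by total read count is the simplest depth correction; with low sequencing depth the per-bin ratios become noisy, which is consistent with the abstract's remark that shallow sequencing can splinter broad regions into separate peaks.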
NASA Astrophysics Data System (ADS)
Moreland, Blythe; Oman, Kenji; Curfman, John; Yan, Pearlly; Bundschuh, Ralf
Methyl-binding domain (MBD) protein pulldown experiments have been a valuable tool in measuring the levels of methylated CpG dinucleotides. Due to the frequent use of this technique, high-throughput sequencing data sets are available that allow a detailed quantitative characterization of the underlying interaction between methylated DNA and MBD proteins. Analyzing such data sets, we first found that two such proteins cannot bind closer to each other than 2 bp, consistent with structural models of the DNA-protein interaction. Second, the large amount of sequencing data allowed us to find rather weak but nevertheless clearly statistically significant sequence preferences for several bases around the required CpG. These results demonstrate that pulldown sequencing is a high-precision tool in characterizing DNA-protein interactions. This material is based upon work supported by the National Science Foundation under Grant No. DMR-1410172.
A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing
Green, Richard E.; Malaspinas, Anna-Sapfo; Krause, Johannes; Briggs, Adrian W.; Johnson, Philip L. F.; Uhler, Caroline; Meyer, Matthias; Good, Jeffrey M.; Maricic, Tomislav; Stenzel, Udo; Prüfer, Kay; Siebauer, Michael; Burbano, Hernán A.; Ronan, Michael; Rothberg, Jonathan M.; Egholm, Michael; Rudan, Pavao; Brajković, Dejana; Kućan, Željko; Gušić, Ivan; Wikström, Mårten; Laakkonen, Liisa; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante
2008-01-01
Summary: A complete mitochondrial (mt) genome sequence was reconstructed from a 38,000-year-old Neandertal individual using 8,341 mtDNA sequences identified among 4.8 Gb of DNA generated from ~0.3 grams of bone. Analysis of the assembled sequence unequivocally establishes that the Neandertal mtDNA falls outside the variation of extant human mtDNAs and allows an estimate of the divergence date between the two mtDNA lineages of 660,000±140,000 years. Of the 13 proteins encoded in the mtDNA, subunit 2 of cytochrome c oxidase of the mitochondrial electron transport chain has experienced the largest number of amino acid substitutions in human ancestors since the separation from Neandertals. There is evidence that purifying selection in the Neandertal mtDNA was reduced compared to other primate lineages suggesting that the effective population size of Neandertals was small. PMID:18692465
Proteome Studies of Filamentous Fungi
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baker, Scott E.; Panisko, Ellen A.
2011-04-20
The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide breadth of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, non-gel based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of different variations on the general method and technologies for identifying peptides in a given sample. We present a method that can serve as a “baseline” for proteomic studies of fungi.
Shotgun Optical Maps of the Whole Escherichia coli O157:H7 Genome
Lim, Alex; Dimalanta, Eileen T.; Potamousis, Konstantinos D.; Yen, Galex; Apodoca, Jennifer; Tao, Chunhong; Lin, Jieyi; Qi, Rong; Skiadas, John; Ramanathan, Arvind; Perna, Nicole T.; Plunkett, Guy; Burland, Valerie; Mau, Bob; Hackett, Jeremiah; Blattner, Frederick R.; Anantharaman, Thomas S.; Mishra, Bhubaneswar; Schwartz, David C.
2001-01-01
We have constructed NheI and XhoI optical maps of Escherichia coli O157:H7 solely from genomic DNA molecules to provide a uniquely valuable scaffold for contig closure and sequence validation. E. coli O157:H7 is a common pathogen found in contaminated food and water. Our approach obviated the need for the analysis of clones, PCR products, and hybridizations, because maps were constructed from ensembles of single DNA molecules. Shotgun sequencing of bacterial genomes remains labor-intensive, despite advances in sequencing technology. This is partly due to manual intervention required during the last stages of finishing. The applicability of optical mapping to this problem was enhanced by advances in machine vision techniques that improved mapping throughput and created a path to full automation of mapping. Comparisons were made between maps and sequence data that characterized sequence gaps and guided nascent assemblies. PMID:11544203
Proteome studies of filamentous fungi.
Baker, Scott E; Panisko, Ellen A
2011-01-01
The continued fast pace of fungal genome sequence generation has enabled proteomic analysis of a wide variety of organisms that span the breadth of the Kingdom Fungi. There is some phylogenetic bias to the current catalog of fungi with reasonable DNA sequence databases (genomic or EST) that could be analyzed at a global proteomic level. However, the rapid development of next generation sequencing platforms has lowered the cost of genome sequencing such that in the near future, having a genome sequence will no longer be a time or cost bottleneck for downstream proteomic (and transcriptomic) analyses. High throughput, nongel-based proteomics offers a snapshot of proteins present in a given sample at a single point in time. There are a number of variations on the general methods and technologies for identifying peptides in a given sample. We present a method that can serve as a "baseline" for proteomic studies of fungi.
Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow.
Suryawanshi, Gajendra W; Xu, Song; Xie, Yiming; Chou, Tom; Kim, Namshin; Chen, Irvin S Y; Kim, Sanggu
2017-06-14
Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.
GENETIC-BASED ANALYTICAL METHODS FOR BACTERIA AND FUNGI
In the past two decades, advances in high-throughput sequencing technologies have led to a veritable explosion in the generation of nucleic acid sequence information (1). While these advances are illustrated most prominently by the successful sequencing of the human genome, they...
Biosequence Similarity Search on the Mercury System
Krishnamurthy, Praveen; Buhler, Jeremy; Chamberlain, Roger; Franklin, Mark; Gyang, Kwame; Jacob, Arpith; Lancaster, Joseph
2007-01-01
Biosequence similarity search is an important application in modern molecular biology. Search algorithms aim to identify sets of sequences whose extensional similarity suggests a common evolutionary origin or function. The most widely used similarity search tool for biosequences is BLAST, a program designed to compare query sequences to a database. Here, we present the design of BLASTN, the version of BLAST that searches DNA sequences, on the Mercury system, an architecture that supports high-volume, high-throughput data movement off a data store and into reconfigurable hardware. An important component of application deployment on the Mercury system is the functional decomposition of the application onto both the reconfigurable hardware and the traditional processor. Both the Mercury BLASTN application design and its performance analysis are described. PMID:18846267
Gutarowska, Beata; Celikkol-Aydin, Sukriye; Bonifay, Vincent; Otlewska, Anna; Aydin, Egemen; Oldham, Athenia L; Brauer, Jonathan I; Duncan, Kathleen E; Adamiak, Justyna; Sunner, Jan A; Beech, Iwona B
2015-01-01
Preservation of cultural heritage is of paramount importance worldwide. Microbial colonization of construction materials, such as wood, brick, mortar, and stone in historic buildings can lead to severe deterioration. The aim of the present study was to give modern insight into the phylogenetic diversity and activated metabolic pathways of microbial communities colonizing historic objects located in the former Auschwitz II-Birkenau concentration and extermination camp in Oświęcim, Poland. For this purpose we combined molecular, microscopic and chemical methods. Selected specimens were examined using Field Emission Scanning Electron Microscopy (FESEM), metabolomic analysis and high-throughput Illumina sequencing. FESEM imaging revealed the presence of complex microbial communities comprising diatoms, fungi and bacteria, mainly cyanobacteria and actinobacteria, on sample surfaces. Microbial diversity of brick specimens appeared higher than that of the wood and was dominated by algae and cyanobacteria, while wood was mainly colonized by fungi. DNA sequences documented the presence of 15 bacterial phyla representing 99 genera including Halomonas, Halorhodospira, Salinisphaera, Salinibacterium, Rubrobacter, Streptomyces, Arthrobacter and nine fungal classes represented by 113 genera including Cladosporium, Acremonium, Alternaria, Engyodontium, Penicillium, Rhizopus, and Aureobasidium. Most of the identified sequences were characteristic of organisms implicated in deterioration of wood and brick. Metabolomic data indicated the activation of numerous metabolic pathways, including those regulating the production of primary and secondary metabolites, for example, metabolites associated with the production of antibiotics, organic acids and deterioration of organic compounds. The study demonstrated that a combination of electron microscopy imaging with metabolomic and genomic techniques makes it possible to link the phylogenetic information and metabolic profiles of microbial communities and to shed new light on biodeterioration processes.
Tani, Akio; Sahin, Nurettin; Matsuyama, Yumiko; Enomoto, Takashi; Nishimura, Naoki; Yokota, Akira; Kimbara, Kazuhide
2012-01-01
Methylobacterium species are ubiquitous α-proteobacteria that reside in the phyllosphere and are fed by methanol that is emitted from plants. In this study, we applied whole-cell matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis (WC-MS) to evaluate the diversity of Methylobacterium species collected from a variety of plants. The WC-MS spectrum was reproducible through two weeks of cultivation on different media. WC-MS spectrum peaks of M. extorquens strain AM1 cells were attributed to ribosomal proteins, but peaks that were not were also found. We developed a simple method for rapid identification based on spectra similarity. Using all available type strains of Methylobacterium species, the method provided a certain threshold similarity value for species-level discrimination, although the genus contains some type strains that could not be easily discriminated solely by 16S rRNA gene sequence similarity. Next, we evaluated the WC-MS data of approximately 200 methylotrophs isolated from various plants with MALDI Biotyper software (Bruker Daltonics). Isolates representing each cluster were further identified by 16S rRNA gene sequencing. In most cases, the identification by WC-MS matched that by sequencing, and isolates with unique spectra represented possible novel species. The strains belonging to M. extorquens, M. adhaesivum, M. marchantiae, M. komagatae, M. brachiatum, M. radiotolerans, and novel lineages close to M. adhaesivum, many of which were isolated from bryophytes, were found to be the most frequent phyllospheric colonizers. The WC-MS technique offers emerging high-throughput capability for the identification of known/novel species of bacteria, enabling the selection of novel species in a library and identification without 16S rRNA gene sequencing. PMID:22808262
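Identification by spectral similarity against a threshold, as described above, might look like the sketch below. The abstract does not specify the similarity measure or the threshold value, so cosine similarity over binned peak intensities, the 0.9 cutoff, and the reference labels are all hypothetical choices.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equally binned intensity vectors."""
    if len(u) != len(v):
        raise ValueError("spectra must share the same m/z bins")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_u * norm_v)


def best_match(query, references, threshold=0.9):
    """Return (label, score) of the closest reference, or (None, score) below threshold."""
    label, score = max(
        ((name, cosine_similarity(query, spectrum)) for name, spectrum in references.items()),
        key=lambda pair: pair[1],
    )
    return (label, score) if score >= threshold else (None, score)


# Hypothetical binned reference spectra for two type strains
references = {
    "type_strain_A": [1.0, 0.0, 0.5, 0.0],
    "type_strain_B": [0.0, 1.0, 0.0, 0.5],
}
print(best_match([0.9, 0.1, 0.5, 0.0], references))
print(best_match([0.5, 0.5, 0.5, 0.5], references))
```

An isolate that falls below the threshold for every reference would be a candidate for follow-up, in the same spirit as the abstract's observation that isolates with unique spectra represented possible novel species.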
Song, Zhewei; Du, Hai; Zhang, Yan; Xu, Yan
2017-01-01
Fermentation microbiota are specific microorganisms that generate different types of metabolites in many production processes. In traditional solid-state fermentation, the structural composition and functional capacity of the core microbiota determine the quality and quantity of products. As a typical example of food fermentation, Chinese Maotai-flavor liquor production involves a complex of various microorganisms and a wide variety of metabolites. However, the microbial succession and functional shift of the core microbiota in this traditional food fermentation remain unclear. Here, high-throughput amplicon sequencing (16S rRNA gene amplicon sequencing and internal transcribed spacer amplicon sequencing) and metatranscriptomic sequencing technologies were combined to reveal the structure and function of the core microbiota in Chinese soy sauce aroma type liquor production. In addition, ultra-performance liquid chromatography and headspace-solid phase microextraction-gas chromatography-mass spectrometry were employed to provide qualitative and quantitative analysis of the major flavor metabolites. A total of 10 fungal and 11 bacterial genera were identified as the core microbiota. In addition, metatranscriptomic analysis revealed pyruvate metabolism in yeasts (genera Pichia, Schizosaccharomyces, Saccharomyces, and Zygosaccharomyces) and lactic acid bacteria (genus Lactobacillus) classified into two stages in the production of flavor components. Stage I involved high-level alcohol (ethanol) production, with the genus Schizosaccharomyces serving as the core functional microorganism. Stage II involved high-level acid (lactic acid and acetic acid) production, with the genus Lactobacillus serving as the core functional microorganism. The functional shift from the genus Schizosaccharomyces to the genus Lactobacillus drives flavor component conversion from alcohol (ethanol) to acid (lactic acid and acetic acid) in Chinese Maotai-flavor liquor production. Our findings provide insight into the effects of the core functional microbiota in soy sauce aroma type liquor production and the characteristics of the fermentation microbiota under different environmental conditions.
NASA Astrophysics Data System (ADS)
Ehlmann, B. L.; Dundar, M.
2016-12-01
Most clay minerals on Mars are Fe/Mg smectites or chlorites, which typically form from mafic protoliths in aqueous chemical systems that are relatively closed and thus require liquid water but not large amounts of water throughput and large-scale chemical leaching. They may thus form either in the subsurface or under select conditions at the surface. However, Al clay minerals, discovered in multiple locations on Mars (Arabia Terra, Northeast Syrtis, Libya Montes, Terra Sirenum, Eridania, circum-Hellas, Valles Marineris) may provide evidence of substantial water throughput, if their protolith materials were basaltic. This is because formation of Al clays from a mafic protolith requires removal of Mg and either formation of accompanying Fe oxides or removal of Fe. Thus, the observed sequences of Al clays atop Fe/Mg clays were proposed to represent open system weathering and possibly a late climate optimum around the late Noachian/early Hesperian [1]. Later, they were comprehensively cataloged and reported to represent "weathering sequences" similar to those in terrestrial tropical environments [2]. However, key questions remain; in particular, how much water throughput over what time scale is required? The answer to this question has substantial bearing on the climate of early Mars. Recently, we employed a newly developed, non-parametric Bayesian algorithm [3,4] for semi-automatic identification of rare spectral classes on 139 CRISM images in areas with reported regional-scale occurrences of Al clays. Dozens of detections of the minerals alunite and jarosite were made with the algorithm and then verified by manual analysis. These sulfate hydroxides form only at low pHs, and thus their presence tightly constrains water chemistry. Here, we discuss the evidence for low pH surface waters associated with the weathering sequences and their implications for the cumulative duration of surface weathering.
[1] Ehlmann et al., 2011, Nature | [2] Carter et al., 2015, Icarus | [3] Dundar et al., 2016, IEEE WHISPERS proceedings | [4] Ehlmann & Dundar, submitted
PRADA: pipeline for RNA sequencing data analysis.
Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F; Weinstein, John N; Getz, Gad; Verhaak, Roel G W
2014-08-01
Technological advances in high-throughput sequencing necessitate improved computational tools for processing and analyzing large-scale datasets in a systematic automated manner. For that purpose, we have developed PRADA (Pipeline for RNA-Sequencing Data Analysis), a flexible, modular and highly scalable software platform that provides many different types of information available by multifaceted analysis starting from raw paired-end RNA-seq data: gene expression levels, quality metrics, detection of unsupervised and supervised fusion transcripts, detection of intragenic fusion variants, homology scores and fusion frame classification. PRADA uses a dual-mapping strategy that increases sensitivity and refines the analytical endpoints. PRADA has been used extensively and successfully in the glioblastoma and renal clear cell projects of The Cancer Genome Atlas program. Availability: http://sourceforge.net/projects/prada/. Contact: gadgetz@broadinstitute.org or rverhaak@mdanderson.org. Supplementary data are available at Bioinformatics online.
Tipton, Jeremiah D; Tran, John C; Catherman, Adam D; Ahlf, Dorothy R; Durbin, Kenneth R; Lee, Ji Eun; Kellie, John F; Kelleher, Neil L; Hendrickson, Christopher L; Marshall, Alan G
2012-03-06
Current high-throughput top-down proteomic platforms provide routine identification of proteins less than 25 kDa with 4-D separations. This short communication reports the application of technological developments over the past few years that improve protein identification and characterization for masses greater than 25 kDa. Advances in separation science have allowed increased numbers of proteins to be identified, especially by nanoliquid chromatography (nLC) prior to mass spectrometry (MS) analysis. Further, a goal of high-throughput top-down proteomics is to extend the mass range for routine nLC MS analysis up to 80 kDa because gene sequence analysis predicts that ~70% of the human proteome is transcribed to be less than 80 kDa. Normally, large proteins greater than 50 kDa are identified and characterized by top-down proteomics through fraction collection and direct infusion at relatively low throughput. Further, other MS-based techniques provide top-down protein characterization, however at low resolution for intact mass measurement. Here, we present analysis of standard (up to 78 kDa) and whole cell lysate proteins by Fourier transform ion cyclotron resonance mass spectrometry (nLC electrospray ionization (ESI) FTICR MS). The separation platform reduced the complexity of the protein matrix so that, at 14.5 T, proteins from whole cell lysate up to 72 kDa are baseline mass resolved on a nano-LC chromatographic time scale. Further, the results document routine identification of proteins at improved throughput based on accurate mass measurement (less than 10 ppm mass error) of precursor and fragment ions for proteins up to 50 kDa.
Identification of microRNAs in PCV2 subclinically infected pigs by high throughput sequencing.
Núñez-Hernández, Fernando; Pérez, Lester J; Muñoz, Marta; Vera, Gonzalo; Tomás, Anna; Egea, Raquel; Córdoba, Sarai; Segalés, Joaquim; Sánchez, Armand; Núñez, José I
2015-03-03
Porcine circovirus type 2 (PCV2) is the essential etiological infectious agent of PCV2-systemic disease and has been associated with other swine diseases, all of them collectively known as porcine circovirus diseases. MicroRNAs (miRNAs) are a new class of small non-coding RNAs that regulate gene expression post-transcriptionally. miRNAs play an increasing role in many biological processes. The study of miRNA-mediated host-pathogen interactions has emerged in the last decade due to the important role that miRNAs play in antiviral defense. The objective of this study was to identify the miRNA expression pattern in PCV2 subclinically infected and non-infected pigs. For this purpose an experimental PCV2 infection was carried out and small-RNA libraries were constructed from tonsil and mediastinal lymph node (MLN) of infected and non-infected pigs. High throughput sequencing determined differences in miRNA expression in MLN between infected and non-infected pigs while, in tonsil, a very conserved pattern was observed. In MLN, miRNA 126-3p, miRNA 126-5p, let-7d-3p, mir-129a and mir-let-7b-3p were up-regulated whereas mir-193a-5p, mir-574-5p and mir-34a were down-regulated. Prediction of functional analysis showed that these miRNAs can be involved in pathways related to immune system and in processes related to the pathogenesis of PCV2, although functional assays are needed to support these predictions. This is the first study on miRNA gene expression in pigs infected with PCV2 using a high throughput sequencing approach in which several host miRNAs were differentially expressed in response to PCV2 infection.
A time-and-motion approach to micro-costing of high-throughput genomic assays
Costa, S.; Regier, D.A.; Meissner, B.; Cromwell, I.; Ben-Neriah, S.; Chavez, E.; Hung, S.; Steidl, C.; Scott, D.W.; Marra, M.A.; Peacock, S.J.; Connors, J.M.
2016-01-01
Background: Genomic technologies are increasingly used to guide clinical decision-making in cancer control. Economic evidence about the cost-effectiveness of genomic technologies is limited, in part because of a lack of published comprehensive cost estimates. In the present micro-costing study, set in the context of lymphoma patient management, we used a time-and-motion approach to derive cost estimates for 3 genomic assays and processes: digital gene expression profiling (GEP), fluorescence in situ hybridization (FISH), and targeted capture sequencing, including bioinformatics analysis. Methods: The setting for the study was the Department of Lymphoid Cancer Research laboratory at the BC Cancer Agency in Vancouver, British Columbia. Mean per-case hands-on time and resource measurements were determined from a series of direct observations of each assay. Per-case cost estimates were calculated using a bottom-up costing approach, with labour, capital and equipment, supplies and reagents, and overhead costs included. Results: The most labour-intensive assay was found to be FISH at 258.2 minutes per case, followed by targeted capture sequencing (124.1 minutes per case) and digital GEP (14.9 minutes per case). Based on a historical case throughput of 180 cases annually, the mean per-case cost (2014 Canadian dollars) was estimated to be $1,029.16 for targeted capture sequencing and bioinformatics analysis, $596.60 for FISH, and $898.35 for digital GEP with an 807-gene code set. Conclusions: With the growing emphasis on personalized approaches to cancer management, the need for economic evaluations of high-throughput genomic assays is increasing. Through economic modelling and budget-impact analyses, the cost estimates presented here can be used to inform priority-setting decisions about the implementation of such assays in clinical practice. PMID:27803594
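The bottom-up costing arithmetic can be illustrated with a short sketch. Only the throughput of 180 cases per year comes from the abstract; the wage, supplies, capital, and overhead figures below are hypothetical placeholders, and the way fixed costs are spread evenly over annual cases is an assumed simplification rather than the study's actual allocation method.

```python
def per_case_cost(labour_minutes, hourly_wage, supplies_per_case,
                  annual_capital, annual_overhead, cases_per_year):
    """Sum labour, consumables, and a per-case share of annual fixed costs."""
    if cases_per_year <= 0:
        raise ValueError("cases_per_year must be positive")
    labour = labour_minutes / 60 * hourly_wage
    fixed_share = (annual_capital + annual_overhead) / cases_per_year
    return labour + supplies_per_case + fixed_share


# Hypothetical inputs; 180 cases/year is the only figure taken from the abstract.
cost = per_case_cost(
    labour_minutes=120.0,     # hypothetical hands-on time
    hourly_wage=40.0,         # hypothetical wage
    supplies_per_case=300.0,  # hypothetical reagents and consumables
    annual_capital=18000.0,   # hypothetical annualized equipment cost
    annual_overhead=9000.0,   # hypothetical overhead
    cases_per_year=180,
)
print(f"estimated per-case cost: ${cost:,.2f}")
```

Because fixed costs are divided by annual throughput, the per-case estimate falls as case volume rises, which is why the abstract ties its figures to a stated historical throughput.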
Identification of a novel vitivirus from grapevines in New Zealand.
Blouin, Arnaud G; Keenan, Sandi; Napier, Kathryn R; Barrero, Roberto A; MacDiarmid, Robin M
2018-01-01
We report a sequence of a novel vitivirus from Vitis vinifera obtained using two high-throughput sequencing (HTS) strategies on RNA. The initial discovery from small-RNA sequencing was confirmed by HTS of the total RNA and Sanger sequencing. The new virus has a genome structure similar to the one reported for other vitiviruses, with five open reading frames (ORFs) coding for the conserved domains described for members of that genus. Phylogenetic analysis of the complete genome sequence confirmed its affiliation to the genus Vitivirus, with the closest described viruses being grapevine virus E (GVE) and Agave tequilana leaf virus (ATLV). However, the virus we report is distinct and shares only 51% amino acid sequence identity with GVE in the replicase polyprotein and 66.8% amino acid sequence identity with ATLV in the coat protein. This is well below the threshold determined by the ICTV for species demarcation, and we propose that this virus represents a new species. It is provisionally named "grapevine virus G".
SPAR: small RNA-seq portal for analysis of sequencing experiments.
Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee
2018-05-04
The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.
Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi
2015-11-20
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
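The two summary statistics mentioned in the abstract above, contig N50 and fold coverage, can be sketched as follows. This is a minimal illustration using the common definition of N50 (the length of the contig at which the cumulative length of the longest contigs first reaches half of the assembly). The toy contig lengths are invented, and the 3.0 Gbp genome size is an assumption inferred from the abstract's own figures (181 Gbp at approximately 60×), not a value it states.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half the assembly.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_coverage(total_bases, genome_size):
    """Average fold coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


toy_contigs = [80, 70, 50, 40, 30, 20, 10]      # invented lengths, total 300
print(n50(toy_contigs))                          # 70
print(round(mean_coverage(181e9, 3.0e9), 1))     # 60.3, assuming a 3.0 Gbp genome
```

N50 depends only on the length distribution, so a doubled N50 indicates longer contigs but says nothing by itself about correctness of the assembly.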
Cabrera, Ana R; Donohue, Kevin V; Khalil, Sayed M S; Scholl, Elizabeth; Opperman, Charles; Sonenshine, Daniel E; Roe, R Michael
2011-01-01
Many species of mites and ticks are of agricultural and medical importance. Much can be learned from the study of transcriptomes of acarines which can generate DNA-sequence information of potential target genes for the control of acarine pests. High throughput transcriptome sequencing can also yield sequences of genes critical during physiological processes poorly understood in acarines, i.e., the regulation of female reproduction in mites. The predatory mite, Phytoseiulus persimilis, was selected to conduct a transcriptome analysis using 454 pyrosequencing. The objective of this project was to obtain DNA-sequence information of expressed genes from P. persimilis with special interest in sequences corresponding to vitellogenin (Vg) and the vitellogenin receptor (VgR). These genes are critical to the understanding of vitellogenesis, and they will facilitate the study of the regulation of mite female reproduction. A total of 12,556 contiguous sequences (contigs) were assembled with an average size of 935bp. From these sequences, the putative translated peptides of 11 contigs were similar in amino acid sequences to other arthropod Vgs, while 6 were similar to VgRs. We selected some of these sequences to conduct stage-specific expression studies to further determine their function. 2010 Elsevier Ltd. All rights reserved.
Munger, Steven C.; Raghupathy, Narayanan; Choi, Kwangbom; Simons, Allen K.; Gatti, Daniel M.; Hinerfeld, Douglas A.; Svenson, Karen L.; Keller, Mark P.; Attie, Alan D.; Hibbs, Matthew A.; Graber, Joel H.; Chesler, Elissa J.; Churchill, Gary A.
2014-01-01
Massively parallel RNA sequencing (RNA-seq) has yielded a wealth of new insights into transcriptional regulation. A first step in the analysis of RNA-seq data is the alignment of short sequence reads to a common reference genome or transcriptome. Genetic variants that distinguish individual genomes from the reference sequence can cause reads to be misaligned, resulting in biased estimates of transcript abundance. Fine-tuning of read alignment algorithms does not correct this problem. We have developed Seqnature software to construct individualized diploid genomes and transcriptomes for multiparent populations and have implemented a complete analysis pipeline that incorporates other existing software tools. We demonstrate in simulated and real data sets that alignment to individualized transcriptomes increases read mapping accuracy, improves estimation of transcript abundance, and enables the direct estimation of allele-specific expression. Moreover, when applied to expression QTL mapping we find that our individualized alignment strategy corrects false-positive linkage signals and unmasks hidden associations. We recommend the use of individualized diploid genomes over reference sequence alignment for all applications of high-throughput sequencing technology in genetically diverse populations. PMID:25236449
SAMSA2: a standalone metatranscriptome analysis pipeline.
Westreich, Samuel T; Treiber, Michelle L; Mills, David A; Korf, Ian; Lemay, Danielle G
2018-05-21
Complex microbial communities are an area of growing interest in biology. Metatranscriptomics allows researchers to quantify microbial gene expression in an environmental sample via high-throughput sequencing. Metatranscriptomic experiments are computationally intensive because the experiments generate a large volume of sequence data and each sequence must be compared with reference sequences from thousands of organisms. SAMSA2 is an upgrade to the original Simple Annotation of Metatranscriptomes by Sequence Analysis (SAMSA) pipeline that has been redesigned for standalone use on a supercomputing cluster. SAMSA2 is faster due to the use of the DIAMOND aligner, and more flexible and reproducible because it uses local databases. SAMSA2 is available with detailed documentation, and example input and output files along with examples of master scripts for full pipeline execution. SAMSA2 is a rapid and efficient metatranscriptome pipeline for analyzing large RNA-seq datasets in a supercomputing cluster environment. SAMSA2 provides simplified output that can be examined directly or used for further analyses, and its reference databases may be upgraded, altered or customized to fit the needs of any experiment.
Yang, Cheng; Lates, Vasilica; Prieto-Simón, Beatriz; Marty, Jean-Louis; Yang, Xiurong
2013-11-15
We report a new label-free colorimetric aptasensor based on a DNAzyme-aptamer conjugate for rapid and high-throughput detection of Ochratoxin A (OTA, a possible human carcinogen, group 2B) in wine. Two oligonucleotides were designed for this detection. One is N1 for biorecognition, which includes two adjacent sequences: the OTA-specific aptamer sequence and the horseradish peroxidase (HRP)-mimicking DNAzyme sequence. The other is a blocking DNA (B2), which is partially complementary to a part of the OTA aptamer and partially complementary to a part of the DNAzyme. The presence of OTA reduces the hybridization between N1 and B2. Thus, the activity of the non-hybridized DNAzyme is linearly correlated with the concentration of OTA up to 30 nM with a limit of detection of 4 nM (3σ). Meanwhile, a double liquid-liquid extraction (LLE) method is accordingly developed to purify OTA from wine. Compared with the existing HPLC-FD or immunoassay methods, the proposed strategy presents the most appropriate balance between accuracy and facility, resulting in a considerable improvement of real-time quality control and thereby preventing chronic poisoning caused by OTA-containing red wine. Copyright © 2013 Elsevier B.V. All rights reserved.
Sergeant, Martin J.; Constantinidou, Chrystala; Cogan, Tristan; Penn, Charles W.; Pallen, Mark J.
2012-01-01
The analysis of 16S-rDNA sequences to assess the bacterial community composition of a sample is a widely used technique that has increased with the advent of high throughput sequencing. Although considerable effort has been devoted to identifying the most informative region of the 16S gene and the optimal informatics procedures to process the data, little attention has been paid to the PCR step, in particular annealing temperature and primer length. To address this, amplicons derived from 16S-rDNA were generated from chicken caecal content DNA using different annealing temperatures, primers and different DNA extraction procedures. The amplicons were pyrosequenced to determine the optimal protocols for capture of maximum bacterial diversity from a chicken caecal sample. Even at very low annealing temperatures there was little effect on the community structure, although the abundance of some OTUs such as Bifidobacterium increased. Using shorter primers did not reveal any novel OTUs but did change the community profile obtained. Mechanical disruption of the sample by bead beating had a significant effect on the results obtained, as did repeated freezing and thawing. In conclusion, existing primers and standard annealing temperatures captured as much diversity as lower annealing temperatures and shorter primers. PMID:22666455
Hackenberg, Michael; Rodríguez-Ezpeleta, Naiara; Aransay, Ana M.
2011-01-01
We present a new version of miRanalyzer, a web server and stand-alone tool for the detection of known and prediction of new microRNAs in high-throughput sequencing experiments. The new version has been notably improved regarding speed, scope and available features. Alignments are now based on the ultrafast short-read aligner Bowtie (granting also colour space support, allowing mismatches and improving speed) and 31 genomes, including 6 plant genomes, can now be analysed (previous version contained only 7). Differences between plant and animal microRNAs have been taken into account for the prediction models and differential expression of both, known and predicted microRNAs, between two conditions can be calculated. Additionally, consensus sequences of predicted mature and precursor microRNAs can be obtained from multiple samples, which increases the reliability of the predicted microRNAs. Finally, a stand-alone version of the miRanalyzer that is based on a local and easily customized database is also available; this allows the user to have more control on certain parameters as well as to use specific data such as unpublished assemblies or other libraries that are not available in the web server. miRanalyzer is available at http://bioinfo2.ugr.es/miRanalyzer/miRanalyzer.php. PMID:21515631
Smith, S; Joss, T; Stow, A
2011-10-01
The analysis of microsatellite loci has allowed significant advances in evolutionary biology and pest management. However, until very recently, the potential benefits have been compromised by the high costs of developing these neutral markers. High-throughput sequencing provides a solution to this problem. We describe the development of 13 microsatellite markers for the eusocial ambrosia beetle, Austroplatypus incompertus, a significant pest of forests in southeast Australia. The frequency of microsatellite repeats in the genome of A. incompertus was determined to be low, and previous attempts at microsatellite isolation using a traditional genomic library were problematic. Here, we utilised two protocols, microsatellite-enriched genomic library construction and high-throughput 454 sequencing and characterised 13 loci which were polymorphic among 32 individuals. Numbers of alleles per locus ranged from 2 to 17, and observed and expected heterozygosities from 0.344 to 0.767 and from 0.507 to 0.860, respectively. These microsatellites have the resolution required to analyse fine-scale colony and population genetic structure. Our work demonstrates the utility of next-generation 454 sequencing as a method for rapid and cost-effective acquisition of microsatellites where other techniques have failed, or for taxa where marker development has historically been both complicated and expensive.
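The observed and expected heterozygosities reported in the abstract above can be illustrated for a single locus. The sketch below assumes the simplest definitions: observed heterozygosity is the fraction of heterozygous individuals, and expected heterozygosity is one minus the sum of squared allele frequencies. The abstract does not say which estimator was used (many programs apply a small-sample correction), and the genotypes shown are invented allele sizes.

```python
from collections import Counter


def heterozygosities(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Expected heterozygosity is 1 - sum(p_i ** 2) over allele frequencies p_i,
    without small-sample correction (an assumption of this sketch).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes given")
    observed = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in counts.values())
    return observed, expected


# Invented microsatellite genotypes (allele sizes in bp) for four individuals.
toy = [(150, 150), (150, 154), (154, 158), (150, 158)]
ho, he = heterozygosities(toy)
print(ho, he)  # 0.75 0.625
```

An observed value well below the expected one at a locus can indicate inbreeding, population substructure, or null alleles, which is why both are usually reported together.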
Sergeant, Martin J; Constantinidou, Chrystala; Cogan, Tristan; Penn, Charles W; Pallen, Mark J
2012-01-01
The analysis of 16S-rDNA sequences to assess the bacterial community composition of a sample is a widely used technique that has increased with the advent of high throughput sequencing. Although considerable effort has been devoted to identifying the most informative region of the 16S gene and the optimal informatics procedures to process the data, little attention has been paid to the PCR step, in particular annealing temperature and primer length. To address this, amplicons derived from 16S-rDNA were generated from chicken caecal content DNA using different annealing temperatures, primers and different DNA extraction procedures. The amplicons were pyrosequenced to determine the optimal protocols for capture of maximum bacterial diversity from a chicken caecal sample. Even at very low annealing temperatures there was little effect on the community structure, although the abundance of some OTUs such as Bifidobacterium increased. Using shorter primers did not reveal any novel OTUs but did change the community profile obtained. Mechanical disruption of the sample by bead beating had a significant effect on the results obtained, as did repeated freezing and thawing. In conclusion, existing primers and standard annealing temperatures captured as much diversity as lower annealing temperatures and shorter primers.
Wan, Dongjin; Liu, Yongde; Niu, Zhenhua; Xiao, Shuhu; Li, Daorong
2016-02-01
Hydrogen autotrophic reduction of perchlorate has the advantages of high removal efficiency and harmlessness to drinking water. So far, however, reported information about the microbial community structure has been comparatively limited, and changes in biodiversity and the dominant bacteria during the acclimation process require detailed study. In this study, perchlorate-reducing hydrogen autotrophic bacteria were acclimated from activated sludge by hydrogen aeration. For the first time, high-throughput sequencing was applied to analyze changes in biodiversity and the dominant bacteria during the acclimation process. The Michaelis-Menten model described the perchlorate reduction kinetics well. Model parameters q(max) and K(s) were 2.521-3.245 (mg ClO4(-)/gVSS h) and 5.44-8.23 (mg/l), respectively. Microbial perchlorate reduction occurred across the pH range 5.0-11.0; removal was highest at pH 9.0. The enriched mixed bacteria could use perchlorate, nitrate and sulfate as electron acceptors, and the sequence of preference was: NO3(-) > ClO4(-) > SO4(2-). Compared to the feed culture, biodiversity decreased greatly during the acclimation process, and the microbial community structure gradually stabilized after 9 acclimation cycles. The Thauera genus, related to Rhodocyclales, was the dominant group of perchlorate-reducing bacteria (PRB) in the mixed culture.
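The Michaelis-Menten kinetics mentioned in the abstract above can be sketched in a few lines. The standard form q = q(max) * S / (K(s) + S) is assumed here, since the abstract names the model but does not print the equation. The parameter values are the lower ends of the reported ranges; the substrate concentrations and the function name are illustrative choices.

```python
def specific_reduction_rate(substrate, q_max, k_s):
    """Michaelis-Menten rate: q = q_max * S / (K_s + S).

    substrate and k_s share a concentration unit (mg/l here);
    the result carries the unit of q_max (mg ClO4-/gVSS h here).
    """
    if substrate < 0:
        raise ValueError("substrate concentration cannot be negative")
    if k_s <= 0:
        raise ValueError("k_s must be positive")
    return q_max * substrate / (k_s + substrate)


Q_MAX = 2.521   # lower end of the reported q(max) range
K_S = 5.44      # lower end of the reported K(s) range

# Illustrative perchlorate concentrations in mg/l.
for s in (1.0, K_S, 20.0, 100.0):
    print(f"S = {s:6.2f} mg/l -> q = {specific_reduction_rate(s, Q_MAX, K_S):.3f}")
```

At S equal to K(s) the rate is exactly half of q(max), and at high concentrations it approaches q(max); this is the saturation behaviour the two fitted parameters summarise.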
Toward a mtDNA locus-specific mutation database using the LOVD platform.
Elson, Joanna L; Sweeney, Mary G; Procaccio, Vincent; Yarham, John W; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H; Pitceathly, Robert D S; Thorburn, David R; Lott, Marie T; Wallace, Douglas C; Taylor, Robert W; McFarland, Robert
2012-09-01
The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. © 2012 Wiley Periodicals, Inc.
Toward a mtDNA Locus-Specific Mutation Database Using the LOVD Platform
Elson, Joanna L.; Sweeney, Mary G.; Procaccio, Vincent; Yarham, John W.; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H.; Pitceathly, Robert D.S.; Thorburn, David R.; Lott, Marie T.; Wallace, Douglas C.; Taylor, Robert W.; McFarland, Robert
2015-01-01
The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. PMID:22581690
Personalized Oncology Through Integrative High-Throughput Sequencing: A Pilot Study
Roychowdhury, Sameek; Iyer, Matthew K.; Robinson, Dan R.; Lonigro, Robert J.; Wu, Yi-Mi; Cao, Xuhong; Kalyana-Sundaram, Shanker; Sam, Lee; Balbin, O. Alejandro; Quist, Michael J.; Barrette, Terrence; Everett, Jessica; Siddiqui, Javed; Kunju, Lakshmi P.; Navone, Nora; Araujo, John C.; Troncoso, Patricia; Logothetis, Christopher J.; Innis, Jeffrey W.; Smith, David C.; Lao, Christopher D.; Kim, Scott Y.; Roberts, J. Scott; Gruber, Stephen B.; Pienta, Kenneth J.; Talpaz, Moshe; Chinnaiyan, Arul M.
2012-01-01
Individual cancers harbor a set of genetic aberrations that can be informative for identifying rational therapies currently available or in clinical trials. We implemented a pilot study to explore the practical challenges of applying high-throughput sequencing in clinical oncology. We enrolled patients with advanced or refractory cancer who were eligible for clinical trials. For each patient, we performed whole-genome sequencing of the tumor, targeted whole-exome sequencing of tumor and normal DNA, and transcriptome sequencing (RNA-Seq) of the tumor to identify potentially informative mutations in a clinically relevant time frame of 3 to 4 weeks. With this approach, we detected several classes of cancer mutations including structural rearrangements, copy number alterations, point mutations, and gene expression alterations. A multidisciplinary Sequencing Tumor Board (STB) deliberated on the clinical interpretation of the sequencing results obtained. We tested our sequencing strategy on human prostate cancer xenografts. Next, we enrolled two patients into the clinical protocol and were able to review the results at our STB within 24 days of biopsy. The first patient had metastatic colorectal cancer in which we identified somatic point mutations in NRAS, TP53, AURKA, FAS, and MYH11, plus amplification and overexpression of cyclin-dependent kinase 8 (CDK8). The second patient had malignant melanoma, in which we identified a somatic point mutation in HRAS and a structural rearrangement affecting CDKN2C. The STB identified the CDK8 amplification and Ras mutation as providing a rationale for clinical trials with CDK inhibitors or MEK (mitogen-activated or extracellular signal–regulated protein kinase kinase) and PI3K (phosphatidylinositol 3-kinase) inhibitors, respectively. Integrative high-throughput sequencing of patients with advanced cancer generates a comprehensive, individual mutational landscape to facilitate biomarker-driven clinical trials in oncology. PMID:22133722
Parente, Eugenio; Cocolin, Luca; De Filippis, Francesca; Zotta, Teresa; Ferrocino, Ilario; O'Sullivan, Orla; Neviani, Erasmo; De Angelis, Maria; Cotter, Paul D; Ercolini, Danilo
2016-02-16
Amplicon targeted high-throughput sequencing has become a popular tool for the culture-independent analysis of microbial communities. Although the data obtained with this approach are portable and the number of sequences available in public databases is increasing, no tool has been developed yet for the analysis and presentation of data obtained in different studies. This work describes an approach for the development of a database for the rapid exploration and analysis of data on food microbial communities. Data from seventeen studies investigating the structure of bacterial communities in dairy, meat, sourdough and fermented vegetable products, obtained by 16S rRNA gene targeted high-throughput sequencing, were collated and analysed using Gephi, a network analysis software. The resulting database, which we named FoodMicrobionet, was used to analyse nodes and network properties and to build an interactive web-based visualisation. The latter allows the visual exploration of the relationships between Operational Taxonomic Units (OTUs) and samples and the identification of core- and sample-specific bacterial communities. It also provides additional search tools and hyperlinks for the rapid selection of food groups and OTUs and for rapid access to external resources (NCBI taxonomy, digital versions of the original articles). Microbial interaction network analysis was carried out using CoNet on datasets extracted from FoodMicrobionet: the complexity of interaction networks was much lower than that found for other bacterial communities (human microbiome, soil and other environments). This may reflect both a bias in the dataset (which was dominated by fermented foods and starter cultures) and the lower complexity of food bacterial communities. Although some technical challenges exist, and are discussed here, the net result is a valuable tool for the exploration of food bacterial communities by the scientific community and food industry. Copyright © 2015. Published by Elsevier B.V.
Yamamoto, Toshio; Nagasaki, Hideki; Yonemaru, Jun-ichi; Ebana, Kaworu; Nakajima, Maiko; Shibaya, Taeko; Yano, Masahiro
2010-04-27
To create useful gene combinations in crop breeding, it is necessary to clarify the dynamics of the genome composition created by breeding practices. A large quantity of single-nucleotide polymorphism (SNP) data is required to permit discrimination of chromosome segments among modern cultivars, which are genetically related. Here, we used a high-throughput sequencer to conduct whole-genome sequencing of an elite Japanese rice cultivar, Koshihikari, which is closely related to Nipponbare, whose genome sequencing has been completed. Then we designed a high-throughput typing array based on the SNP information by comparison of the two sequences. Finally, we applied this array to analyze historical representative rice cultivars to understand the dynamics of their genome composition. The total 5.89-Gb sequence for Koshihikari, equivalent to 15.7 x the entire rice genome, was mapped using the Pseudomolecules 4.0 database for Nipponbare. The resultant Koshihikari genome sequence corresponded to 80.1% of the Nipponbare sequence and led to the identification of 67,051 SNPs. A high-throughput typing array consisting of 1917 SNP sites distributed throughout the genome was designed to genotype 151 representative Japanese cultivars that have been grown during the past 150 years. We could identify the ancestral origin of the pedigree haplotypes in 60.9% of the Koshihikari genome and 18 consensus haplotype blocks which are inherited from traditional landraces to current improved varieties. Moreover, it was predicted that modern breeding practices have generally decreased genetic diversity. Detection of genome-wide SNPs by both high-throughput sequencer and typing array made it possible to evaluate genomic composition of genetically related rice varieties. With the aid of their pedigree information, we clarified the dynamics of chromosome recombination during the historical rice breeding process.
We also found several genomic regions decreasing genetic diversity which might be caused by a recent human selection in rice breeding. The definition of pedigree haplotypes by means of genome-wide SNPs will facilitate next-generation breeding of rice and other crops.
Téllez-Sosa, Juan; Rodríguez, Mario Henry; Gómez-Barreto, Rosa E.; Valdovinos-Torres, Humberto; Hidalgo, Ana Cecilia; Cruz-Hervert, Pablo; Luna, René Santos; Carrillo-Valenzo, Erik; Ramos, Celso; García-García, Lourdes; Martínez-Barnetche, Jesús
2013-01-01
Background: Influenza viruses display a high mutation rate and complex evolutionary patterns. Next-generation sequencing (NGS) has been widely used for qualitative and semi-quantitative assessment of genetic diversity in complex biological samples. The “deep sequencing” approach, enabled by the enormous throughput of current NGS platforms, allows the identification of rare genetic viral variants in targeted genetic regions, but is usually limited to a small number of samples. Methodology and Principal Findings: We designed a proof-of-principle study to test whether redistributing sequencing throughput from a high depth-small sample number towards a low depth-large sample number approach is feasible and contributes to influenza epidemiological surveillance. Using 454-Roche sequencing, we sequenced at a rather low depth, a 307 bp amplicon of the neuraminidase gene of the Influenza A(H1N1) pandemic (A(H1N1)pdm) virus from cDNA amplicons pooled in 48 barcoded libraries obtained from nasal swab samples of infected patients (n = 299) taken from May to November, 2009 pandemic period in Mexico. This approach revealed that during the transition from the first (May-July) to second wave (September-November) of the pandemic, the initial genetic variants were replaced by the N248D mutation in the NA gene, and enabled the establishment of temporal and geographic associations with genetic diversity and the identification of mutations associated with oseltamivir resistance. Conclusions: NGS sequencing of a short amplicon from the NA gene at low sequencing depth allowed genetic screening of a large number of samples, providing insights to viral genetic diversity dynamics and the identification of genetic variants associated with oseltamivir resistance. Further research is needed to explain the observed replacement of the genetic variants seen during the second wave. As sequencing throughput rises and library multiplexing and automation improves, we foresee that the approach presented here can be scaled up for global genetic surveillance of influenza and other infectious diseases. PMID:23843978
A Robust Framework for Microbial Archaeology
Warinner, Christina; Herbig, Alexander; Mann, Allison; Yates, James A. Fellows; Weiβ, Clemens L.; Burbano, Hernán A.; Orlando, Ludovic; Krause, Johannes
2017-01-01
Microbial archaeology is flourishing in the era of high-throughput sequencing, revealing the agents behind devastating historical plagues, identifying the cryptic movements of pathogens in prehistory, and reconstructing the ancestral microbiota of humans. Here, we introduce the fundamental concepts and theoretical framework of the discipline, then discuss applied methodologies for pathogen identification and microbiome characterization from archaeological samples. We give special attention to the process of identifying, validating, and authenticating ancient microbes using high-throughput DNA sequencing data. Finally, we outline standards and precautions to guide future research in the field. PMID:28460196
Digital transcriptome analysis of putative sex-determination genes in papaya (Carica papaya).
Urasaki, Naoya; Tarora, Kazuhiko; Shudo, Ayano; Ueno, Hiroki; Tamaki, Moritoshi; Miyagi, Norimichi; Adaniya, Shinichi; Matsumura, Hideo
2012-01-01
Papaya (Carica papaya) is a trioecious plant species that has male, female and hermaphrodite flowers on different plants. The primitive sex chromosomes genetically determine the sex of the papaya. Although draft sequences of the papaya genome are already available, the genes for sex determination have not been identified, likely due to the complicated structure of its sex-chromosome sequences. To identify the candidate genes for sex determination, we conducted a transcriptome analysis of flower samples from male, female and hermaphrodite plants using high-throughput SuperSAGE for digital gene expression analysis. Among the short sequence tags obtained from the transcripts, 312 unique tags were specifically mapped to the primitive sex chromosome (X or Y(h)) sequences. An annotation analysis revealed that retroelements are the most abundant sequences observed in the genes corresponding to these tags. The majority of tags on the sex chromosomes were located on the X chromosome, and only 30 tags were commonly mapped to both the X and Y(h) chromosome, implying a loss of many genes on the Y(h) chromosome. Nevertheless, candidate Y(h) chromosome-specific female determination genes, including a MADS-box gene, were identified. Information on these sex chromosome-specific expressed genes will help elucidating sex determination in the papaya.
Digital Transcriptome Analysis of Putative Sex-Determination Genes in Papaya (Carica papaya)
Urasaki, Naoya; Tarora, Kazuhiko; Shudo, Ayano; Ueno, Hiroki; Tamaki, Moritoshi; Miyagi, Norimichi; Adaniya, Shinichi; Matsumura, Hideo
2012-01-01
Papaya (Carica papaya) is a trioecious plant species that has male, female and hermaphrodite flowers on different plants. The primitive sex chromosomes genetically determine the sex of the papaya. Although draft sequences of the papaya genome are already available, the genes for sex determination have not been identified, likely due to the complicated structure of its sex-chromosome sequences. To identify the candidate genes for sex determination, we conducted a transcriptome analysis of flower samples from male, female and hermaphrodite plants using high-throughput SuperSAGE for digital gene expression analysis. Among the short sequence tags obtained from the transcripts, 312 unique tags were specifically mapped to the primitive sex chromosome (X or Yh) sequences. An annotation analysis revealed that retroelements are the most abundant sequences observed in the genes corresponding to these tags. The majority of tags on the sex chromosomes were located on the X chromosome, and only 30 tags were commonly mapped to both the X and Yh chromosomes, implying a loss of many genes on the Yh chromosome. Nevertheless, candidate Yh chromosome-specific female determination genes, including a MADS-box gene, were identified. Information on these sex chromosome-specific expressed genes will help elucidate sex determination in the papaya. PMID:22815863
The Impact of Normalization Methods on RNA-Seq Data Analysis
Zyprych-Walczak, J.; Szabelska, A.; Handschuh, L.; Górczak, K.; Klamecka, K.; Figlerowicz, M.; Siatkowski, I.
2015-01-01
High-throughput sequencing technologies, such as the Illumina HiSeq, are powerful new tools for investigating a wide range of biological and medical problems. Massive and complex data sets produced by the sequencers create a need for the development of statistical and computational methods that can tackle the analysis and management of data. Data normalization is one of the most crucial steps of data processing and must be considered carefully, as it has a profound effect on the results of the analysis. In this work, we focus on a comprehensive comparison of five normalization methods related to sequencing depth, widely used for transcriptome sequencing (RNA-seq) data, and their impact on the results of gene expression analysis. Based on this study, we suggest a universal workflow that can be applied for the selection of the optimal normalization procedure for any particular data set. The described workflow includes calculation of the bias and variance values for the control genes, sensitivity and specificity of the methods, and classification errors, as well as generation of the diagnostic plots. Combining the above information facilitates the selection of the most appropriate normalization method for the studied data sets and determines which methods can be used interchangeably. PMID:26176014
Motato, Karina Edith; Milani, Christian; Ventura, Marco; Valencia, Francia Elena; Ruas-Madiedo, Patricia; Delgado, Susana
2017-12-01
"Suero Costeño" (SC) is a traditional soured cream made from raw milk on the northern Caribbean coast of Colombia. The natural microbiota that characterizes this popular Colombian fermented milk is unknown, although several culturing studies have previously been attempted. In this work, the microbiota associated with SC from three manufacturers in two regions, "Planeta Rica" (Córdoba) and "Caucasia" (Antioquia), was analysed by means of culturing methods in combination with high-throughput sequencing and DGGE analysis of 16S rRNA gene amplicons. The bacterial ecosystem of SC samples was revealed to be composed of lactic acid bacteria belonging to the Streptococcaceae and Lactobacillaceae families, with the proportions and genera varying among manufacturers and regions of production. Members of the Lactobacillus acidophilus group, Lactococcus lactis, Streptococcus infantarius and Streptococcus salivarius characterized this artisanal product. In comparison with culturing, the use of in-depth, culture-independent molecular techniques provides a more realistic picture of the overall bacterial communities residing in SC. Beyond their descriptive purpose, these approaches will facilitate a rational strategy (culture media and growing conditions) for the isolation of indigenous strains that allow standardization in the manufacture of SC. Copyright © 2017 Elsevier Ltd. All rights reserved.
Guimaraes, S; Pruvost, M; Daligault, J; Stoetzel, E; Bennett, E A; Côté, N M-L; Nicolas, V; Lalis, A; Denys, C; Geigl, E-M; Grange, T
2017-05-01
We present a cost-effective metabarcoding approach, aMPlex Torrent, which relies on an improved multiplex PCR adapted to highly degraded DNA, combining barcoding and next-generation sequencing to simultaneously analyse many heterogeneous samples. We demonstrate the strength of these improvements by generating a phylochronology through the genotyping of ancient rodent remains from a Moroccan cave whose stratigraphy covers the last 120 000 years. Rodents are important for epidemiology, agronomy and ecological investigations and can act as bioindicators for human- and/or climate-induced environmental changes. Efficient and reliable genotyping of ancient rodent remains has the potential to deliver valuable phylogenetic and paleoecological information. The analysis of multiple ancient skeletal remains of very small size with poor DNA preservation, however, requires a sensitive high-throughput method to generate sufficient data. We show this approach to be particularly well suited to accessing this otherwise difficult taxonomic and genetic resource. As a highly scalable, lower cost and less labour-intensive alternative to targeted sequence capture approaches, we propose the aMPlex Torrent strategy to be a useful tool for the genetic analysis of multiple degraded samples in studies involving ecology, archaeology, conservation and evolutionary biology. © 2016 John Wiley & Sons Ltd.
Association Study of Gut Flora in Coronary Heart Disease through High-Throughput Sequencing.
Cui, Li; Zhao, Tingting; Hu, Haibing; Zhang, Wen; Hua, Xiuguo
2017-01-01
Objectives. We aimed to explore the impact of gut microbiota in coronary heart disease (CHD) patients through high-throughput sequencing. Methods. A total of 29 CHD in-hospital patients and 35 healthy volunteers as controls were included. Nucleic acids were extracted from fecal samples, followed by α diversity and principal coordinate analysis (PCoA). Based on unweighted UniFrac distance matrices, unweighted-pair group method with arithmetic mean (UPGMA) trees were created. Results. After data optimization, an average of 121312 ± 19293 reads in CHD patients and 234372 ± 108725 reads in controls was obtained. Reads corresponding to 38 phyla, 90 classes, and 584 genera were detected in CHD patients, whereas 40 phyla, 99 classes, and 775 genera were detected in controls. The proportion of phylum Bacteroidetes (56.12%) was lower and that of phylum Firmicutes was higher (37.06%) in CHD patients than those in the controls (60.92% and 32.06%, P < 0.05). PCoA and UPGMA tree analysis showed that there were significant differences of gut microbial compositions between the two groups. Conclusion. The diversity and compositions of gut flora were different between CHD patients and healthy controls. The incidence of CHD might be associated with the alteration of gut microbiota.
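The phylum-level percentages reported above can be condensed into a Firmicutes-to-Bacteroidetes (F/B) ratio per group. The abstract does not report this ratio itself; the following is a minimal sketch of the arithmetic, using only the four percentages quoted in the abstract. The names `abundance` and `fb_ratio` are illustrative.

```python
# Relative abundances (percent) as quoted in the abstract above.
abundance = {
    "CHD": {"Bacteroidetes": 56.12, "Firmicutes": 37.06},
    "control": {"Bacteroidetes": 60.92, "Firmicutes": 32.06},
}


def fb_ratio(group):
    """Firmicutes-to-Bacteroidetes ratio for one group (derived, not reported)."""
    return group["Firmicutes"] / group["Bacteroidetes"]


ratios = {name: fb_ratio(values) for name, values in abundance.items()}
for name, ratio in ratios.items():
    print(f"{name}: F/B = {ratio:.3f}")
```

With these inputs the ratio is about 0.66 for CHD patients and about 0.53 for controls, which matches the direction of the shift described in the abstract.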
Comparative Transcriptomes and EVO-DEVO Studies Depending on Next Generation Sequencing.
Liu, Tiancheng; Yu, Lin; Liu, Lei; Li, Hong; Li, Yixue
2015-01-01
High-throughput technology has driven progress in omics studies, including genomics and transcriptomics. We review the advances in comparative omics studies that are attributable to the high-throughput measurements of next-generation sequencing technology. Comparative genomics has been successfully applied to evolutionary analysis, while comparative transcriptomics is used to compare the expression profiles of two subjects by differential expression or differential coexpression, which enables its application in evolutionary developmental biology (EVO-DEVO) studies. EVO-DEVO studies focus on the evolutionary pressures affecting morphogenesis during development, and previous work has sought to identify the most conserved stages of embryonic development. Earlier measurements in these studies were based on morphological similarity at the macroscopic level, whereas new technology enables the detection of similarity in molecular mechanisms at the microscopic level. Evolutionary models of embryonic development, including the "funnel-like" model and the "hourglass" model, have been evaluated by combining these new comparative transcriptomic methods with prior comparative genomic information. Although the technology has brought EVO-DEVO studies into a new era, technological and material limitations still exist, and further investigations require more careful study design and procedures.
Fang, Xin; Wang, Xin; Yang, Shaoguo; Meng, Fanjing; Wang, Xiaolei; Wei, Hua; Chen, Tingtao
2016-01-01
Growing evidence indicates that diseases of the central nervous system are strongly affected by fecal microbes. However, little work has been done to explore the interaction between amyotrophic lateral sclerosis (ALS) and fecal microbes. In the present study, high-throughput sequencing was used to compare the intestinal microbial diversity of healthy people and ALS patients. Principal coordinate analysis, Venn analysis and the unweighted pair-group method using arithmetic averages (UPGMA) showed obvious microbial changes between healthy people (group H) and ALS patients (group A), and the average ratios of Bacteroides, Faecalibacterium, Anaerostipes, Prevotella, Escherichia, and Lachnospira at the genus level between ALS patients and healthy people were 0.78, 2.18, 3.41, 0.35, 0.79, and 13.07, respectively. Furthermore, the decreased Firmicutes/Bacteroidetes ratio at the phylum level identified using LEfSe (LDA > 4.0), together with the significantly increased genus Dorea (harmful microorganisms) and the significantly reduced Oscillibacter, Anaerostipes, and Lachnospiraceae (beneficial microorganisms) in ALS patients, indicated that the imbalance in the composition of the intestinal microflora had a strong association with the pathogenesis of ALS.
Baseline Survey of Root-Associated Microbes of Taxus chinensis (Pilger) Rehd
Zhang, Qian; Liu, Hongwei; Sun, Guiling; Wilson, Iain W.; Wu, Jianqiang; Hoffman, Angela; Cheng, Junwen; Qiu, Deyou
2015-01-01
Taxol (paclitaxel), a diterpenoid, is one of the most effective anticancer drugs identified. Biosynthesis of taxol was considered restricted to the genus Taxus until Stierle et al. discovered that an endophytic fungus isolated from Taxus brevifolia could independently synthesize taxol. Little is known about the mechanism of taxol biosynthesis in microbes, but it has been speculated that its biosynthesis may differ from that in plants. The microbiome from the roots of Taxus chinensis has been extensively investigated with culture-dependent methods to identify taxol-synthesizing microbes, but not with culture-independent methods. Using bar-coded high-throughput sequencing in combination with a metagenomics approach, we surveyed the microbial diversity and gene composition of the root-associated microbiome from Taxus chinensis (Pilger) Rehd. High-throughput amplicon sequencing revealed 187 fungal OTUs, which is higher than any previously reported number of fungi identified with the culture-dependent method, suggesting that T. chinensis roots harbor novel and diverse fungi. Some operational taxonomic units (OTUs) identified were identical to reported microbe strains possessing the ability to synthesize taxol, and several genes previously associated with taxol biosynthesis were identified through metagenomics analysis. PMID:25821956
High-Throughput Sequencing of Plasma MicroRNA in Chronic Fatigue Syndrome/Myalgic Encephalomyelitis
Brenu, Ekua W.; Ashton, Kevin J.; Batovska, Jana; Staines, Donald R.; Marshall-Gradisnik, Sonya M.
2014-01-01
Background. MicroRNAs (miRNAs) are known to regulate many biological processes and their dysregulation has been associated with a variety of diseases including Chronic Fatigue Syndrome/Myalgic Encephalomyelitis (CFS/ME). The recent discovery of stable and reproducible miRNA in plasma has raised the possibility that circulating miRNAs may serve as novel diagnostic markers. The objective of this study was to determine the role of plasma miRNA in CFS/ME. Results. Using Illumina high-throughput sequencing we identified 19 miRNAs that were differentially expressed in the plasma of CFS/ME patients in comparison to non-fatigued controls. Following RT-qPCR analysis, we were able to confirm the significant up-regulation of three miRNAs (hsa-miR-127-3p, hsa-miR-142-5p and hsa-miR-143-3p) in the CFS/ME patients. Conclusion. Our study is the first to identify circulating miRNAs from CFS/ME patients and also to confirm three differentially expressed circulating miRNAs in CFS/ME patients, providing a basis for further study to find useful CFS/ME biomarkers. PMID:25238588
Bicycle: a bioinformatics pipeline to analyze bisulfite sequencing data.
Graña, Osvaldo; López-Fernández, Hugo; Fdez-Riverola, Florentino; González Pisano, David; Glez-Peña, Daniel
2018-04-15
High-throughput sequencing of bisulfite-converted DNA is a technique used to measure DNA methylation levels. Although a considerable number of computational pipelines have been developed to analyze such data, none of them tackles all the peculiarities of the analysis together, revealing limitations that can force the user to manually perform additional steps needed for a complete processing of the data. This article presents bicycle, an integrated, flexible analysis pipeline for bisulfite sequencing data. Bicycle analyzes whole genome bisulfite sequencing data, targeted bisulfite sequencing data and hydroxymethylation data. To show how bicycle overtakes other available pipelines, we compared them on a defined number of features that are summarized in a table. We also tested bicycle with both simulated and real datasets, to show its level of performance, and compared it to different state-of-the-art methylation analysis pipelines. Bicycle is publicly available under GNU LGPL v3.0 license at http://www.sing-group.org/bicycle. Users can also download a customized Ubuntu LiveCD including bicycle and other bisulfite sequencing data pipelines compared here. In addition, a docker image with bicycle and its dependencies, which allows a straightforward use of bicycle in any platform (e.g. Linux, OS X or Windows), is also available. Contact: ograna@cnio.es or dgpena@uvigo.es. Supplementary data are available at Bioinformatics online.
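For orientation, the quantity that bisulfite pipelines estimate can be sketched in a few lines. Bisulfite treatment converts unmethylated cytosines so that they read as T, while methylated cytosines still read as C, so the methylation level at a position is commonly estimated as C / (C + T). This is a generic illustration, not bicycle's implementation; the function name and the per-position counts are hypothetical.

```python
def methylation_level(c_reads, t_reads):
    """Estimate methylation at one cytosine as C / (C + T).

    Returns None when the position has no coverage.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


# Hypothetical counts: position -> (reads showing C, reads showing T)
counts = {101: (18, 2), 145: (3, 27), 160: (0, 0)}
levels = {pos: methylation_level(c, t) for pos, (c, t) in counts.items()}

for pos, level in levels.items():
    print(pos, "no coverage" if level is None else f"{level:.2f}")
```

Real pipelines additionally handle strand, conversion efficiency and sequencing errors, which this sketch ignores.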
Quality Control Test for Sequence-Phenotype Assignments
Ortiz, Maria Teresa Lara; Rosario, Pablo Benjamín Leon; Luna-Nevarez, Pablo; Gamez, Alba Savin; Martínez-del Campo, Ana; Del Rio, Gabriel
2015-01-01
Relating a gene mutation to a phenotype is a common task in different disciplines such as protein biochemistry. In this endeavour, it is common to find false relationships arising from mutations introduced by cells, which may be purged using a phenotypic assay; yet, such phenotypic assays may introduce additional false relationships arising from experimental errors. Here we introduce the use of high-throughput DNA sequencers and statistical analysis to identify incorrect DNA sequence-phenotype assignments. We observed that 10–20% of these false assignments are expected in large screenings aimed at identifying critical residues for protein function. We further show that this level of incorrect DNA sequence-phenotype assignments may significantly alter our understanding of the structure-function relationship of proteins. We have made available an implementation of our method at http://bis.ifc.unam.mx/en/software/chispas. PMID:25700273
Sequencing of Oligourea Foldamers by Tandem Mass Spectrometry
NASA Astrophysics Data System (ADS)
Bathany, Katell; Owens, Neil W.; Guichard, Gilles; Schmitter, Jean-Marie
2013-03-01
This study is focused on sequence analysis of peptidomimetic helical oligoureas by means of tandem mass spectrometry, to build a basis for de novo sequencing for future high-throughput combinatorial library screening of oligourea foldamers. After the evaluation of MS/MS spectra obtained for model compounds with either MALDI or ESI sources, we found that the MALDI-TOF-TOF instrument gave more satisfactory results. MS/MS spectra of oligoureas generated by decay of singly charged precursor ions show major ion series corresponding to fragmentation across both CO-NH and N'H-CO urea bonds. Oligourea backbones fragment to produce a pattern of a, x, b, and y type fragment ions. De novo decoding of spectral information is facilitated by the occurrence of low mass reporter ions, representative of constitutive monomers, in an analogous manner to the use of immonium ions for peptide sequencing.
Evaluating imputation algorithms for low-depth genotyping-by-sequencing (GBS) data
USDA-ARS's Scientific Manuscript database
Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordabl...
Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin
2018-01-01
Much of the within-species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139
Li, Peipei; Piao, Yongjun; Shon, Ho Sun; Ryu, Keun Ho
2015-10-28
Recently, rapid improvements in technology and decreases in sequencing costs have made RNA-Seq a widely used technique to quantify gene expression levels. Various normalization approaches have been proposed, owing to the importance of normalization in the analysis of RNA-Seq data. A comparison of recently proposed normalization methods is required to generate suitable guidelines for the selection of the most appropriate approach for future experiments. In this paper, we compared eight non-abundance estimation (RC, UQ, Med, TMM, DESeq, Q, RPKM, and ERPKM) and two abundance estimation normalization methods (RSEM and Sailfish). The experiments were based on real Illumina high-throughput RNA-Seq data of 35- and 76-nucleotide sequences produced in the MAQC project and on simulated reads. Reads were mapped to the human genome obtained from the UCSC Genome Browser Database. For precise evaluation, we investigated the Spearman correlation between the normalization results from RNA-Seq and MAQC qRT-PCR values for 996 genes. Based on this work, we showed that out of the eight non-abundance estimation normalization methods, RC, UQ, Med, TMM, DESeq, and Q gave similar normalization results for all data sets. For the 35-nucleotide RNA-Seq data, RPKM showed the highest correlation, but for the 76-nucleotide data it showed lower correlation than the other methods. ERPKM did not improve on RPKM. Of the two abundance estimation normalization methods, for the 35-nucleotide data, higher correlation was obtained with Sailfish than with RSEM, which in turn was better than not using abundance estimation. However, for the 76-nucleotide data, the results achieved by RSEM were similar to those without abundance estimation, and were much better than those with Sailfish. Furthermore, we found that adding a poly-A tail increased alignment numbers, but did not improve normalization results.
Spearman correlation analysis revealed that RC, UQ, Med, TMM, DESeq, and Q did not noticeably improve gene expression normalization, regardless of read length. Other normalization methods were more efficient when alignment accuracy was low; Sailfish with RPKM gave the best normalization results. When alignment accuracy was high, RC was sufficient for gene expression calculation. We suggest ignoring the poly-A tail during differential gene expression analysis.
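The evaluation described above pairs a normalization step with a Spearman rank correlation against qRT-PCR values. The sketch below illustrates that idea for two of the named methods, raw counts (RC) and RPKM, using the standard definition RPKM = reads × 10^9 / (gene length in bp × total mapped reads). The gene names, counts, lengths and qRT-PCR values are invented for illustration and are not taken from the MAQC data; the helper functions are likewise hypothetical, not the authors' code.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths_bp[g] * total) for g in counts}


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data (not from the study).
genes = ["geneA", "geneB", "geneC"]
counts = {"geneA": 100, "geneB": 400, "geneC": 500}
lengths_bp = {"geneA": 1000, "geneB": 2000, "geneC": 10000}
qpcr = {"geneA": 2.0, "geneB": 3.5, "geneC": 1.0}

normalized = rpkm(counts, lengths_bp)
rho_raw = spearman([counts[g] for g in genes], [qpcr[g] for g in genes])
rho_rpkm = spearman([normalized[g] for g in genes], [qpcr[g] for g in genes])
print(f"raw counts: rho = {rho_raw:.2f}; RPKM: rho = {rho_rpkm:.2f}")
```

In this toy case the long gene has the most reads but the lowest qRT-PCR value, so length normalization changes the gene ranking and raises the correlation. Real comparisons use hundreds of genes, as in the 996-gene evaluation above.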
Cruz, Rochelle E.; Shokoples, Sandra E.; Manage, Dammika P.; Yanow, Stephanie K.
2010-01-01
Mutations within the Plasmodium falciparum dihydrofolate reductase gene (Pfdhfr) contribute to resistance to antimalarials such as sulfadoxine-pyrimethamine (SP). Of particular importance are the single nucleotide polymorphisms (SNPs) within codons 51, 59, 108, and 164 in the Pfdhfr gene that are associated with SP treatment failure. Given that traditional genotyping methods are time-consuming and laborious, we developed an assay that provides the rapid, high-throughput analysis of parasite DNA isolated from clinical samples. This assay is based on asymmetric real-time PCR and melt-curve analysis (MCA) performed on the LightCycler platform. Unlabeled probes specific to each SNP are included in the reaction mixture and hybridize differentially to the mutant and wild-type sequences within the amplicon, generating distinct melting curves. Since the probe is present throughout PCR and MCA, the assay proceeds seamlessly with no further addition of reagents. This assay was validated for analytical sensitivity and specificity using plasmids, purified genomic DNA from reference strains, and parasite cultures. For all four SNPs, correct genotypes were identified with 100 copies of the template. The performance of the assay was evaluated with a blind panel of clinical isolates from travelers with low-level parasitemia. The concordance between our assay and DNA sequencing ranged from 84 to 100% depending on the SNP. We also directly compared our MCA assay to a published TaqMan real-time PCR assay and identified major issues with the specificity of the TaqMan probes. Our assay provides a number of technical improvements that facilitate the high-throughput screening of patient samples to identify SP-resistant malaria. PMID:20631115
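Melt-curve genotyping of this kind typically reads each genotype from the melting temperature (Tm), taken as the peak of the negative derivative of fluorescence with respect to temperature, -dF/dT. The sketch below shows that calculation on synthetic sigmoid curves. The Tm values (66 and 61 degrees), the curve shape and the function names are assumptions for illustration only; they are not data from the study or the LightCycler software's algorithm.

```python
import math


def melting_temperature(temps, fluorescence):
    """Return the temperature at the peak of -dF/dT (central differences)."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        slope = -(fluorescence[i + 1] - fluorescence[i - 1]) / (
            temps[i + 1] - temps[i - 1]
        )
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


def synthetic_melt_curve(temps, tm, width=1.0):
    """Hypothetical probe signal: fluorescence drops as the duplex melts."""
    return [1.0 / (1.0 + math.exp((t - tm) / width)) for t in temps]


temps = [50 + 0.5 * i for i in range(61)]  # 50.0 to 80.0 in 0.5-degree steps
perfect_match = synthetic_melt_curve(temps, tm=66.0)  # assumed Tm
mismatch = synthetic_melt_curve(temps, tm=61.0)  # assumed lower Tm

tm_match = melting_temperature(temps, perfect_match)
tm_mismatch = melting_temperature(temps, mismatch)
print(tm_match, tm_mismatch)
```

A probe that is mismatched to its target melts at a lower temperature than a perfectly matched one, which is what separates the two synthetic curves here.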
Leese, Florian; Mayer, Christoph; Agrawal, Shobhit; Dambach, Johannes; Dietz, Lars; Doemel, Jana S.; Goodall-Copstake, William P.; Held, Christoph; Jackson, Jennifer A.; Lampert, Kathrin P.; Linse, Katrin; Macher, Jan N.; Nolzen, Jennifer; Raupach, Michael J.; Rivera, Nicole T.; Schubart, Christoph D.; Striewski, Sebastian; Tollrian, Ralph; Sands, Chester J.
2012-01-01
High throughput sequencing technologies are revolutionizing genetic research. With this “rise of the machines”, genomic sequences can be obtained even for unknown genomes within a short time and for reasonable costs. This has enabled evolutionary biologists studying genetically unexplored species to identify molecular markers or genomic regions of interest (e.g. micro- and minisatellites, mitochondrial and nuclear genes) by sequencing only a fraction of the genome. However, when using such datasets from non-model species, it is possible that DNA from non-target contaminant species such as bacteria, viruses, fungi, or other eukaryotic organisms may complicate the interpretation of the results. In this study we analysed 14 genomic pyrosequencing libraries of aquatic non-model taxa from four major evolutionary lineages. We quantified the amount of suitable micro- and minisatellites, mitochondrial genomes, known nuclear genes and transposable elements and searched for contamination from various sources using bioinformatic approaches. Our results show that in all sequence libraries with estimated coverage of about 0.02–25%, many appropriate micro- and minisatellites, mitochondrial gene sequences and nuclear genes from different KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways could be identified and characterized. These can serve as markers for phylogenetic and population genetic analyses. A central finding of our study is that several genomic libraries suffered from different biases owing to non-target DNA or mobile elements. In particular, viruses, bacteria or eukaryote endosymbionts contributed significantly (up to 10%) to some of the libraries analysed. If not identified as such, genetic markers developed from high-throughput sequencing data for non-model organisms may bias evolutionary studies or fail completely in experimental tests. 
In conclusion, our study demonstrates the enormous potential of low-coverage genome survey sequences and suggests bioinformatic analysis workflows. The results also advise a more sophisticated filtering for problematic sequences and non-target genome sequences prior to developing markers. PMID:23185309
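As a rough illustration of the marker-screening step, microsatellites can be located in individual reads with a back-referencing regular expression. This is a minimal sketch under assumed thresholds (motif length 2 to 6 bp, at least five tandem copies); the abstract does not state which tools or thresholds the authors used, and `find_microsatellites` is a hypothetical helper.

```python
import re

# A 2-6 bp motif followed by at least four more copies of itself (>= 5 total).
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_microsatellites(read):
    """Return (start, motif, copy number) for each tandem repeat in a read."""
    hits = []
    for match in SSR_PATTERN.finditer(read.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as poly-A
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


example_hits = find_microsatellites("TTTACACACACACACGGG")
print(example_hits)
```

A contamination screen, as recommended in the abstract, would be applied to the reads before markers found this way are used for primer design.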
NASA Astrophysics Data System (ADS)
Munday, Chris; De Deckker, Patrick; Tapper, Nigel; Allison, Gwen
2014-05-01
Following a prolonged drought in Australia in the first decade of the 21st century, several dust storms affected the heavily populated east coast of Australia. The largest such storm occurred on 22-23 September 2009 and had a front of an estimated 3000 km. A 24 h average PM10 concentration of over 2,000 μg/m3 was recorded in several locations and an hourly peak of over 15,000 μg/m3 was recorded (Leys et al. 2011). Over two time periods, duplicate aerosol samples were collected on 47 mm diameter cellulose nitrate membranes at a location removed from anthropogenic influences. One set of samples was collected in the afternoon the dust event started and another was collected overnight. Additionally, overnight rainfall was collected in a sterile bottle. DNA was directly extracted from one membrane from each time point for molecular cloning and high-throughput sequencing, while the other was cultivated on Tryptic Soy Agar (TSA). High-throughput sequencing was performed using the 454 Titanium platform. From the three samples, 19,945 curated sequences were obtained, representing 942 OTUs (cutoff = 0.03), with the three samples approximately equal in number. Unclassified Rhizobiales and Stenotrophomonas were the most abundant groups that could be assigned names. Despite the temporal relation of the samples, only eleven OTUs were found in all three samples, indicating that the dust storm evolved in composition as it passed over the region. Approximately 800 and 500 CFU/m3 were found in the two cultivated samples, tenfold more than was collected from previous dust events (Lim et al. 2011). Identification of cultivars revealed a dominance of the Gram-positive Firmicutes phylum, while the clone library showed a more even distribution of taxa, with Actinobacteria the most common and Firmicutes comprising less than 10% of sequences.
Collectively, the analyses indicate that the concentration of cultivable organisms during the dust storm increased dramatically relative to calm conditions. A diverse and variable population of microorganisms was present, reflecting the vast source and dynamic nature of the storm.
2013-01-01
Background. Besides the development of comprehensive tools for high-throughput 16S ribosomal RNA amplicon sequence analysis, there exists a growing need for protocols emphasizing alternative phylogenetic markers such as those representing eukaryotic organisms. Results. Here we introduce CloVR-ITS, an automated pipeline for comparative analysis of internal transcribed spacer (ITS) pyrosequences amplified from metagenomic DNA isolates and representing fungal species. This pipeline performs a variety of steps similar to those commonly used for 16S rRNA amplicon sequence analysis, including preprocessing for quality, chimera detection, clustering of sequences into operational taxonomic units (OTUs), taxonomic assignment (at class, order, family, genus, and species levels) and statistical analysis of sample groups of interest based on user-provided information. Using ITS amplicon pyrosequencing data from a previous human gastric fluid study, we demonstrate the utility of CloVR-ITS for fungal microbiota analysis and provide runtime and cost examples, including analysis of extremely large datasets on the cloud. We show that the largest fractions of reads from the stomach fluid samples were assigned to Dothideomycetes, Saccharomycetes, Agaricomycetes and Sordariomycetes but that all samples were dominated by sequences that could not be taxonomically classified. Representatives of the Candida genus were identified in all samples, most notably C. quercitrusa, while sequence reads assigned to the Aspergillus genus were only identified in a subset of samples. CloVR-ITS is made available as a pre-installed, automated, and portable software pipeline for cloud-friendly execution as part of the CloVR virtual machine package (http://clovr.org).
Conclusion. The CloVR-ITS pipeline provides fungal microbiota analysis that can be complementary to bacterial 16S rRNA and total metagenome sequence analysis, allowing for more comprehensive studies of environmental and host-associated microbial communities. PMID:24451270
A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences
Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L.
2017-01-01
An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5′-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5′-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. PMID:28628204
Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko
2017-07-12
Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
Unbiased Characterization of Anopheles Mosquito Blood Meals by Targeted High-Throughput Sequencing
Logue, Kyle; Keven, John Bosco; Cannon, Matthew V.; Reimer, Lisa; Siba, Peter; Walker, Edward D.; Zimmerman, Peter A.; Serre, David
2016-01-01
Understanding mosquito host choice is important for assessing vector competence or identifying disease reservoirs. Unfortunately, the availability of an unbiased method for comprehensively evaluating the composition of insect blood meals is very limited, as most current molecular assays only test for the presence of a few pre-selected species. These approaches also have limited ability to identify the presence of multiple mammalian hosts in a single blood meal. Here, we describe a novel high-throughput sequencing method that enables analysis of 96 mosquitoes simultaneously and provides a comprehensive and quantitative perspective on the composition of each blood meal. We validated in silico that universal primers targeting the mammalian mitochondrial 16S ribosomal RNA genes (16S rRNA) should amplify more than 95% of the mammalian 16S rRNA sequences present in the NCBI nucleotide database. We applied this method to 442 female Anopheles punctulatus s. l. mosquitoes collected in Papua New Guinea (PNG). While human (52.9%), dog (15.8%) and pig (29.2%) were the most common hosts identified in our study, we also detected DNA from mice, one marsupial species and two bat species. Our analyses also revealed that 16.3% of the mosquitoes fed on more than one host. Analysis of the human mitochondrial hypervariable region I in 102 human blood meals showed that 5 (4.9%) of the mosquitoes unambiguously fed on more than one person. Overall, analysis of PNG mosquitoes illustrates the potential of this approach to identify unsuspected hosts and characterize mixed blood meals, and shows how this approach can be adapted to evaluate inter-individual variations among human blood meals. Furthermore, this approach can be applied to any disease-transmitting arthropod and can be easily customized to investigate non-mammalian host sources. PMID:26963245
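As a rough illustration of how a per-mosquito blood meal composition and a mixed-meal call could be derived from host-assigned read counts, consider the minimal sketch below. It is not the authors' pipeline: the function names, the example read counts, and the 5% minimum read fraction are all hypothetical assumptions chosen for illustration.

```python
def host_composition(read_counts, min_fraction=0.05):
    """Return the fraction of reads per host, keeping hosts at or above min_fraction.

    read_counts maps a host species name to the number of 16S rRNA reads
    assigned to it for one mosquito. min_fraction is an assumed noise cutoff.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {
        host: count / total
        for host, count in read_counts.items()
        if count / total >= min_fraction
    }


def is_mixed_meal(read_counts, min_fraction=0.05):
    """A blood meal is called mixed when more than one host passes the cutoff."""
    return len(host_composition(read_counts, min_fraction)) > 1


# Hypothetical read counts for a single mosquito.
example = {"human": 900, "pig": 95, "dog": 5}
composition = host_composition(example)
mixed = is_mixed_meal(example)
```

The same ratio arithmetic underlies the figures quoted in the abstract: 5 of 102 human blood meals corresponds to about 4.9%.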
NASA Astrophysics Data System (ADS)
Shi, Pengju; Dong, Shihang; Zhang, Huanjun; Wang, Peiliang; Niu, Zhuang; Fang, Yan
2018-03-01
Polybrominated diphenyl ethers (PBDEs) are ubiquitous global pollutants, which are known to have immune, developmental, reproductive, and endocrine toxicity in aquatic organisms, including bivalves. 2,2',4,4'-Tetrabromodiphenyl ether (BDE-47) is the predominant PBDE congener detected in environmental samples and the tissues of organisms. However, the mechanism of its toxicity remains unclear. In this study, high-throughput sequencing was performed using the clam Mactra veneriformis, a good model for toxicological research, to clarify the transcriptomic response to BDE-47 and the mechanism responsible for the toxicity of BDE-47. The clams were exposed to 5 μg/L BDE-47 for 3 days and the digestive glands were sampled for high-throughput sequencing analysis. We obtained 127 648, 154 225, and 124 985 unigenes by de novo assembly of the control group reads (CG), BDE-47 group reads (BDEG), and control and BDE-47 reads (CG & BDEG), respectively. We annotated 32 176 unigenes from the CG & BDEG reads using the NR database. We categorized 24 401 unigenes into 25 functional COG clusters and 21 749 unigenes were assigned to 259 KEGG pathways. Moreover, 17 625 differentially expressed genes (DEGs) were detected, with 10 028 upregulated DEGs and 7 597 downregulated DEGs. Functional enrichment analysis showed that the DEGs were involved with detoxification, antioxidant defense, immune response, apoptosis, and other functions. The mRNA expression levels of 26 DEGs were verified by quantitative real-time PCR, which demonstrated the high agreement between the two methods. These results provide a good basis for future research using the M. veneriformis model into the mechanism of PBDE toxicity and molecular biomarkers for BDE-47 pollution. The regulation and interaction of the DEGs will be studied in the future to clarify the mechanism of PBDE toxicity.
Calvo, Sarah E; Tucker, Elena J; Compton, Alison G; Kirby, Denise M; Crawford, Gabriel; Burtt, Noel P; Rivas, Manuel A; Guiducci, Candace; Bruno, Damien L; Goldberger, Olga A; Redman, Michelle C; Wiltshire, Esko; Wilson, Callum J; Altshuler, David; Gabriel, Stacey B; Daly, Mark J; Thorburn, David R; Mootha, Vamsi K
2010-01-01
Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing, and experimental validation to uncover the molecular basis of mitochondrial complex I (CI) disorders. We created five pools of DNA from a cohort of 103 patients and then performed deep sequencing of 103 candidate genes to spotlight 151 rare variants predicted to impact protein function. We used confirmatory experiments to establish genetic diagnoses in 22% of previously unsolved cases, and discovered that defects in NUBPL and FOXRED1 can cause CI deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can reveal novel disease-causing mutations in individual patients. PMID:20818383
Novel genetic tools for studying food-borne Salmonella.
Andrews-Polymenis, Helene L; Santiviago, Carlos A; McClelland, Michael
2009-04-01
Nontyphoidal Salmonellae are highly prevalent food-borne pathogens. High-throughput sequencing of Salmonella genomes is expanding our knowledge of the evolution of serovars and epidemic isolates. Genome sequences have also allowed the creation of complete microarrays. Microarrays have improved the throughput of in vivo expression technology (IVET) used to uncover promoters active during infection. In another method, signature tagged mutagenesis (STM), pools of mutants are subjected to selection. Changes in the population are monitored on a microarray, revealing genes under selection. Complete genome sequences permit the construction of pools of targeted in-frame deletions that have improved STM by minimizing the number of clones and the polarity of each mutant. Together, genome sequences and the continuing development of new tools for functional genomics will drive a revolution in the understanding of Salmonellae in many different niches that are critical for food safety.
Schönberg, Anna; Theunert, Christoph; Li, Mingkun; Stoneking, Mark; Nasidze, Ivan
2011-09-01
To investigate the demographic history of human populations from the Caucasus and surrounding regions, we used high-throughput sequencing to generate 147 complete mtDNA genome sequences from random samples of individuals from three groups from the Caucasus (Armenians, Azeri and Georgians), and one group each from Iran and Turkey. Overall diversity is very high, with 144 different sequences that fall into 97 different haplogroups found among the 147 individuals. Bayesian skyline plots (BSPs) of population size change through time show a population expansion around 40-50 kya, followed by a constant population size, and then another expansion around 15-18 kya for the groups from the Caucasus and Iran. The BSP for Turkey differs the most from the others, with an increase from 35 to 50 kya followed by a prolonged period of constant population size, and no indication of a second period of growth. An approximate Bayesian computation approach was used to estimate divergence times between each pair of populations; the oldest divergence times were between Turkey and the other four groups from the South Caucasus and Iran (~400-600 generations), while the divergence time of the three Caucasus groups from each other was comparable to their divergence time from Iran (average of ~360 generations). These results illustrate the value of random sampling of complete mtDNA genome sequences that can be obtained with high-throughput sequencing platforms.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallaher, Sean D.; Fitz-Gibbon, Sorel T.; Strenkert, Daniela
Chlamydomonas reinhardtii is a unicellular chlorophyte alga that is widely studied as a reference organism for understanding photosynthesis, sensory and motile cilia, and for development of an algal-based platform for producing biofuels and bio-products. Its highly repetitive, ~205-kbp circular chloroplast genome and ~15.8-kbp linear mitochondrial genome were sequenced prior to the advent of high-throughput sequencing technologies. Here, high coverage shotgun sequencing was used to assemble both organellar genomes de novo. These new genomes correct dozens of errors in the prior genome sequences and annotations. Genome sequencing coverage indicates that each cell contains on average 83 copies of the chloroplast genome and 130 copies of the mitochondrial genome. Using protocols and analyses optimized for organellar transcripts, RNA-Seq was used to quantify their relative abundances across 12 different growth conditions. Forty-six percent of total cellular mRNA is attributable to high expression from a few dozen chloroplast genes. RNA-Seq data were used to guide gene annotation, to demonstrate polycistronic gene expression, and to quantify splicing of psaA and psbA introns. In contrast to a conclusion from a recent study, we found that chloroplast transcripts are not edited. Unexpectedly, cytosine-rich polynucleotide tails were observed at the 3’-end of all mitochondrial transcripts. A comparative genomics analysis of eight laboratory strains and 11 wild isolates of C. reinhardtii identified 2658 variants in the organellar genomes, which is 1/10th as much genetic diversity as is found in the nucleus.
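A common way to turn sequencing coverage into an organelle copy number per cell is to compare organellar read depth with nuclear read depth. The sketch below shows only that ratio; it is an assumption about the general approach, not the authors' stated procedure. The function name is hypothetical, the depth values are invented so that the ratio reproduces the 83 copies quoted above, and the nuclear ploidy of 1 is an assumed parameter.

```python
def copies_per_cell(organelle_depth, nuclear_depth, nuclear_ploidy=1):
    """Estimate organelle genome copies per cell from mean read depths.

    organelle_depth: mean coverage over the organellar genome
    nuclear_depth:   mean coverage over single-copy nuclear regions
    nuclear_ploidy:  nuclear genome copies per cell (assumed 1 here)
    """
    if nuclear_depth <= 0:
        raise ValueError("nuclear_depth must be positive")
    return nuclear_ploidy * organelle_depth / nuclear_depth


# Hypothetical depths, chosen only to illustrate the arithmetic.
chloroplast_copies = copies_per_cell(organelle_depth=8300.0, nuclear_depth=100.0)
```

In practice the depth estimates would need to exclude repetitive regions and nuclear insertions of organellar DNA, which this sketch does not attempt.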
Fonseca, Luiz Henrique M; Lohmann, Lúcia G
2018-06-01
Combining high-throughput sequencing data with amplicon sequences allows the reconstruction of robust phylogenies based on comprehensive sampling of characters and taxa. Here, we combine Next Generation Sequencing (NGS) and Sanger sequencing data to infer the phylogeny of the "Adenocalymma-Neojobertia" clade (Bignonieae, Bignoniaceae), a diverse lineage of Neotropical plants, using Maximum Likelihood and Bayesian approaches. We used NGS to obtain complete or nearly-complete plastomes of members of this clade, leading to a final dataset with 54 individuals, representing 44 members of ingroup and 10 outgroups. In addition, we obtained Sanger sequences of two plastid markers (ndhF and rpl32-trnL) for 44 individuals (43 ingroup and 1 outgroup) and the nuclear PepC for 64 individuals (63 ingroup and 1 outgroup). Our final dataset includes 87 individuals of members of the "Adenocalymma-Neojobertia" clade, representing 66 species (ca. 90% of the diversity), plus 11 outgroups. Plastid and nuclear datasets recovered congruent topologies and were combined. The combined analysis recovered a monophyletic "Adenocalymma-Neojobertia" clade and a paraphyletic Adenocalymma that also contained a monophyletic Neojobertia plus Pleonotoma albiflora. Relationships are strongly supported in all analyses, with most lineages within the "Adenocalymma-Neojobertia" clade receiving maximum posterior probabilities. Ancestral character state reconstructions using Bayesian approaches identified six morphological synapomorphies of clades namely, prophyll type, petiole and petiolule articulation, tendril ramification, inflorescence ramification, calyx shape, and fruit wings. Other characters such as habit, calyx cupular trichomes, corolla color, and corolla shape evolved multiple times. These characters are putatively related with the clade diversification and can be further explored in diversification studies. Copyright © 2018 Elsevier Inc. All rights reserved.
Oshiki, Mamoru; Segawa, Takahiro; Ishii, Satoshi
2018-02-02
Various microorganisms play key roles in the nitrogen (N) cycle. Quantitative PCR (qPCR) and PCR-amplicon sequencing of the N cycle functional genes allow us to analyze the abundance and diversity of microbes responsible for the N-transforming reactions in various environmental samples. However, analysis of multiple target genes can be cumbersome and expensive. PCR-independent analysis, such as metagenomics and metatranscriptomics, is useful but expensive, especially when we analyze multiple samples and try to detect N cycle functional genes present at relatively low abundance. Here, we present the application of microfluidic qPCR chip technology to simultaneously quantify and prepare amplicon sequence libraries for multiple N cycle functional genes as well as taxon-specific 16S rRNA gene markers for many samples. This approach, named the N cycle evaluation (NiCE) chip, was evaluated by using DNA from pure and artificially mixed bacterial cultures and by comparing the results with those obtained by conventional qPCR and amplicon sequencing methods. Quantitative results obtained by the NiCE chip were comparable to those obtained by conventional qPCR. In addition, the NiCE chip was successfully applied to examine abundance and diversity of N cycle functional genes in wastewater samples. Although non-specific amplification was detected on the NiCE chip, this could be overcome by optimizing the primer sequences in the future. As the NiCE chip can provide a high-throughput format to quantify and prepare sequence libraries for multiple N cycle functional genes, this tool should advance our ability to explore N cycling in various samples. Importance. We report a novel approach, the Nitrogen Cycle Evaluation (NiCE) chip, which uses microfluidic qPCR chip technology. By sequencing the amplicons recovered from the NiCE chip, we can assess diversities of the N cycle functional genes.
The NiCE chip technology is applicable to analyzing the temporal dynamics of N cycle gene transcription in wastewater treatment bioreactors. The NiCE chip can provide a high-throughput format to quantify and prepare sequence libraries for multiple N cycle functional genes. While there is room for future improvement, this tool should significantly advance our ability to explore the N cycle in various environmental samples. Copyright © 2018 American Society for Microbiology.
Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling.
Tao, Ran; Zeng, Donglin; Franceschini, Nora; North, Kari E; Boerwinkle, Eric; Lin, Dan-Yu
2015-06-01
High-throughput DNA sequencing allows for the genotyping of common and rare variants for genetic association studies. At the present time and for the foreseeable future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.
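The bias that standard linear regression suffers under trait-dependent sampling can be seen in a small simulation. The sketch below is a deliberately simplified, single-trait illustration and does not implement the authors' likelihood or EM method. All parameter values (effect size 0.5, allele frequency 0.3, 10% tails, sample size, seed) are assumptions chosen for the demonstration.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=0.5, maf=0.3, tail=0.10, seed=0):
    """Compare the OLS slope in the full cohort with the slope in an
    extreme-trait subsample (lowest and highest `tail` fraction of y)."""
    rng = random.Random(seed)
    # Additive genotype coded 0/1/2 as the sum of two Bernoulli(maf) alleles.
    genotype = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]
    trait = [beta * g + rng.gauss(0.0, 1.0) for g in genotype]

    # Trait-dependent sampling: sequence only individuals in the two tails.
    order = sorted(range(n), key=lambda i: trait[i])
    k = int(n * tail)
    chosen = order[:k] + order[-k:]

    full = ols_slope(genotype, trait)
    sampled = ols_slope([genotype[i] for i in chosen], [trait[i] for i in chosen])
    return full, sampled


full_slope, sampled_slope = simulate()
```

In the full cohort the slope stays close to the simulated effect, while in the extreme-trait subsample it is inflated well beyond it, which is the kind of estimation bias a likelihood that models the sampling mechanism is designed to remove.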
Galson, Jacob D; Trück, Johannes; Fowler, Anna; Clutterbuck, Elizabeth A; Münz, Márton; Cerundolo, Vincenzo; Reinhard, Claudia; van der Most, Robbert; Pollard, Andrew J; Lunter, Gerton; Kelly, Dominic F
2015-12-01
Generating a diverse B cell immunoglobulin repertoire is essential for protection against infection. The repertoire in humans can now be comprehensively measured by high-throughput sequencing. Using hepatitis B vaccination as a model, we determined how the total immunoglobulin sequence repertoire changes following antigen exposure in humans, and compared this to sequences from vaccine-specific sorted cells. Clonal sequence expansions were seen 7 days after vaccination, which correlated with vaccine-specific plasma cell numbers. These expansions caused an increase in mutation, and a decrease in diversity and complementarity-determining region 3 sequence length in the repertoire. We also saw an increase in sequence convergence between participants 14 and 21 days after vaccination, coinciding with an increase of vaccine-specific memory cells. These features allowed development of a model for in silico enrichment of vaccine-specific sequences from the total repertoire. Identifying antigen-specific sequences from total repertoire data could aid our understanding of B cell-driven immunity, and be used for disease diagnostics and vaccine evaluation.
Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen
2009-03-01
We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. 2008 Wiley-Liss, Inc.
Madi, Asaf; Poran, Asaf; Shifrut, Eric; Reich-Zeliger, Shlomit; Greenstein, Erez; Zaretsky, Irena; Arnon, Tomer; Laethem, Francois Van; Singer, Alfred; Lu, Jinghua; Sun, Peter D; Cohen, Irun R; Friedman, Nir
2017-01-01
Diversity of T cell receptor (TCR) repertoires, generated by somatic DNA rearrangements, is central to immune system function. However, the level of sequence similarity of TCR repertoires within and between species has not been characterized. Using network analysis of high-throughput TCR sequencing data, we found that abundant CDR3-TCRβ sequences were clustered within networks generated by sequence similarity. We discovered a substantial number of public CDR3-TCRβ segments that were identical in mice and humans. These conserved public sequences were central within TCR sequence-similarity networks. Annotated TCR sequences, previously associated with self-specificities such as autoimmunity and cancer, were linked to network clusters. Mechanistically, CDR3 networks were promoted by MHC-mediated selection, and were reduced following immunization, immune checkpoint blockade or aging. Our findings provide a new view of T cell repertoire organization and physiology, and suggest that the immune system distributes its TCR sequences unevenly, attending to specific foci of reactivity. DOI: http://dx.doi.org/10.7554/eLife.22057.001 PMID:28731407
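A sequence-similarity network of the kind described above can be sketched by linking CDR3 amino acid sequences that differ by a single substitution and then extracting connected clusters. The abstract does not state the similarity rule, so the one-mismatch (Hamming distance 1) criterion is an assumption, and the sequences below are hypothetical examples rather than data from the study.

```python
from itertools import combinations


def differs_by_one(a, b):
    """True when two equal-length sequences differ at exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def build_network(sequences):
    """Adjacency map linking every pair of sequences one substitution apart."""
    adjacency = {seq: set() for seq in sequences}
    for a, b in combinations(adjacency, 2):
        if differs_by_one(a, b):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def clusters(adjacency):
    """Connected components of the network, found by depth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical CDR3 sequences used only to exercise the functions.
cdr3 = ["CASSLGQYF", "CASSLGEYF", "CASSLAEYF", "CASRPDTQYF"]
network = build_network(cdr3)
groups = clusters(network)
```

Here the number of neighbours of a sequence (its degree) gives a simple measure of how central it is within its cluster. The all-pairs comparison is quadratic in the number of sequences, so a real repertoire would need an indexed or hashed neighbour search instead.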
First report of Beet western yellows virus infecting Epiphyllum spp
USDA-ARS's Scientific Manuscript database
Beet western yellows virus (BWYV) was identified by high-throughput sequencing in an orchid cactus (Epiphyllum spp.) hybrid without obvious symptoms. The nearly complete genomic sequence of 5,458 nucleotides of the virus was determined. The isolate has the highest nucleotide sequence identity (93%)...
Huang, Kailong; Zhang, Xu-Xiang; Shi, Peng; Wu, Bing; Ren, Hongqiang
2014-11-01
In order to comprehensively investigate bacterial virulence in drinking water, 454 pyrosequencing and Illumina high-throughput sequencing were used to detect potential pathogenic bacteria and virulence factors (VFs) in a full-scale drinking water treatment and distribution system. 16S rRNA gene pyrosequencing revealed high bacterial diversity in the drinking water (441-586 operational taxonomic units). Bacterial diversity decreased after chlorine disinfection, but increased after pipeline distribution. α-Proteobacteria was the most dominant taxonomic class. Alignment against the established pathogen database showed that several types of putative pathogens were present in the drinking water and Pseudomonas aeruginosa had the highest abundance (over 11‰ of total sequencing reads). Many pathogens disappeared after chlorine disinfection, but P. aeruginosa and Leptospira interrogans were still detected in the tap water. High-throughput sequencing revealed prevalence of various pathogenicity islands and virulence proteins in the drinking water, and translocases, transposons, Clp proteases and flagellar motor switch proteins were the predominant VFs. Both diversity and abundance of the detectable VFs increased after the chlorination, and decreased after the pipeline distribution. This study indicates that joint use of 454 pyrosequencing and Illumina sequencing can comprehensively characterize environmental pathogenesis, and several types of putative pathogens and various VFs are prevalent in drinking water. Copyright © 2014 Elsevier Inc. All rights reserved.
High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis.
Simonyan, Vahan; Mazumder, Raja
2014-09-30
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.
High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis
Simonyan, Vahan; Mazumder, Raja
2014-01-01
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis. PMID:25271953
Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie
2016-01-01
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most often, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventual classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users’ input data and their metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user’s input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy’s main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS.
For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the trend of Galaxy, the interface enables the sharing of scientific results to fellow team members. PMID:28451381
Correia, Damien; Doppelt-Azeroual, Olivia; Denis, Jean-Baptiste; Vandenbogaert, Mathias; Caro, Valérie
2015-01-01
The detection and characterization of emerging infectious agents has been a continuing public health concern. High Throughput Sequencing (HTS) or Next-Generation Sequencing (NGS) technologies have proven to be promising approaches for efficient and unbiased detection of pathogens in complex biological samples, providing access to comprehensive analyses. As NGS approaches typically yield millions of putatively representative reads per sample, efficient data management and visualization resources have become mandatory. Most often, those resources are implemented through a dedicated Laboratory Information Management System (LIMS), solely to provide perspective regarding the available information. We developed an easily deployable web-interface, facilitating management and bioinformatics analysis of metagenomics data-samples. It was engineered to run associated and dedicated Galaxy workflows for the detection and eventual classification of pathogens. The web application allows easy interaction with existing Galaxy metagenomic workflows and facilitates the organization, exploration and aggregation of the most relevant sample-specific sequences among millions of genomic sequences, allowing users to determine their relative abundance and associate them with the most closely related organism or pathogen. The user-friendly Django-based interface associates the users' input data and their metadata through a bio-IT provided set of resources (a Galaxy instance, and both sufficient storage and grid computing power). Galaxy is used to handle and analyze the user's input data from loading, indexing, mapping, assembly and DB-searches. Interaction between our application and Galaxy is ensured by the BioBlend library, which gives API-based access to Galaxy's main features. Metadata about samples, runs, as well as the workflow results are stored in the LIMS.
For metagenomic classification and exploration purposes, we show, as a proof of concept, that integration of intuitive exploratory tools, like Krona for representation of taxonomic classification, can be achieved very easily. In the trend of Galaxy, the interface enables the sharing of scientific results to fellow team members.
2012-01-01
Background MicroRNAs (miRNAs) are a class of endogenous, small, non-coding RNAs that regulate gene expression by mediating gene silencing at transcriptional and post-transcriptional levels in high plants. However, the diversity of miRNAs and their roles in floral development in Japanese apricot (Prunus mume Sieb. et Zucc) remains largely unexplored. Imperfect flowers with pistil abortion seriously decrease production yields. To understand the role of miRNAs in pistil development, pistil development-related miRNAs were identified by Solexa sequencing in Japanese apricot. Results Solexa sequencing was used to identify and quantitatively profile small RNAs from perfect and imperfect flower buds of Japanese apricot. A total of 22,561,972 and 24,952,690 reads were sequenced from two small RNA libraries constructed from perfect and imperfect flower buds, respectively. Sixty-one known miRNAs, belonging to 24 families, were identified. Comparative profiling revealed that seven known miRNAs exhibited significant differential expression between perfect and imperfect flower buds. A total of 61 potentially novel miRNAs/new members of known miRNA families were also identified by the presence of mature miRNAs and corresponding miRNA*s in the sRNA libraries. Comparative analysis showed that six potentially novel miRNAs were differentially expressed between perfect and imperfect flower buds. Target predictions of the 13 differentially expressed miRNAs resulted in 212 target genes. Gene ontology (GO) annotation revealed that high-ranking miRNA target genes are those implicated in the developmental process, the regulation of transcription and response to stress. Conclusions This study represents the first comparative identification of miRNAomes between perfect and imperfect Japanese apricot flowers. Seven known miRNAs and six potentially novel miRNAs associated with pistil development were identified, using high-throughput sequencing of small RNAs. 
The findings, both computational and experimental, provide valuable information for further functional characterisation of miRNAs associated with pistil development in plants. PMID:22863067
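The abstract above compares miRNA abundance between two libraries of different sizes, so raw read counts are not directly comparable. The following is a minimal sketch of one common way to handle this, reads-per-million scaling followed by a log2 ratio. It is an assumption for illustration and not necessarily the normalization used in the study; the function names, the pseudocount, and the example counts are all hypothetical, and only the two library sizes come from the abstract.

```python
import math

# Library sizes (total sequenced reads) as reported in the abstract.
LIBRARY_SIZES = {"perfect": 22_561_972, "imperfect": 24_952_690}


def reads_per_million(count, library_size):
    """Scale a raw read count to reads per million (RPM)."""
    return count * 1_000_000 / library_size


def log2_fold_change(count_a, count_b, size_a, size_b, pseudocount=1.0):
    """Return log2(RPM_b / RPM_a); the pseudocount (an assumption) avoids log(0)."""
    rpm_a = reads_per_million(count_a, size_a) + pseudocount
    rpm_b = reads_per_million(count_b, size_b) + pseudocount
    return math.log2(rpm_b / rpm_a)


# Hypothetical raw counts for one made-up miRNA; NOT data from the study.
example_counts = {"perfect": 1200, "imperfect": 300}

lfc = log2_fold_change(
    example_counts["perfect"],
    example_counts["imperfect"],
    LIBRARY_SIZES["perfect"],
    LIBRARY_SIZES["imperfect"],
)
print(f"log2 fold change (imperfect vs perfect): {lfc:.2f}")
```

A negative value here would indicate lower normalized abundance in the imperfect-flower library. A real analysis would also need a statistical test for significance, which this sketch omits.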
Dobson, Alleson; O'Sullivan, Orla; Cotter, Paul D; Ross, Paul; Hill, Colin
2011-07-01
Lacticin 3147 is a two-peptide broad spectrum lantibiotic produced by Lactococcus lactis DPC3147 and shown to inhibit a number of clinically relevant Gram-positive pathogens. Initially isolated from an Irish kefir grain, lacticin 3147 is one of the most extensively studied lantibiotics to date. In this study, the bacterial diversity of the Irish kefir grain from which L. lactis DPC3147 was originally isolated was investigated for the first time using a high-throughput parallel sequencing strategy. A total of 17 416 unique V4 variable regions of the 16S rRNA gene were analysed from both the kefir starter grain and its derivative kefir-fermented milk. Firmicutes (which includes the lactic acid bacteria) was the dominant phylum, accounting for > 92% of sequences. Within the Firmicutes, dramatic differences in abundance were observed when the starter grain and kefir milk fermentate were compared. The kefir grain-associated bacterial community was largely composed of the Lactobacillaceae family, while Streptococcaceae (primarily Lactococcus spp.) was the dominant family within the kefir milk fermentate. Sequencing data confirmed previous findings that the microbiota of kefir milk and of the starter grain are quite different, while also establishing that the microbial diversity of the starter grain is not uniform, with greater diversity in the interior of the starter grain than in the exterior. © 2011 Teagasc Food Research Centre, Moorepark. FEMS Microbiology Letters © 2011 Federation of European Microbiological Societies. Published by Blackwell Publishing Ltd.
Ding, Jian; Ruan, Chengjiang; Guan, Ying; Krishna, Priti
2018-03-05
Sea buckthorn is a plant of medicinal and nutritional importance owing in part to the high levels of essential fatty acids, linoleic (up to 42%) and α-linolenic (up to 39%) acids, in the seed oil. Sea buckthorn can produce seeds either via the sexual pathway or by apomixis. The seed development and maturation programs are critically dependent on miRNAs. To understand miRNA-mediated regulation of sea buckthorn seed development, eight small RNA libraries were constructed for deep sequencing from developing seeds of a low oil content line 'SJ1' and a high oil content line 'XE3'. High-throughput sequencing identified 137 known miRNAs from 27 families and 264 novel miRNAs. The potential targets of the identified miRNAs were predicted based on sequence homology. Nineteen (four known and 15 novel) and 22 (six known and 16 novel) miRNAs were found to be involved in lipid biosynthesis and seed size, respectively. An integrated analysis of the mRNA and miRNA transcriptomes and qRT-PCR identified some key miRNAs and their targets (miR164d-ARF2, miR168b-Δ9D, novelmiRNA-108-ACC, novelmiRNA-23-GPD1, novelmiRNA-58-DGAT1, and novelmiRNA-191-DGAT2) potentially involved in seed size and lipid biosynthesis of sea buckthorn seed. These results indicate the potential importance of miRNAs in regulating lipid biosynthesis and seed size in sea buckthorn.
Human genetics and genomics a decade after the release of the draft sequence of the human genome.
Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng
2011-10-01
Substantial progress has been made in human genetics and genomics research over the ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, in the International HapMap Project, and in genotyping technologies. These developments have been vital to the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. High-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.