Lu, Rong; Neff, Norma F.; Quake, Stephen R.; Weissman, Irving L.
2011-01-01
Disentangling cellular heterogeneity is a challenge in many fields, particularly in the stem cell and cancer biology fields. Here, we demonstrate how to combine viral genetic barcoding with high-throughput sequencing to track single cells in a heterogeneous population. We use this technique to track the in vivo differentiation of unitary hematopoietic stem cells (HSCs). The results are consistent with single cell transplantation studies, but require two orders of magnitude fewer mice. In addition to its high throughput, the high sensitivity of the technique allows for a direct examination of the clonality of sparse cell populations such as HSCs. We show how these capabilities offer a clonal perspective of the HSC differentiation process. In particular, our data suggests that HSCs do not equally contribute to blood cells after irradiation-mediated transplantation, and that two distinct HSC differentiation patterns co-exist in the same recipient mouse post irradiation. This technique can be applied to any viral accessible cell type for both in vitro and in vivo processes. PMID:21964413
Pagès, Hervé
2018-01-01
Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set. PMID:29723188
Lun, Aaron T L; Pagès, Hervé; Smith, Mike L
2018-05-01
Biological experiments involving genomics or other high-throughput assays typically yield a data matrix that can be explored and analyzed using the R programming language with packages from the Bioconductor project. Improvements in the throughput of these assays have resulted in an explosion of data even from routine experiments, which poses a challenge to the existing computational infrastructure for statistical data analysis. For example, single-cell RNA sequencing (scRNA-seq) experiments frequently generate large matrices containing expression values for each gene in each cell, requiring sparse or file-backed representations for memory-efficient manipulation in R. These alternative representations are not easily compatible with high-performance C++ code used for computationally intensive tasks in existing R/Bioconductor packages. Here, we describe a C++ interface named beachmat, which enables agnostic data access from various matrix representations. This allows package developers to write efficient C++ code that is interoperable with dense, sparse and file-backed matrices, amongst others. We evaluated the performance of beachmat for accessing data from each matrix representation using both simulated and real scRNA-seq data, and defined a clear memory/speed trade-off to motivate the choice of an appropriate representation. We also demonstrate how beachmat can be incorporated into the code of other packages to drive analyses of a very large scRNA-seq data set.
Zhang, L; Liu, X J
2016-06-03
With the rapid development of next-generation high-throughput sequencing technology, RNA-seq has become a standard and important technique for transcriptome analysis. For multi-sample RNA-seq data, the existing expression estimation methods usually deal with each single-RNA-seq sample, and ignore that the read distributions are consistent across multiple samples. In the current study, we propose a structured sparse regression method, SSRSeq, to estimate isoform expression using multi-sample RNA-seq data. SSRSeq uses a non-parameter model to capture the general tendency of non-uniformity read distribution for all genes across multiple samples. Additionally, our method adds a structured sparse regularization, which not only incorporates the sparse specificity between a gene and its corresponding isoform expression levels, but also reduces the effects of noisy reads, especially for lowly expressed genes and isoforms. Four real datasets were used to evaluate our method on isoform expression estimation. Compared with other popular methods, SSRSeq reduced the variance between multiple samples, and produced more accurate isoform expression estimations, and thus more meaningful biological interpretations.
[Current applications of high-throughput DNA sequencing technology in antibody drug research].
Yu, Xin; Liu, Qi-Gang; Wang, Ming-Rong
2012-03-01
Since the publication of a high-throughput DNA sequencing technology based on PCR reaction was carried out in oil emulsions in 2005, high-throughput DNA sequencing platforms have been evolved to a robust technology in sequencing genomes and diverse DNA libraries. Antibody libraries with vast numbers of members currently serve as a foundation of discovering novel antibody drugs, and high-throughput DNA sequencing technology makes it possible to rapidly identify functional antibody variants with desired properties. Herein we present a review of current applications of high-throughput DNA sequencing technology in the analysis of antibody library diversity, sequencing of CDR3 regions, identification of potent antibodies based on sequence frequency, discovery of functional genes, and combination with various display technologies, so as to provide an alternative approach of discovery and development of antibody drugs.
Verzotto, Davide; M Teo, Audrey S; Hillmer, Axel M; Nagarajan, Niranjan
2016-01-01
Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200 %) and precise in their alignments (nearly 99 % precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.
Velagapudi, Sai Pradeep; Luo, Yiling; Tran, Tuan; Haniff, Hafeez S; Nakai, Yoshio; Fallahi, Mohammad; Martinez, Gustavo J; Childs-Disney, Jessica L; Disney, Matthew D
2017-03-22
RNA drug targets are pervasive in cells, but methods to design small molecules that target them are sparse. Herein, we report a general approach to score the affinity and selectivity of RNA motif-small molecule interactions identified via selection. Named High Throughput Structure-Activity Relationships Through Sequencing (HiT-StARTS), HiT-StARTS is statistical in nature and compares input nucleic acid sequences to selected library members that bind a ligand via high throughput sequencing. The approach allowed facile definition of the fitness landscape of hundreds of thousands of RNA motif-small molecule binding partners. These results were mined against folded RNAs in the human transcriptome and identified an avid interaction between a small molecule and the Dicer nuclease-processing site in the oncogenic microRNA (miR)-18a hairpin precursor, which is a member of the miR-17-92 cluster. Application of the small molecule, Targapremir-18a, to prostate cancer cells inhibited production of miR-18a from the cluster, de-repressed serine/threonine protein kinase 4 protein (STK4), and triggered apoptosis. Profiling the cellular targets of Targapremir-18a via Chemical Cross-Linking and Isolation by Pull Down (Chem-CLIP), a covalent small molecule-RNA cellular profiling approach, and other studies showed specific binding of the compound to the miR-18a precursor, revealing broadly applicable factors that govern small molecule drugging of noncoding RNAs.
2017-01-01
RNA drug targets are pervasive in cells, but methods to design small molecules that target them are sparse. Herein, we report a general approach to score the affinity and selectivity of RNA motif–small molecule interactions identified via selection. Named High Throughput Structure–Activity Relationships Through Sequencing (HiT-StARTS), HiT-StARTS is statistical in nature and compares input nucleic acid sequences to selected library members that bind a ligand via high throughput sequencing. The approach allowed facile definition of the fitness landscape of hundreds of thousands of RNA motif–small molecule binding partners. These results were mined against folded RNAs in the human transcriptome and identified an avid interaction between a small molecule and the Dicer nuclease-processing site in the oncogenic microRNA (miR)-18a hairpin precursor, which is a member of the miR-17-92 cluster. Application of the small molecule, Targapremir-18a, to prostate cancer cells inhibited production of miR-18a from the cluster, de-repressed serine/threonine protein kinase 4 protein (STK4), and triggered apoptosis. Profiling the cellular targets of Targapremir-18a via Chemical Cross-Linking and Isolation by Pull Down (Chem-CLIP), a covalent small molecule–RNA cellular profiling approach, and other studies showed specific binding of the compound to the miR-18a precursor, revealing broadly applicable factors that govern small molecule drugging of noncoding RNAs. PMID:28386598
Lyons, Pierre J; Govaere, Louise; Crapoulet, Nicolas; Storey, Kenneth B; Morin, Pier Jr
2016-12-01
Significant physiological and biochemical changes are observed in freeze-tolerant insects when confronted with cold temperatures. These insects have adapted to winter by retreating into a hypometabolic state of diapause and implementing cryoprotective mechanisms that allow them to survive whole body freezing. MicroRNAs (miRNAs), a family of short ribonucleic acids, are emerging as likely molecular players underlying the process of cold adaptation. Unfortunately, the data is sparse concerning the signature of miRNAs that are modulated following cold exposure in the freeze-tolerant goldenrod gall fly Eurosta solidaginis. Leveraging for the first time a next-generation sequencing approach, differentially expressed miRNAs were evaluated in 5°C and -15°C-exposed E. solidaginis larvae. Next-generation sequencing expression data was subsequently validated by qRT-PCR for selected miRNA targets. Results demonstrate 24 differentially expressed freeze-responsive miRNAs. Notable, miR-1-3p, a miRNA modulated at low temperature in another cold-hardy insect, and miR-14-3p, a miRNA associated with stress response in the fruit fly, were shown to be significantly up-regulated in -15°C-exposed larvae. Overall, this work identifies, for the first time in a high-throughput manner, differentially expressed miRNAs in cold-exposed E. solidaginis larvae and further clarifies an emerging signature of miRNAs modulated at low temperatures in cold-hardy insects. Copyright © 2016 Elsevier Inc. All rights reserved.
BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing
Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph
2011-01-01
Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797
Deciphering the genomic targets of alkylating polyamide conjugates using high-throughput sequencing
Chandran, Anandhakumar; Syed, Junetha; Taylor, Rhys D.; Kashiwazaki, Gengo; Sato, Shinsuke; Hashiya, Kaori; Bando, Toshikazu; Sugiyama, Hiroshi
2016-01-01
Chemically engineered small molecules targeting specific genomic sequences play an important role in drug development research. Pyrrole-imidazole polyamides (PIPs) are a group of molecules that can bind to the DNA minor-groove and can be engineered to target specific sequences. Their biological effects rely primarily on their selective DNA binding. However, the binding mechanism of PIPs at the chromatinized genome level is poorly understood. Herein, we report a method using high-throughput sequencing to identify the DNA-alkylating sites of PIP-indole-seco-CBI conjugates. High-throughput sequencing analysis of conjugate 2 showed highly similar DNA-alkylating sites on synthetic oligos (histone-free DNA) and on human genomes (chromatinized DNA context). To our knowledge, this is the first report identifying alkylation sites across genomic DNA by alkylating PIP conjugates using high-throughput sequencing. PMID:27098039
2016-12-01
AWARD NUMBER: W81XWH-13-1-0371 TITLE: High-Throughput Sequencing of Germline and Tumor From Men with Early- Onset Metastatic Prostate Cancer...DATES COVERED 30 Sep 2013 - 29 Sep 2016 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER High-Throughput Sequencing of Germline and Tumor From Men with...presenting with metastatic prostate cancer at a young age (before age 60 years). Whole exome sequencing identified a panel of germline variants that have
Advances in high throughput DNA sequence data compression.
Sardaraz, Muhammad; Tahir, Muhammad; Ikram, Ataul Aziz
2016-06-01
Advances in high throughput sequencing technologies and reduction in cost of sequencing have led to exponential growth in high throughput DNA sequence data. This growth has posed challenges such as storage, retrieval, and transmission of sequencing data. Data compression is used to cope with these challenges. Various methods have been developed to compress genomic and sequencing data. In this article, we present a comprehensive review of compression methods for genome and reads compression. Algorithms are categorized as referential or reference free. Experimental results and comparative analysis of various methods for data compression are presented. Finally, key challenges and research directions in DNA sequence data compression are highlighted.
High-throughput sequence alignment using Graphics Processing Units
Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh
2007-01-01
Background The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
USDA-ARS?s Scientific Manuscript database
Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...
Athavale, Ajay
2018-01-04
Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
The application of the high throughput sequencing technology in the transposable elements.
Liu, Zhen; Xu, Jian-hong
2015-09-01
High throughput sequencing technology has dramatically improved the efficiency of DNA sequencing, and decreased the costs to a great extent. Meanwhile, this technology usually has advantages of better specificity, higher sensitivity and accuracy. Therefore, it has been applied to the research on genetic variations, transcriptomics and epigenomics. Recently, this technology has been widely employed in the studies of transposable elements and has achieved fruitful results. In this review, we summarize the application of high throughput sequencing technology in the fields of transposable elements, including the estimation of transposon content, preference of target sites and distribution, insertion polymorphism and population frequency, identification of rare copies, transposon horizontal transfers as well as transposon tagging. We also briefly introduce the major common sequencing strategies and algorithms, their advantages and disadvantages, and the corresponding solutions. Finally, we envision the developing trends of high throughput sequencing technology, especially the third generation sequencing technology, and its application in transposon studies in the future, hopefully providing a comprehensive understanding and reference for related scientific researchers.
Molecular characterization of a novel Luteovirus from peach identified by high-throughput sequencing
USDA-ARS?s Scientific Manuscript database
Contigs with sequence homologies to Cherry-associated luteovirus were identified by high-throughput sequencing analysis of two peach accessions undergoing quarantine testing. The complete genomic sequences of the two isolates of this virus are 5,819 and 5,814 nucleotides. Their genome organization i...
Kizaki, Seiichiro; Chandran, Anandhakumar; Sugiyama, Hiroshi
2016-03-02
Tet (ten-eleven translocation) family proteins have the ability to oxidize 5-methylcytosine (mC) to 5-hydroxymethylcytosine (hmC), 5-formylcytosine (fC), and 5-carboxycytosine (caC). However, the oxidation reaction of Tet is not understood completely. Evaluation of genomic-level epigenetic changes by Tet protein requires unbiased identification of the highly selective oxidation sites. In this study, we used high-throughput sequencing to investigate the sequence specificity of mC oxidation by Tet1. A 6.6×10(4) -member mC-containing random DNA-sequence library was constructed. The library was subjected to Tet-reactive pulldown followed by high-throughput sequencing. Analysis of the obtained sequence data identified the Tet1-reactive sequences. We identified mCpG as a highly reactive sequence of Tet1 protein. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics
Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf
2015-01-01
Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465
USDA-ARS?s Scientific Manuscript database
Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high throughput sequencing (HTS). The nearly full genome sequence of a previously uncharacterized Panicovirus was identified from...
USDA-ARS?s Scientific Manuscript database
Recent developments in high-throughput sequencing technology have made low-cost sequencing an attractive approach for many genome analysis tasks. Increasing read lengths, improving quality and the production of increasingly larger numbers of usable sequences per instrument-run continue to make whole...
Notes on implementation of sparsely distributed memory
NASA Technical Reports Server (NTRS)
Keeler, J. D.; Denning, P. J.
1986-01-01
The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding; and Appendix B treats a systolic array for high throughput read and write operations.
SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics.
Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf
2015-08-01
RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of [Formula: see text]. Subsequently, numerous faster 'Sankoff-style' approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics, have been limited to high complexity ([Formula: see text] quartic time). Breaking this barrier, we introduce the novel Sankoff-style algorithm 'sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)', which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff's original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurate than RAF, which uses sequence-based heuristics. © The Author 2015. Published by Oxford University Press.
Liu, Gary W; Livesay, Brynn R; Kacherovsky, Nataly A; Cieslewicz, Maryelise; Lutz, Emi; Waalkes, Adam; Jensen, Michael C; Salipante, Stephen J; Pun, Suzie H
2015-08-19
Peptide ligands are used to increase the specificity of drug carriers to their target cells and to facilitate intracellular delivery. One method to identify such peptide ligands, phage display, enables high-throughput screening of peptide libraries for ligands binding to therapeutic targets of interest. However, conventional methods for identifying target binders in a library by Sanger sequencing are low-throughput, labor-intensive, and provide a limited perspective (<0.01%) of the complete sequence space. Moreover, the small sample space can be dominated by nonspecific, preferentially amplifying "parasitic sequences" and plastic-binding sequences, which may lead to the identification of false positives or exclude the identification of target-binding sequences. To overcome these challenges, we employed next-generation Illumina sequencing to couple high-throughput screening and high-throughput sequencing, enabling more comprehensive access to the phage display library sequence space. In this work, we define the hallmarks of binding sequences in next-generation sequencing data, and develop a method that identifies several target-binding phage clones for murine, alternatively activated M2 macrophages with a high (100%) success rate: sequences and binding motifs were reproducibly present across biological replicates; binding motifs were identified across multiple unique sequences; and an unselected, amplified library accurately filtered out parasitic sequences. In addition, we validate the Multiple Em for Motif Elicitation tool as an efficient and principled means of discovering binding sequences.
Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella
2012-01-01
Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue® mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents. PMID:22735701
Besaratinia, Ahmad; Li, Haiqing; Yoon, Jae-In; Zheng, Albert; Gao, Hanlin; Tommasi, Stella
2012-08-01
Many carcinogens leave a unique mutational fingerprint in the human genome. These mutational fingerprints manifest as specific types of mutations often clustering at certain genomic loci in tumor genomes from carcinogen-exposed individuals. To develop a high-throughput method for detecting the mutational fingerprint of carcinogens, we have devised a cost-, time- and labor-effective strategy, in which the widely used transgenic Big Blue mouse mutation detection assay is made compatible with the Roche/454 Genome Sequencer FLX Titanium next-generation sequencing technology. As proof of principle, we have used this novel method to establish the mutational fingerprints of three prominent carcinogens with varying mutagenic potencies, including sunlight ultraviolet radiation, 4-aminobiphenyl and secondhand smoke that are known to be strong, moderate and weak mutagens, respectively. For verification purposes, we have compared the mutational fingerprints of these carcinogens obtained by our newly developed method with those obtained by parallel analyses using the conventional low-throughput approach, that is, standard mutation detection assay followed by direct DNA sequencing using a capillary DNA sequencer. We demonstrate that this high-throughput next-generation sequencing-based method is highly specific and sensitive to detect the mutational fingerprints of the tested carcinogens. The method is reproducible, and its accuracy is comparable with that of the currently available low-throughput method. In conclusion, this novel method has the potential to move the field of carcinogenesis forward by allowing high-throughput analysis of mutations induced by endogenous and/or exogenous genotoxic agents.
Khan, Arifa S; Vacante, Dominick A; Cassart, Jean-Pol; Ng, Siemon H S; Lambert, Christophe; Charlebois, Robert L; King, Kathryn E
Several nucleic-acid based technologies have recently emerged with capabilities for broad virus detection. One of these, high throughput sequencing, has the potential for novel virus detection because this method does not depend upon prior viral sequence knowledge. However, the use of high throughput sequencing for testing biologicals poses greater challenges as compared to other newly introduced tests due to its technical complexities and big data bioinformatics. Thus, the Advanced Virus Detection Technologies Users Group was formed as a joint effort by regulatory and industry scientists to facilitate discussions and provide a forum for sharing data and experiences using advanced new virus detection technologies, with a focus on high throughput sequencing technologies. The group was initiated as a task force that was coordinated by the Parenteral Drug Association and subsequently became the Advanced Virus Detection Technologies Interest Group to continue efforts for using new technologies for detection of adventitious viruses with broader participation, including international government agencies, academia, and technology service providers. © PDA, Inc. 2016.
High Throughput Sequence Analysis for Disease Resistance in Maize
USDA-ARS?s Scientific Manuscript database
Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...
Loeffler 4.0: Diagnostic Metagenomics.
Höper, Dirk; Wylezich, Claudia; Beer, Martin
2017-01-01
A new world of possibilities for "virus discovery" was opened up with high-throughput sequencing becoming available in the last decade. While scientifically metagenomic analysis was established before the start of the era of high-throughput sequencing, the availability of the first second-generation sequencers was the kick-off for diagnosticians to use sequencing for the detection of novel pathogens. Today, diagnostic metagenomics is becoming the standard procedure for the detection and genetic characterization of new viruses or novel virus variants. Here, we provide an overview about technical considerations of high-throughput sequencing-based diagnostic metagenomics together with selected examples of "virus discovery" for animal diseases or zoonoses and metagenomics for food safety or basic veterinary research. © 2017 Elsevier Inc. All rights reserved.
Li, Zhoufang; Liu, Guangjie; Tong, Yin; Zhang, Meng; Xu, Ying; Qin, Li; Wang, Zhanhui; Chen, Xiaoping; He, Jiankui
2015-01-01
Profiling immune repertoires by high throughput sequencing enhances our understanding of immune system complexity and immune-related diseases in humans. Previously, cloning and Sanger sequencing identified limited numbers of T cell receptor (TCR) nucleotide sequences in rhesus monkeys, thus their full immune repertoire is unknown. We applied multiplex PCR and Illumina high throughput sequencing to study the TCRβ of rhesus monkeys. We identified 1.26 million TCRβ sequences corresponding to 643,570 unique TCRβ sequences and 270,557 unique complementarity-determining region 3 (CDR3) gene sequences. Precise measurements of CDR3 length distribution, CDR3 amino acid distribution, length distribution of N nucleotide of junctional region, and TCRV and TCRJ gene usage preferences were performed. A comprehensive profile of rhesus monkey immune repertoire might aid human infectious disease studies using rhesus monkeys. PMID:25961410
Zador, Anthony M.; Dubnau, Joshua; Oyibo, Hassana K.; Zhan, Huiqing; Cao, Gang; Peikon, Ian D.
2012-01-01
Connectivity determines the function of neural circuits. Historically, circuit mapping has usually been viewed as a problem of microscopy, but no current method can achieve high-throughput mapping of entire circuits with single neuron precision. Here we describe a novel approach to determining connectivity. We propose BOINC (“barcoding of individual neuronal connections”), a method for converting the problem of connectivity into a form that can be read out by high-throughput DNA sequencing. The appeal of using sequencing is that its scale—sequencing billions of nucleotides per day is now routine—is a natural match to the complexity of neural circuits. An inexpensive high-throughput technique for establishing circuit connectivity at single neuron resolution could transform neuroscience research. PMID:23109909
Taggart, David J.; Camerlengo, Terry L.; Harrison, Jason K.; Sherrer, Shanen M.; Kshetry, Ajay K.; Taylor, John-Stephen; Huang, Kun; Suo, Zucai
2013-01-01
Cellular genomes are constantly damaged by endogenous and exogenous agents that covalently and structurally modify DNA to produce DNA lesions. Although most lesions are mended by various DNA repair pathways in vivo, a significant number of damage sites persist during genomic replication. Our understanding of the mutagenic outcomes derived from these unrepaired DNA lesions has been hindered by the low throughput of existing sequencing methods. Therefore, we have developed a cost-effective high-throughput short oligonucleotide sequencing assay that uses next-generation DNA sequencing technology for the assessment of the mutagenic profiles of translesion DNA synthesis catalyzed by any error-prone DNA polymerase. The vast amount of sequencing data produced were aligned and quantified by using our novel software. As an example, the high-throughput short oligonucleotide sequencing assay was used to analyze the types and frequencies of mutations upstream, downstream and at a site-specifically placed cis–syn thymidine–thymidine dimer generated individually by three lesion-bypass human Y-family DNA polymerases. PMID:23470999
AmpliVar: mutation detection in high-throughput sequence from amplicon-based libraries.
Hsu, Arthur L; Kondrashova, Olga; Lunke, Sebastian; Love, Clare J; Meldrum, Cliff; Marquis-Nicholson, Renate; Corboy, Greg; Pham, Kym; Wakefield, Matthew; Waring, Paul M; Taylor, Graham R
2015-04-01
Conventional means of identifying variants in high-throughput sequencing align each read against a reference sequence, and then call variants at each position. Here, we demonstrate an orthogonal means of identifying sequence variation by grouping the reads as amplicons prior to any alignment. We used AmpliVar to make key-value hashes of sequence reads and group reads as individual amplicons using a table of flanking sequences. Low-abundance reads were removed according to a selectable threshold, and reads above this threshold were aligned as groups, rather than as individual reads, permitting the use of sensitive alignment tools. We show that this approach is more sensitive, more specific, and more computationally efficient than comparable methods for the analysis of amplicon-based high-throughput sequencing data. The method can be extended to enable alignment-free confirmation of variants seen in hybridization capture target-enrichment data. © 2015 WILEY PERIODICALS, INC.
Erlich, Yaniv; Gordon, Assaf; Brand, Michael; Hannon, Gregory J.; Mitra, Partha P.
2011-01-01
Over the past three decades we have steadily increased our knowledge on the genetic basis of many severe disorders. Nevertheless, there are still great challenges in applying this knowledge routinely in the clinic, mainly due to the relatively tedious and expensive process of genotyping. Since the genetic variations that underlie the disorders are relatively rare in the population, they can be thought of as a sparse signal. Using methods and ideas from compressed sensing and group testing, we have developed a cost-effective genotyping protocol to detect carriers for severe genetic disorders. In particular, we have adapted our scheme to a recently developed class of high throughput DNA sequencing technologies. The mathematical framework presented here has some important distinctions from the ’traditional’ compressed sensing and group testing frameworks in order to address biological and technical constraints of our setting. PMID:21451737
Sparse models for correlative and integrative analysis of imaging and genetic data
Lin, Dongdong; Cao, Hongbao; Calhoun, Vince D.
2014-01-01
The development of advanced medical imaging technologies and high-throughput genomic measurements has enhanced our ability to understand their interplay as well as their relationship with human behavior by integrating these two types of datasets. However, the high dimensionality and heterogeneity of these datasets presents a challenge to conventional statistical methods; there is a high demand for the development of both correlative and integrative analysis approaches. Here, we review our recent work on developing sparse representation based approaches to address this challenge. We show how sparse models are applied to the correlation and integration of imaging and genetic data for biomarker identification. We present examples on how these approaches are used for the detection of risk genes and classification of complex diseases such as schizophrenia. Finally, we discuss future directions on the integration of multiple imaging and genomic datasets including their interactions such as epistasis. PMID:25218561
High-Throughput Block Optical DNA Sequence Identification.
Sagar, Dodderi Manjunatha; Korshoj, Lee Erik; Hanson, Katrina Bethany; Chowdhury, Partha Pratim; Otoupal, Peter Britton; Chatterjee, Anushree; Nagpal, Prashant
2018-01-01
Optical techniques for molecular diagnostics or DNA sequencing generally rely on small molecule fluorescent labels, which utilize light with a wavelength of several hundred nanometers for detection. Developing a label-free optical DNA sequencing technique will require nanoscale focusing of light, a high-throughput and multiplexed identification method, and a data compression technique to rapidly identify sequences and analyze genomic heterogeneity for big datasets. Such a method should identify characteristic molecular vibrations using optical spectroscopy, especially in the "fingerprinting region" from ≈400-1400 cm -1 . Here, surface-enhanced Raman spectroscopy is used to demonstrate label-free identification of DNA nucleobases with multiplexed 3D plasmonic nanofocusing. While nanometer-scale mode volumes prevent identification of single nucleobases within a DNA sequence, the block optical technique can identify A, T, G, and C content in DNA k-mers. The content of each nucleotide in a DNA block can be a unique and high-throughput method for identifying sequences, genes, and other biomarkers as an alternative to single-letter sequencing. Additionally, coupling two complementary vibrational spectroscopy techniques (infrared and Raman) can improve block characterization. These results pave the way for developing a novel, high-throughput block optical sequencing method with lossy genomic data compression using k-mer identification from multiplexed optical data acquisition. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
High-throughput sequencing methods to study neuronal RNA-protein interactions.
Ule, Jernej
2009-12-01
UV-cross-linking and RNase protection, combined with high-throughput sequencing, have provided global maps of RNA sites bound by individual proteins or ribosomes. Using a stringent purification protocol, UV-CLIP (UV-cross-linking and immunoprecipitation) was able to identify intronic and exonic sites bound by splicing regulators in mouse brain tissue. Ribosome profiling has been used to quantify ribosome density on budding yeast mRNAs under different environmental conditions. Post-transcriptional regulation in neurons requires high spatial and temporal precision, as is evident from the role of localized translational control in synaptic plasticity. It remains to be seen if the high-throughput methods can be applied quantitatively to study the dynamics of RNP (ribonucleoprotein) remodelling in specific neuronal populations during the neurodegenerative process. It is certain, however, that applications of new biochemical techniques followed by high-throughput sequencing will continue to provide important insights into the mechanisms of neuronal post-transcriptional regulation.
Niland, Courtney N.; Jankowsky, Eckhard; Harris, Michael E.
2016-01-01
Quantification of the specificity of RNA binding proteins and RNA processing enzymes is essential to understanding their fundamental roles in biological processes. High Throughput Sequencing Kinetics (HTS-Kin) uses high throughput sequencing and internal competition kinetics to simultaneously monitor the processing rate constants of thousands of substrates by RNA processing enzymes. This technique has provided unprecedented insight into the substrate specificity of the tRNA processing endonuclease ribonuclease P. Here, we investigate the accuracy and robustness of measurements associated with each step of the HTS-Kin procedure. We examine the effect of substrate concentration on the observed rate constant, determine the optimal kinetic parameters, and provide guidelines for reducing error in amplification of the substrate population. Importantly, we find that high-throughput sequencing, and experimental reproducibility contribute their own sources of error, and these are the main sources of imprecision in the quantified results when otherwise optimized guidelines are followed. PMID:27296633
Fujimori, Shigeo; Hirai, Naoya; Ohashi, Hiroyuki; Masuoka, Kazuyo; Nishikimi, Akihiko; Fukui, Yoshinori; Washio, Takanori; Oshikubo, Tomohiro; Yamashita, Tatsuhiro; Miyamoto-Sato, Etsuko
2012-01-01
Next-generation sequencing (NGS) has been applied to various kinds of omics studies, resulting in many biological and medical discoveries. However, high-throughput protein-protein interactome datasets derived from detection by sequencing are scarce, because protein-protein interaction analysis requires many cell manipulations to examine the interactions. The low reliability of the high-throughput data is also a problem. Here, we describe a cell-free display technology combined with NGS that can improve both the coverage and reliability of interactome datasets. The completely cell-free method gives a high-throughput and a large detection space, testing the interactions without using clones. The quantitative information provided by NGS reduces the number of false positives. The method is suitable for the in vitro detection of proteins that interact not only with the bait protein, but also with DNA, RNA and chemical compounds. Thus, it could become a universal approach for exploring the large space of protein sequences and interactome networks. PMID:23056904
Using a multifrontal sparse solver in a high performance, finite element code
NASA Technical Reports Server (NTRS)
King, Scott D.; Lucas, Robert; Raefsky, Arthur
1990-01-01
We consider the performance of the finite element method on a vector supercomputer. The computationally intensive parts of the finite element method are typically the individual element forms and the solution of the global stiffness matrix both of which are vectorized in high performance codes. To further increase throughput, new algorithms are needed. We compare a multifrontal sparse solver to a traditional skyline solver in a finite element code on a vector supercomputer. The multifrontal solver uses the Multiple-Minimum Degree reordering heuristic to reduce the number of operations required to factor a sparse matrix and full matrix computational kernels (e.g., BLAS3) to enhance vector performance. The net result in an order-of-magnitude reduction in run time for a finite element application on one processor of a Cray X-MP.
High-throughput sequencing: a failure mode analysis.
Yang, George S; Stott, Jeffery M; Smailus, Duane; Barber, Sarah A; Balasundaram, Miruna; Marra, Marco A; Holt, Robert A
2005-01-04
Basic manufacturing principles are becoming increasingly important in high-throughput sequencing facilities where there is a constant drive to increase quality, increase efficiency, and decrease operating costs. While high-throughput centres report failure rates typically on the order of 10%, the causes of sporadic sequencing failures are seldom analyzed in detail and have not, in the past, been formally reported. Here we report the results of a failure mode analysis of our production sequencing facility based on detailed evaluation of 9,216 ESTs generated from two cDNA libraries. Two categories of failures are described; process-related failures (failures due to equipment or sample handling) and template-related failures (failures that are revealed by close inspection of electropherograms and are likely due to properties of the template DNA sequence itself). Preventative action based on a detailed understanding of failure modes is likely to improve the performance of other production sequencing pipelines.
Notredame, Cedric
2018-05-02
Cedric Notredame from the Centre for Genomic Regulation gives a presentation on New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era at the JGI/Argonne HPC Workshop on January 26, 2010.
Evaluation of Sequencing Approaches for High-Throughput Transcriptomics - (BOSC)
Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. The generation of high-throughput global gene expression...
The promise and challenge of high-throughput sequencing of the antibody repertoire
Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R
2014-01-01
Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474
Pseudouridines have context-dependent mutation and stop rates in high-throughput sequencing.
Zhou, Katherine I; Clark, Wesley C; Pan, David W; Eckwahl, Matthew J; Dai, Qing; Pan, Tao
2018-05-11
The abundant RNA modification pseudouridine (Ψ) has been mapped transcriptome-wide by chemically modifying pseudouridines with carbodiimide and detecting the resulting reverse transcription stops in high-throughput sequencing. However, these methods have limited sensitivity and specificity, in part due to the use of reverse transcription stops. We sought to use mutations rather than just stops in sequencing data to identify pseudouridine sites. Here, we identify reverse transcription conditions that allow read-through of carbodiimide-modified pseudouridine (CMC-Ψ), and we show that pseudouridines in carbodiimide-treated human ribosomal RNA have context-dependent mutation and stop rates in high-throughput sequencing libraries prepared under these conditions. Furthermore, accounting for the context-dependence of mutation and stop rates can enhance the detection of pseudouridine sites. Similar approaches could contribute to the sequencing-based detection of many RNA modifications.
Discovery of viruses and virus-like pathogens in pistachio using high-throughput sequencing
USDA-ARS?s Scientific Manuscript database
Pistachio (Pistacia vera L.) trees from the National Clonal Germplasm Repository (NCGR) and orchards in California were surveyed for viruses and virus-like agents by high-throughput sequencing (HTS). Analyses of 60 trees including clonal UCB-1 hybrid rootstock (P. atlantica × P. integerrima) identif...
Vinner, Lasse; Mourier, Tobias; Friis-Nielsen, Jens; Gniadecki, Robert; Dybkaer, Karen; Rosenberg, Jacob; Langhoff, Jill Levin; Cruz, David Flores Santa; Fonager, Jannik; Izarzugaza, Jose M G; Gupta, Ramneek; Sicheritz-Ponten, Thomas; Brunak, Søren; Willerslev, Eske; Nielsen, Lars Peter; Hansen, Anders Johannes
2015-08-19
Although nearly one fifth of all human cancers have an infectious aetiology, the causes for the majority of cancers remain unexplained. Despite the enormous data output from high-throughput shotgun sequencing, viral DNA in a clinical sample typically constitutes a proportion of host DNA that is too small to be detected. Sequence variation among virus genomes complicates application of sequence-specific, and highly sensitive, PCR methods. Therefore, we aimed to develop and characterize a method that permits sensitive detection of sequences despite considerable variation. We demonstrate that our low-stringency in-solution hybridization method enables detection of <100 viral copies. Furthermore, distantly related proviral sequences may be enriched by orders of magnitude, enabling discovery of hitherto unknown viral sequences by high-throughput sequencing. The sensitivity was sufficient to detect retroviral sequences in clinical samples. We used this method to conduct an investigation for novel retrovirus in samples from three cancer types. In accordance with recent studies our investigation revealed no retroviral infections in human B-cell lymphoma cells, cutaneous T-cell lymphoma or colorectal cancer biopsies. Nonetheless, our generally applicable method makes sensitive detection possible and permits sequencing of distantly related sequences from complex material.
Tome, Jacob M; Ozer, Abdullah; Pagano, John M; Gheba, Dan; Schroth, Gary P; Lis, John T
2014-06-01
RNA-protein interactions play critical roles in gene regulation, but methods to quantitatively analyze these interactions at a large scale are lacking. We have developed a high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay by adapting a high-throughput DNA sequencer to quantify the binding of fluorescently labeled protein to millions of RNAs anchored to sequenced cDNA templates. Using HiTS-RAP, we measured the affinity of mutagenized libraries of GFP-binding and NELF-E-binding aptamers to their respective targets and identified critical regions of interaction. Mutations additively affected the affinity of the NELF-E-binding aptamer, whose interaction depended mainly on a single-stranded RNA motif, but not that of the GFP aptamer, whose interaction depended primarily on secondary structure.
Genome sequencing in microfabricated high-density picolitre reactors.
Margulies, Marcel; Egholm, Michael; Altman, William E; Attiya, Said; Bader, Joel S; Bemben, Lisa A; Berka, Jan; Braverman, Michael S; Chen, Yi-Ju; Chen, Zhoutao; Dewell, Scott B; Du, Lei; Fierro, Joseph M; Gomes, Xavier V; Godwin, Brian C; He, Wen; Helgesen, Scott; Ho, Chun Heen; Ho, Chun He; Irzyk, Gerard P; Jando, Szilveszter C; Alenquer, Maria L I; Jarvie, Thomas P; Jirage, Kshama B; Kim, Jong-Bum; Knight, James R; Lanza, Janna R; Leamon, John H; Lefkowitz, Steven M; Lei, Ming; Li, Jing; Lohman, Kenton L; Lu, Hong; Makhijani, Vinod B; McDade, Keith E; McKenna, Michael P; Myers, Eugene W; Nickerson, Elizabeth; Nobile, John R; Plant, Ramona; Puc, Bernard P; Ronan, Michael T; Roth, George T; Sarkis, Gary J; Simons, Jan Fredrik; Simpson, John W; Srinivasan, Maithreyan; Tartaro, Karrie R; Tomasz, Alexander; Vogt, Kari A; Volkmer, Greg A; Wang, Shally H; Wang, Yong; Weiner, Michael P; Yu, Pengguang; Begley, Richard F; Rothberg, Jonathan M
2005-09-15
The proliferation of large-scale DNA-sequencing projects in recent years has driven a search for alternative methods to reduce time and cost. Here we describe a scalable, highly parallel sequencing system with raw throughput significantly greater than that of state-of-the-art capillary electrophoresis instruments. The apparatus uses a novel fibre-optic slide of individual wells and is able to sequence 25 million bases, at 99% or better accuracy, in one four-hour run. To achieve an approximately 100-fold increase in throughput over current Sanger sequencing technology, we have developed an emulsion method for DNA amplification and an instrument for sequencing by synthesis using a pyrosequencing protocol optimized for solid support and picolitre-scale volumes. Here we show the utility, throughput, accuracy and robustness of this system by shotgun sequencing and de novo assembly of the Mycoplasma genitalium genome with 96% coverage at 99.96% accuracy in one run of the machine.
Sasagawa, Yohei; Danno, Hiroki; Takada, Hitomi; Ebisawa, Masashi; Tanaka, Kaori; Hayashi, Tetsutaro; Kurisaki, Akira; Nikaido, Itoshi
2018-03-09
High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30-50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads.
Pediatric Glioblastoma Therapies Based on Patient-Derived Stem Cell Resources
2014-11-01
genomic DNA and then subjected to Illumina high-throughput sequencing . In this analysis, shRNAs lost in the GSC population represent candidate gene...and genomic DNA and then subjected to Illumina high-throughput sequencing . In this analysis, shRNAs lost in the GSC population represent candidate...PRISM 7900 Sequence Detection System ( Genomics Resource, FHCRC). Relative transcript abundance was analyzed using the 2−ΔΔCt method. TRIzol (Invitrogen
DOE Office of Scientific and Technical Information (OSTI.GOV)
Leung, Elo; Huang, Amy; Cadag, Eithon
In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resultingmore » functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequencebased genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.« less
Leung, Elo; Huang, Amy; Cadag, Eithon; ...
2016-01-20
In this study, we introduce the Protein Sequence Annotation Tool (PSAT), a web-based, sequence annotation meta-server for performing integrated, high-throughput, genome-wide sequence analyses. Our goals in building PSAT were to (1) create an extensible platform for integration of multiple sequence-based bioinformatics tools, (2) enable functional annotations and enzyme predictions over large input protein fasta data sets, and (3) provide a web interface for convenient execution of the tools. In this paper, we demonstrate the utility of PSAT by annotating the predicted peptide gene products of Herbaspirillum sp. strain RV1423, importing the results of PSAT into EC2KEGG, and using the resultingmore » functional comparisons to identify a putative catabolic pathway, thereby distinguishing RV1423 from a well annotated Herbaspirillum species. This analysis demonstrates that high-throughput enzyme predictions, provided by PSAT processing, can be used to identify metabolic potential in an otherwise poorly annotated genome. Lastly, PSAT is a meta server that combines the results from several sequence-based annotation and function prediction codes, and is available at http://psat.llnl.gov/psat/. PSAT stands apart from other sequencebased genome annotation systems in providing a high-throughput platform for rapid de novo enzyme predictions and sequence annotations over large input protein sequence data sets in FASTA. PSAT is most appropriately applied in annotation of large protein FASTA sets that may or may not be associated with a single genome.« less
Barron, Martin; Zhang, Siyuan
2018-01-01
Abstract Cell types in cell populations change as the condition changes: some cell types die out, new cell types may emerge and surviving cell types evolve to adapt to the new condition. Using single-cell RNA-sequencing data that measure the gene expression of cells before and after the condition change, we propose an algorithm, SparseDC, which identifies cell types, traces their changes across conditions and identifies genes which are marker genes for these changes. By solving a unified optimization problem, SparseDC completes all three tasks simultaneously. SparseDC is highly computationally efficient and demonstrates its accuracy on both simulated and real data. PMID:29140455
USDA-ARS?s Scientific Manuscript database
The effect of refrigeration on bacterial communities within raw and pasteurized buffalo milk was studied using high-throughput sequencing. High quality samples of raw buffalo milk were obtained from five dairy farms in the Guangxi province of China. A sample of each milk was pasteurized, and both r...
Targeted Capture and High-Throughput Sequencing Using Molecular Inversion Probes (MIPs).
Cantsilieris, Stuart; Stessman, Holly A; Shendure, Jay; Eichler, Evan E
2017-01-01
Molecular inversion probes (MIPs) in combination with massively parallel DNA sequencing represent a versatile, yet economical tool for targeted sequencing of genomic DNA. Several thousand genomic targets can be selectively captured using long oligonucleotides containing unique targeting arms and universal linkers. The ability to append sequencing adaptors and sample-specific barcodes allows large-scale pooling and subsequent high-throughput sequencing at relatively low cost per sample. Here, we describe a "wet bench" protocol detailing the capture and subsequent sequencing of >2000 genomic targets from 192 samples, representative of a single lane on the Illumina HiSeq 2000 platform.
Gerrard, Gareth; Valgañón, Mikel; Foong, Hui En; Kasperaviciute, Dalia; Iskander, Deena; Game, Laurence; Müller, Michael; Aitman, Timothy J; Roberts, Irene; de la Fuente, Josu; Foroni, Letizia; Karadimitris, Anastasios
2013-08-01
Diamond-Blackfan anaemia (DBA) is caused by inactivating mutations in ribosomal protein (RP) genes, with mutations in 13 of the 80 RP genes accounting for 50-60% of cases. The remaining 40-50% cases may harbour mutations in one of the remaining RP genes, but the very low frequencies render conventional genetic screening as challenging. We, therefore, applied custom enrichment technology combined with high-throughput sequencing to screen all 80 RP genes. Using this approach, we identified and validated inactivating mutations in 15/17 (88%) DBA patients. Target enrichment combined with high-throughput sequencing is a robust and improved methodology for the genetic diagnosis of DBA. © 2013 John Wiley & Sons Ltd.
Overcoming bias and systematic errors in next generation sequencing data.
Taub, Margaret A; Corrada Bravo, Hector; Irizarry, Rafael A
2010-12-10
Considerable time and effort has been spent in developing analysis and quality assessment methods to allow the use of microarrays in a clinical setting. As is the case for microarrays and other high-throughput technologies, data from new high-throughput sequencing technologies are subject to technological and biological biases and systematic errors that can impact downstream analyses. Only when these issues can be readily identified and reliably adjusted for will clinical applications of these new technologies be feasible. Although much work remains to be done in this area, we describe consistently observed biases that should be taken into account when analyzing high-throughput sequencing data. In this article, we review current knowledge about these biases, discuss their impact on analysis results, and propose solutions.
Wu, L-P; Yang, T; Liu, H-W; Postman, J; Li, R
2018-05-01
A large contig with sequence similarities to several nucleorhabdoviruses was identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genome sequence of this new nucleorhabdovirus is 14,432 nucleotides long. Its genomic organization is very similar to those of unsegmented plant rhabdoviruses, containing six open reading frames in the order 3'-N-P-P3-M-G-L-5. The virus, which is provisionally named "black currant-associated rhabdovirus", is 41-52% identical in its genome nucleotide sequence to other nucleorhabdoviruses and may represent a new species in the genus Nucleorhabdovirus.
Ozer, Abdullah; Tome, Jacob M.; Friedman, Robin C.; Gheba, Dan; Schroth, Gary P.; Lis, John T.
2016-01-01
Because RNA-protein interactions play a central role in a wide-array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the High Throughput Sequencing-RNA Affinity Profiling (HiTS-RAP) assay, which couples sequencing on an Illumina GAIIx with the quantitative assessment of one or several proteins’ interactions with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of EGFP and NELF-E proteins with their corresponding canonical and mutant RNA aptamers. Here, we provide a detailed protocol for HiTS-RAP, which can be completed in about a month (8 days hands-on time) including the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, high-throughput sequencing and protein binding with GAIIx, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, RNA-MaP and RBNS. A successful HiTS-RAP experiment provides the sequence and binding curves for approximately 200 million RNAs in a single experiment. PMID:26182240
A high-throughput multiplex method adapted for GMO detection.
Chaouachi, Maher; Chupeau, Gaëlle; Berard, Aurélie; McKhann, Heather; Romaniuk, Marcel; Giancola, Sandra; Laval, Valérie; Bertheau, Yves; Brunel, Dominique
2008-12-24
A high-throughput multiplex assay for the detection of genetically modified organisms (GMO) was developed on the basis of the existing SNPlex method designed for SNP genotyping. This SNPlex assay allows the simultaneous detection of up to 48 short DNA sequences (approximately 70 bp; "signature sequences") from taxa endogenous reference genes, from GMO constructions, screening targets, construct-specific, and event-specific targets, and finally from donor organisms. This assay avoids certain shortcomings of multiplex PCR-based methods already in widespread use for GMO detection. The assay demonstrated high specificity and sensitivity. The results suggest that this assay is reliable, flexible, and cost- and time-effective for high-throughput GMO detection.
SMARTIV: combined sequence and structure de-novo motif discovery for in-vivo RNA binding data.
Polishchuk, Maya; Paz, Inbal; Yakhini, Zohar; Mandel-Gutfreund, Yael
2018-05-25
Gene expression regulation is highly dependent on binding of RNA-binding proteins (RBPs) to their RNA targets. Growing evidence supports the notion that both RNA primary sequence and its local secondary structure play a role in specific Protein-RNA recognition and binding. Despite the great advance in high-throughput experimental methods for identifying sequence targets of RBPs, predicting the specific sequence and structure binding preferences of RBPs remains a major challenge. We present a novel webserver, SMARTIV, designed for discovering and visualizing combined RNA sequence and structure motifs from high-throughput RNA-binding data, generated from in-vivo experiments. The uniqueness of SMARTIV is that it predicts motifs from enriched k-mers that combine information from ranked RNA sequences and their predicted secondary structure, obtained using various folding methods. Consequently, SMARTIV generates Position Weight Matrices (PWMs) in a combined sequence and structure alphabet with assigned P-values. SMARTIV concisely represents the sequence and structure motif content as a single graphical logo, which is informative and easy for visual perception. SMARTIV was examined extensively on a variety of high-throughput binding experiments for RBPs from different families, generated from different technologies, showing consistent and accurate results. Finally, SMARTIV is a user-friendly webserver, highly efficient in run-time and freely accessible via http://smartiv.technion.ac.il/.
Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide
2000-01-01
The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be revealed by seven RISA systems within one month. PMID:11076861
D. Lee Taylor; Michael G. Booth; Jack W. McFarland; Ian C. Herriott; Niall J. Lennon; Chad Nusbaum; Thomas G. Marr
2008-01-01
High throughput sequencing methods are widely used in analyses of microbial diversity but are generally applied to small numbers of samples, which precludes charaterization of patterns of microbial diversity across space and time. We have designed a primer-tagging approach that allows pooling and subsequent sorting of numerous samples, which is directed to...
Large-Scale Biomonitoring of Remote and Threatened Ecosystems via High-Throughput Sequencing
Gibson, Joel F.; Shokralla, Shadi; Curry, Colin; Baird, Donald J.; Monk, Wendy A.; King, Ian; Hajibabaei, Mehrdad
2015-01-01
Biodiversity metrics are critical for assessment and monitoring of ecosystems threatened by anthropogenic stressors. Existing sorting and identification methods are too expensive and labour-intensive to be scaled up to meet management needs. Alternately, a high-throughput DNA sequencing approach could be used to determine biodiversity metrics from bulk environmental samples collected as part of a large-scale biomonitoring program. Here we show that both morphological and DNA sequence-based analyses are suitable for recovery of individual taxonomic richness, estimation of proportional abundance, and calculation of biodiversity metrics using a set of 24 benthic samples collected in the Peace-Athabasca Delta region of Canada. The high-throughput sequencing approach was able to recover all metrics with a higher degree of taxonomic resolution than morphological analysis. The reduced cost and increased capacity of DNA sequence-based approaches will finally allow environmental monitoring programs to operate at the geographical and temporal scale required by industrial and regulatory end-users. PMID:26488407
Tahir, Muhammad N; Lockhart, Ben; Grinstead, Samuel; Mollov, Dimitre
2017-04-01
Bermuda grass samples were examined by transmission electron microscopy and 28-30 nm spherical virus particles were observed. Total RNA from these plants was subjected to high-throughput sequencing (HTS). The nearly full genome sequence of a panicovirus was identified from one HTS scaffold. Sanger sequencing was used to confirm the HTS results and complete the genome sequence of 4404 nt. This virus was provisionally named Bermuda grass latent virus (BGLV). Its predicted open reading frames follow the typical arrangement of the genus Panicovirus. Based on sequence comparisons and phylogenetic analyses BGLV differs from other viruses and therefore taxonomically it is a new member of the genus Panicovirus, family Tombusviridae.
Ovaskainen, Otso; Schigel, Dmitry; Ali-Kovero, Heini; Auvinen, Petri; Paulin, Lars; Nordén, Björn; Nordén, Jenni
2013-01-01
Before the recent revolution in molecular biology, field studies on fungal communities were mostly confined to fruit bodies, whereas mycelial interactions were studied in the laboratory. Here we combine high-throughput sequencing with a fruit body inventory to study simultaneously mycelial and fruit body occurrences in a community of fungi inhabiting dead wood of Norway spruce. We studied mycelial occurrence by extracting DNA from wood samples followed by 454-sequencing of the ITS1 and ITS2 regions and an automated procedure for species identification. In total, we detected 198 species as mycelia and 137 species as fruit bodies. The correlation between mycelial and fruit body occurrences was high for the majority of the species, suggesting that high-throughput sequencing can successfully characterize the dominating fungal communities, despite possible biases related to sampling, PCR, sequencing and molecular identification. We used the fruit body and molecular data to test hypothesized links between life history and population dynamic parameters. We show that the species that have on average a high mycelial abundance also have a high fruiting rate and produce large fruit bodies, leading to a positive feedback loop in their population dynamics. Earlier studies have shown that species with specialized resource requirements are rarely seen fruiting, for which reason they are often classified as red-listed. We show with the help of high-throughput sequencing that some of these species are more abundant as mycelium in wood than what could be expected from their occurrence as fruit bodies. PMID:23575372
Sources of PCR-induced distortions in high-throughput sequencing data sets
Kebschull, Justus M.; Zador, Anthony M.
2015-01-01
PCR permits the exponential and sequence-specific amplification of DNA, even from minute starting quantities. PCR is a fundamental step in preparing DNA samples for high-throughput sequencing. However, there are errors associated with PCR-mediated amplification. Here we examine the effects of four important sources of error—bias, stochasticity, template switches and polymerase errors—on sequence representation in low-input next-generation sequencing libraries. We designed a pool of diverse PCR amplicons with a defined structure, and then used Illumina sequencing to search for signatures of each process. We further developed quantitative models for each process, and compared predictions of these models to our experimental data. We find that PCR stochasticity is the major force skewing sequence representation after amplification of a pool of unique DNA amplicons. Polymerase errors become very common in later cycles of PCR but have little impact on the overall sequence distribution as they are confined to small copy numbers. PCR template switches are rare and confined to low copy numbers. Our results provide a theoretical basis for removing distortions from high-throughput sequencing data. In addition, our findings on PCR stochasticity will have particular relevance to quantification of results from single cell sequencing, in which sequences are represented by only one or a few molecules. PMID:26187991
Research progress of plant population genomics based on high-throughput sequencing.
Wang, Yun-sheng
2016-08-01
Population genomics, a new paradigm for population genetics, combine the concepts and techniques of genomics with the theoretical system of population genetics and improve our understanding of microevolution through identification of site-specific effect and genome-wide effects using genome-wide polymorphic sites genotypeing. With the appearance and improvement of the next generation high-throughput sequencing technology, the numbers of plant species with complete genome sequences increased rapidly and large scale resequencing has also been carried out in recent years. Parallel sequencing has also been done in some plant species without complete genome sequences. These studies have greatly promoted the development of population genomics and deepened our understanding of the genetic diversity, level of linking disequilibium, selection effect, demographical history and molecular mechanism of complex traits of relevant plant population at a genomic level. In this review, I briely introduced the concept and research methods of population genomics and summarized the research progress of plant population genomics based on high-throughput sequencing. I also discussed the prospect as well as existing problems of plant population genomics in order to provide references for related studies.
Novel method for high-throughput colony PCR screening in nanoliter-reactors
Walser, Marcel; Pellaux, Rene; Meyer, Andreas; Bechtold, Matthias; Vanderschuren, Herve; Reinhardt, Richard; Magyar, Joseph; Panke, Sven; Held, Martin
2009-01-01
We introduce a technology for the rapid identification and sequencing of conserved DNA elements employing a novel suspension array based on nanoliter (nl)-reactors made from alginate. The reactors have a volume of 35 nl and serve as reaction compartments during monoseptic growth of microbial library clones, colony lysis, thermocycling and screening for sequence motifs via semi-quantitative fluorescence analyses. nl-Reactors were kept in suspension during all high-throughput steps which allowed performing the protocol in a highly space-effective fashion and at negligible expenses of consumables and reagents. As a first application, 11 high-quality microsatellites for polymorphism studies in cassava were isolated and sequenced out of a library of 20 000 clones in 2 days. The technology is widely scalable and we envision that throughputs for nl-reactor based screenings can be increased up to 100 000 and more samples per day thereby efficiently complementing protocols based on established deep-sequencing technologies. PMID:19282448
Mobile element biology – new possibilities with high-throughput sequencing
Xing, Jinchuan; Witherspoon, David J.; Jorde, Lynn B.
2014-01-01
Mobile elements compose more than half of the human genome, but until recently their large-scale detection was time-consuming and challenging. With the development of new high-throughput sequencing technologies, the complete spectrum of mobile element variation in humans can now be identified and analyzed. Thousands of new mobile element insertions have been discovered, yielding new insights into mobile element biology, evolution, and genomic variation. We review several high-throughput methods, with an emphasis on techniques that specifically target mobile element insertions in humans, and we highlight recent applications of these methods in evolutionary studies and in the analysis of somatic alterations in human cancers. PMID:23312846
Mapping of disease-associated variants in admixed populations
2011-01-01
Recent developments in high-throughput genotyping and whole-genome sequencing will enhance the identification of disease loci in admixed populations. We discuss how a more refined estimation of ancestry benefits both admixture mapping and association mapping, making disease loci identification in admixed populations more powerful. High-throughput genotyping and sequencing will enable refined estimation of ancestry, thus enhancing disease loci identification in admixed populations PMID:21635713
Camilo, Cesar M; Lima, Gustavo M A; Maluf, Fernando V; Guido, Rafael V C; Polikarpov, Igor
2016-01-01
Following burgeoning genomic and transcriptomic sequencing data, biochemical and molecular biology groups worldwide are implementing high-throughput cloning and mutagenesis facilities in order to obtain a large number of soluble proteins for structural and functional characterization. Since manual primer design can be a time-consuming and error-generating step, particularly when working with hundreds of targets, the automation of primer design process becomes highly desirable. HTP-OligoDesigner was created to provide the scientific community with a simple and intuitive online primer design tool for both laboratory-scale and high-throughput projects of sequence-independent gene cloning and site-directed mutagenesis and a Tm calculator for quick queries.
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G.; Rigoutsos, Isidore
2017-01-01
Abstract Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. PMID:28108659
Huang, Yu-An; You, Zhu-Hong; Chen, Xing; Yan, Gui-Ying
2016-12-23
Protein-protein interactions (PPIs) are essential to most biological processes. Since bioscience has entered into the era of genome and proteome, there is a growing demand for the knowledge about PPI network. High-throughput biological technologies can be used to identify new PPIs, but they are expensive, time-consuming, and tedious. Therefore, computational methods for predicting PPIs have an important role. For the past years, an increasing number of computational methods such as protein structure-based approaches have been proposed for predicting PPIs. The major limitation in principle of these methods lies in the prior information of the protein to infer PPIs. Therefore, it is of much significance to develop computational methods which only use the information of protein amino acids sequence. Here, we report a highly efficient approach for predicting PPIs. The main improvements come from the use of a novel protein sequence representation by combining continuous wavelet descriptor and Chou's pseudo amino acid composition (PseAAC), and from adopting weighted sparse representation based classifier (WSRC). This method, cross-validated on the PPIs datasets of Saccharomyces cerevisiae, Human and H. pylori, achieves an excellent results with accuracies as high as 92.50%, 95.54% and 84.28% respectively, significantly better than previously proposed methods. Extensive experiments are performed to compare the proposed method with state-of-the-art Support Vector Machine (SVM) classifier. The outstanding results yield by our model that the proposed feature extraction method combing two kinds of descriptors have strong expression ability and are expected to provide comprehensive and effective information for machine learning-based classification models. In addition, the prediction performance in the comparison experiments shows the well cooperation between the combined feature and WSRC. Thus, the proposed method is a very efficient method to predict PPIs and may be a useful supplementary tool for future proteomics studies.
Stiffler, Michael A; Subramanian, Subu K; Salinas, Victor H; Ranganathan, Rama
2016-07-03
Site-directed mutagenesis has long been used as a method to interrogate protein structure, function and evolution. Recent advances in massively-parallel sequencing technology have opened up the possibility of assessing the functional or fitness effects of large numbers of mutations simultaneously. Here, we present a protocol for experimentally determining the effects of all possible single amino acid mutations in a protein of interest utilizing high-throughput sequencing technology, using the 263 amino acid antibiotic resistance enzyme TEM-1 β-lactamase as an example. In this approach, a whole-protein saturation mutagenesis library is constructed by site-directed mutagenic PCR, randomizing each position individually to all possible amino acids. The library is then transformed into bacteria, and selected for the ability to confer resistance to β-lactam antibiotics. The fitness effect of each mutation is then determined by deep sequencing of the library before and after selection. Importantly, this protocol introduces methods which maximize sequencing read depth and permit the simultaneous selection of the entire mutation library, by mixing adjacent positions into groups of length accommodated by high-throughput sequencing read length and utilizing orthogonal primers to barcode each group. Representative results using this protocol are provided by assessing the fitness effects of all single amino acid mutations in TEM-1 at a clinically relevant dosage of ampicillin. The method should be easily extendable to other proteins for which a high-throughput selection assay is in place.
2014-01-01
Background RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. Results We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module miRNA identification” includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module “mRNA identification” includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module “Target screening” provides expression profiling analyses and graphic visualization. The module “Self-testing” offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extend the program’s functionality. Conclusions eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory. PMID:24593312
Yuan, Tiezheng; Huang, Xiaoyi; Dittmar, Rachel L; Du, Meijun; Kohli, Manish; Boardman, Lisa; Thibodeau, Stephen N; Wang, Liang
2014-03-05
RNA sequencing (RNA-seq) is emerging as a critical approach in biological research. However, its high-throughput advantage is significantly limited by the capacity of bioinformatics tools. The research community urgently needs user-friendly tools to efficiently analyze the complicated data generated by high throughput sequencers. We developed a standalone tool with graphic user interface (GUI)-based analytic modules, known as eRNA. The capacity of performing parallel processing and sample management facilitates large data analyses by maximizing hardware usage and freeing users from tediously handling sequencing data. The module miRNA identification" includes GUIs for raw data reading, adapter removal, sequence alignment, and read counting. The module "mRNA identification" includes GUIs for reference sequences, genome mapping, transcript assembling, and differential expression. The module "Target screening" provides expression profiling analyses and graphic visualization. The module "Self-testing" offers the directory setups, sample management, and a check for third-party package dependency. Integration of other GUIs including Bowtie, miRDeep2, and miRspring extend the program's functionality. eRNA focuses on the common tools required for the mapping and quantification analysis of miRNA-seq and mRNA-seq data. The software package provides an additional choice for scientists who require a user-friendly computing environment and high-throughput capacity for large data analysis. eRNA is available for free download at https://sourceforge.net/projects/erna/?source=directory.
A new fungal large subunit ribosomal RNA primer for high throughput sequencing surveys
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mueller, Rebecca C.; Gallegos-Graves, La Verne; Kuske, Cheryl R.
The inclusion of phylogenetic metrics in community ecology has provided insights into important ecological processes, particularly when combined with high-throughput sequencing methods; however, these approaches have not been widely used in studies of fungal communities relative to other microbial groups. Two obstacles have been considered: (1) the internal transcribed spacer (ITS) region has limited utility for constructing phylogenies and (2) most PCR primers that target the large subunit (LSU) ribosomal unit generate amplicons that exceed current limits of high-throughput sequencing platforms. We designed and tested a PCR primer (LR22R) to target approximately 300–400 bp region of the D2 hypervariable regionmore » of the fungal LSU for use with the Illumina MiSeq platform. Both in silico and empirical analyses showed that the LR22R–LR3 pair captured a broad range of fungal taxonomic groups with a small fraction of non-fungal groups. Phylogenetic placement of publically available LSU D2 sequences showed broad agreement with taxonomic classification. Comparisons of the LSU D2 and the ITS2 ribosomal regions from environmental samples and known communities showed similar discriminatory abilities of the two primer sets. Altogether, these findings show that the LR22R–LR3 primer pair has utility for phylogenetic analyses of fungal communities using high-throughput sequencing methods.« less
A new fungal large subunit ribosomal RNA primer for high throughput sequencing surveys
Mueller, Rebecca C.; Gallegos-Graves, La Verne; Kuske, Cheryl R.
2015-12-09
The inclusion of phylogenetic metrics in community ecology has provided insights into important ecological processes, particularly when combined with high-throughput sequencing methods; however, these approaches have not been widely used in studies of fungal communities relative to other microbial groups. Two obstacles have been considered: (1) the internal transcribed spacer (ITS) region has limited utility for constructing phylogenies and (2) most PCR primers that target the large subunit (LSU) ribosomal unit generate amplicons that exceed current limits of high-throughput sequencing platforms. We designed and tested a PCR primer (LR22R) to target approximately 300–400 bp region of the D2 hypervariable regionmore » of the fungal LSU for use with the Illumina MiSeq platform. Both in silico and empirical analyses showed that the LR22R–LR3 pair captured a broad range of fungal taxonomic groups with a small fraction of non-fungal groups. Phylogenetic placement of publically available LSU D2 sequences showed broad agreement with taxonomic classification. Comparisons of the LSU D2 and the ITS2 ribosomal regions from environmental samples and known communities showed similar discriminatory abilities of the two primer sets. Altogether, these findings show that the LR22R–LR3 primer pair has utility for phylogenetic analyses of fungal communities using high-throughput sequencing methods.« less
Quigley, Lisa; O'Sullivan, Orla; Beresford, Tom P.; Ross, R. Paul; Fitzgerald, Gerald F.
2012-01-01
Here, high-throughput sequencing was employed to reveal the highly diverse bacterial populations present in 62 Irish artisanal cheeses and, in some cases, associated cheese rinds. Using this approach, we revealed the presence of several genera not previously associated with cheese, including Faecalibacterium, Prevotella, and Helcococcus and, for the first time, detected the presence of Arthrobacter and Brachybacterium in goats' milk cheese. Our analysis confirmed many previously observed patterns, such as the dominance of typical cheese bacteria, the fact that the microbiota of raw and pasteurized milk cheeses differ, and that the level of cheese maturation has a significant influence on Lactobacillus populations. It was also noted that cheeses containing adjunct ingredients had lower proportions of Lactococcus species. It is thus apparent that high-throughput sequencing-based investigations can provide valuable insights into the microbial populations of artisanal foods. PMID:22685131
Quigley, Lisa; O'Sullivan, Orla; Beresford, Tom P; Ross, R Paul; Fitzgerald, Gerald F; Cotter, Paul D
2012-08-01
Here, high-throughput sequencing was employed to reveal the highly diverse bacterial populations present in 62 Irish artisanal cheeses and, in some cases, associated cheese rinds. Using this approach, we revealed the presence of several genera not previously associated with cheese, including Faecalibacterium, Prevotella, and Helcococcus and, for the first time, detected the presence of Arthrobacter and Brachybacterium in goats' milk cheese. Our analysis confirmed many previously observed patterns, such as the dominance of typical cheese bacteria, the fact that the microbiota of raw and pasteurized milk cheeses differ, and that the level of cheese maturation has a significant influence on Lactobacillus populations. It was also noted that cheeses containing adjunct ingredients had lower proportions of Lactococcus species. It is thus apparent that high-throughput sequencing-based investigations can provide valuable insights into the microbial populations of artisanal foods.
Designing small universal k-mer hitting sets for improved analysis of high-throughput sequencing
Kingsford, Carl
2017-01-01
With the rapidly increasing volume of deep sequencing data, more efficient algorithms and data structures are needed. Minimizers are a central recent paradigm that has improved various sequence analysis tasks, including hashing for faster read overlap detection, sparse suffix arrays for creating smaller indexes, and Bloom filters for speeding up sequence search. Here, we propose an alternative paradigm that can lead to substantial further improvement in these and other tasks. For integers k and L > k, we say that a set of k-mers is a universal hitting set (UHS) if every possible L-long sequence must contain a k-mer from the set. We develop a heuristic called DOCKS to find a compact UHS, which works in two phases: The first phase is solved optimally, and for the second we propose several efficient heuristics, trading set size for speed and memory. The use of heuristics is motivated by showing the NP-hardness of a closely related problem. We show that DOCKS works well in practice and produces UHSs that are very close to a theoretical lower bound. We present results for various values of k and L and by applying them to real genomes show that UHSs indeed improve over minimizers. In particular, DOCKS uses less than 30% of the 10-mers needed to span the human genome compared to minimizers. The software and computed UHSs are freely available at github.com/Shamir-Lab/DOCKS/ and acgt.cs.tau.ac.il/docks/, respectively. PMID:28968408
Short-read, high-throughput sequencing technology for STR genotyping
Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.
2013-01-01
DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315
Cartwright, Reed A; Hussin, Julie; Keebler, Jonathan E M; Stone, Eric A; Awadalla, Philip
2012-01-06
Recent advances in high-throughput DNA sequencing technologies and associated statistical analyses have enabled in-depth analysis of whole-genome sequences. As this technology is applied to a growing number of individual human genomes, entire families are now being sequenced. Information contained within the pedigree of a sequenced family can be leveraged when inferring the donors' genotypes. The presence of a de novo mutation within the pedigree is indicated by a violation of Mendelian inheritance laws. Here, we present a method for probabilistically inferring genotypes across a pedigree using high-throughput sequencing data and producing the posterior probability of de novo mutation at each genomic site examined. This framework can be used to disentangle the effects of germline and somatic mutational processes and to simultaneously estimate the effect of sequencing error and the initial genetic variation in the population from which the founders of the pedigree arise. This approach is examined in detail through simulations and areas for method improvement are noted. By applying this method to data from members of a well-defined nuclear family with accurate pedigree information, the stage is set to make the most direct estimates of the human mutation rate to date.
Oono, Ryoko
2017-01-01
High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions 'how and why are communities different?' This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences.
2017-01-01
High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions ‘how and why are communities different?’ This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences. PMID:29253889
Hou, Weiguo; Wang, Shang; Briggs, Brandon R; Li, Gaoyuan; Xie, Wei; Dong, Hailiang
2018-01-01
Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.
Hou, Weiguo; Wang, Shang; Briggs, Brandon R.; Li, Gaoyuan; Xie, Wei; Dong, Hailiang
2018-01-01
Myocyanophages, a group of viruses infecting cyanobacteria, are abundant and play important roles in elemental cycling. Here we investigated the particle-associated viral communities retained on 0.2 μm filters and in sediment samples (representing ancient cyanophage communities) from four ocean and three lake locations, using high-throughput sequencing and a newly designed primer pair targeting a gene fragment (∼145-bp in length) encoding the cyanophage gp23 major capsid protein (MCP). Diverse viral communities were detected in all samples. The fragments of 142-, 145-, and 148-bp in length were most abundant in the amplicons, and most sequences (>92%) belonged to cyanophages. Additionally, different sequencing depths resulted in different diversity estimates of the viral community. Operational taxonomic units obtained from deep sequencing of the MCP gene covered the majority of those obtained from shallow sequencing, suggesting that deep sequencing exhibited a more complete picture of cyanophage community than shallow sequencing. Our results also revealed a wide geographic distribution of marine myocyanophages, i.e., higher dissimilarities of the myocyanophage communities corresponded with the larger distances between the sampling sites. Collectively, this study suggests that the newly designed primer pair can be effectively used to study the community and diversity of myocyanophage from different environments, and the high-throughput sequencing represents a good method to understand viral diversity.
Role of APOE Isoforms in the Pathogenesis of TBI induced Alzheimer’s Disease
2016-10-01
deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel sequencing...demonstrate that the lack of Abca1 increases amyloid plaques and decreased APOE protein levels in AD-model mice. In this proposal we will test the hypothesis...injury, inflammatory reaction, transcriptome, high throughput massive parallel sequencing, mRNA-seq., behavioral testing, memory impairment, recovery 3
History, applications, and challenges of immune repertoire research.
Liu, Xiao; Wu, Jinghua
2018-02-27
The diversity of T and B cells in terms of their receptor sequences is huge in the vertebrate's immune system and provides broad protection against the vast diversity of pathogens. Immune repertoire is defined as the sum of T cell receptors and B cell receptors (also named immunoglobulin) that makes the organism's adaptive immune system. Before the emergence of high-throughput sequencing, the studies on immune repertoire were limited by the underdeveloped methodologies, since it was impossible to capture the whole picture by the low-throughput tools. The massive paralleled sequencing technology suits perfectly the researches on immune repertoire. In this article, we review the history of immune repertoire studies, in terms of technologies and research applications. Particularly, we discuss several aspects of challenges in this field and highlight the efforts to develop potential solutions, in the era of high-throughput sequencing of the immune repertoire.
Mapping the miRNA interactome by crosslinking ligation and sequencing of hybrids (CLASH)
Helwak, Aleksandra; Tollervey, David
2014-01-01
RNA-RNA interactions play critical roles in many cellular processes but studying them is difficult and laborious. Here, we describe an experimental procedure, termed crosslinking ligation and sequencing of hybrids (CLASH), which allows high-throughput identification of sites of RNA-RNA interaction. During CLASH, a tagged bait protein is UV crosslinked in vivo to stabilise RNA interactions and purified under denaturing conditions. RNAs associated with the bait protein are partially truncated, and the ends of RNA-duplexes are ligated together. Following linker addition, cDNA library preparation and high-throughput sequencing, the ligated duplexes give rise to chimeric cDNAs, which unambiguously identify RNA-RNA interaction sites independent of bioinformatic predictions. This protocol is optimized for studying miRNA targets bound by Argonaute proteins, but should be easily adapted for other RNA-binding proteins and classes of RNA. The protocol requires around 5 days to complete, excluding the time required for high-throughput sequencing and bioinformatic analyses. PMID:24577361
Gore, Brooklin
2018-02-01
This presentation includes a brief background on High Throughput Computing, correlating gene transcription factors, optical mapping, genotype to phenotype mapping via QTL analysis, and current work on next gen sequencing.
Low-Cost, High-Throughput Sequencing of DNA Assemblies Using a Highly Multiplexed Nextera Process.
Shapland, Elaine B; Holmes, Victor; Reeves, Christopher D; Sorokin, Elena; Durot, Maxime; Platt, Darren; Allen, Christopher; Dean, Jed; Serber, Zach; Newman, Jack; Chandran, Sunil
2015-07-17
In recent years, next-generation sequencing (NGS) technology has greatly reduced the cost of sequencing whole genomes, whereas the cost of sequence verification of plasmids via Sanger sequencing has remained high. Consequently, industrial-scale strain engineers either limit the number of designs or take short cuts in quality control. Here, we show that over 4000 plasmids can be completely sequenced in one Illumina MiSeq run for less than $3 each (15× coverage), which is a 20-fold reduction over using Sanger sequencing (2× coverage). We reduced the volume of the Nextera tagmentation reaction by 100-fold and developed an automated workflow to prepare thousands of samples for sequencing. We also developed software to track the samples and associated sequence data and to rapidly identify correctly assembled constructs having the fewest defects. As DNA synthesis and assembly become a centralized commodity, this NGS quality control (QC) process will be essential to groups operating high-throughput pipelines for DNA construction.
Single-cell genome sequencing at ultra-high-throughput with microfluidic droplet barcoding.
Lan, Freeman; Demaree, Benjamin; Ahmed, Noorsher; Abate, Adam R
2017-07-01
The application of single-cell genome sequencing to large cell populations has been hindered by technical challenges in isolating single cells during genome preparation. Here we present single-cell genomic sequencing (SiC-seq), which uses droplet microfluidics to isolate, fragment, and barcode the genomes of single cells, followed by Illumina sequencing of pooled DNA. We demonstrate ultra-high-throughput sequencing of >50,000 cells per run in a synthetic community of Gram-negative and Gram-positive bacteria and fungi. The sequenced genomes can be sorted in silico based on characteristic sequences. We use this approach to analyze the distributions of antibiotic-resistance genes, virulence factors, and phage sequences in microbial communities from an environmental sample. The ability to routinely sequence large populations of single cells will enable the de-convolution of genetic heterogeneity in diverse cell populations.
High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.
Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus; Morling, Niels
2016-01-01
Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates automation of DNA sequencing.
Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis
2012-01-01
Background The multiplexing becomes the major limitation of the next-generation sequencing (NGS) in application to low complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneously analysis of up to several dozen samples. Results Here we introduce pair-barcode sequencing (PBS), an economic and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them are assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns are captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrates that PBS approach is valid. Conclusions By employing PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel so that high-throughput sequencing economically meets the requirements of samples which are low sequencing throughput demand. PMID:22276739
Pair-barcode high-throughput sequencing for large-scale multiplexed sample analysis.
Tu, Jing; Ge, Qinyu; Wang, Shengqin; Wang, Lei; Sun, Beili; Yang, Qi; Bai, Yunfei; Lu, Zuhong
2012-01-25
The multiplexing becomes the major limitation of the next-generation sequencing (NGS) in application to low complexity samples. Physical space segregation allows limited multiplexing, while the existing barcode approach only permits simultaneously analysis of up to several dozen samples. Here we introduce pair-barcode sequencing (PBS), an economic and flexible barcoding technique that permits parallel analysis of large-scale multiplexed samples. In two pilot runs using SOLiD sequencer (Applied Biosystems Inc.), 32 independent pair-barcoded miRNA libraries were simultaneously discovered by the combination of 4 unique forward barcodes and 8 unique reverse barcodes. Over 174,000,000 reads were generated and about 64% of them are assigned to both of the barcodes. After mapping all reads to pre-miRNAs in miRBase, different miRNA expression patterns are captured from the two clinical groups. The strong correlation using different barcode pairs and the high consistency of miRNA expression in two independent runs demonstrates that PBS approach is valid. By employing PBS approach in NGS, large-scale multiplexed pooled samples could be practically analyzed in parallel so that high-throughput sequencing economically meets the requirements of samples which are low sequencing throughput demand.
YAMAT-seq: an efficient method for high-throughput sequencing of mature transfer RNAs.
Shigematsu, Megumi; Honda, Shozo; Loher, Phillipe; Telonis, Aristeidis G; Rigoutsos, Isidore; Kirino, Yohei
2017-05-19
Besides translation, transfer RNAs (tRNAs) play many non-canonical roles in various biological pathways and exhibit highly variable expression profiles. To unravel the emerging complexities of tRNA biology and molecular mechanisms underlying them, an efficient tRNA sequencing method is required. However, the rigid structure of tRNA has been presenting a challenge to the development of such methods. We report the development of Y-shaped Adapter-ligated MAture TRNA sequencing (YAMAT-seq), an efficient and convenient method for high-throughput sequencing of mature tRNAs. YAMAT-seq circumvents the issue of inefficient adapter ligation, a characteristic of conventional RNA sequencing methods for mature tRNAs, by employing the efficient and specific ligation of Y-shaped adapter to mature tRNAs using T4 RNA Ligase 2. Subsequent cDNA amplification and next-generation sequencing successfully yield numerous mature tRNA sequences. YAMAT-seq has high specificity for mature tRNAs and high sensitivity to detect most isoacceptors from minute amount of total RNA. Moreover, YAMAT-seq shows quantitative capability to estimate expression levels of mature tRNAs, and has high reproducibility and broad applicability for various cell lines. YAMAT-seq thus provides high-throughput technique for identifying tRNA profiles and their regulations in various transcriptomes, which could play important regulatory roles in translation and other biological processes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies.
McCallum, Kenneth J; Ionita-Laza, Iuliana
2015-12-01
Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistics methods especially so in the presence of many non-disease risk variants, and in situations when there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes. © 2015, The International Biometric Society.
An ultra-sparse code underliesthe generation of neural sequences in a songbird
NASA Astrophysics Data System (ADS)
Hahnloser, Richard H. R.; Kozhevnikov, Alexay A.; Fee, Michale S.
2002-09-01
Sequences of motor activity are encoded in many vertebrate brains by complex spatio-temporal patterns of neural activity; however, the neural circuit mechanisms underlying the generation of these pre-motor patterns are poorly understood. In songbirds, one prominent site of pre-motor activity is the forebrain robust nucleus of the archistriatum (RA), which generates stereotyped sequences of spike bursts during song and recapitulates these sequences during sleep. We show that the stereotyped sequences in RA are driven from nucleus HVC (high vocal centre), the principal pre-motor input to RA. Recordings of identified HVC neurons in sleeping and singing birds show that individual HVC neurons projecting onto RA neurons produce bursts sparsely, at a single, precise time during the RA sequence. These HVC neurons burst sequentially with respect to one another. We suggest that at each time in the RA sequence, the ensemble of active RA neurons is driven by a subpopulation of RA-projecting HVC neurons that is active only at that time. As a population, these HVC neurons may form an explicit representation of time in the sequence. Such a sparse representation, a temporal analogue of the `grandmother cell' concept for object recognition, eliminates the problem of temporal interference during sequence generation and learning attributed to more distributed representations.
Dutta, Sanjib; Koide, Akiko; Koide, Shohei
2008-01-01
Stability evaluation of many mutants can lead to a better understanding of the sequence determinants of a structural motif and of factors governing protein stability and protein evolution. The traditional biophysical analysis of protein stability is low throughput, limiting our ability to widely explore the sequence space in a quantitative manner. In this study, we have developed a high-throughput library screening method for quantifying stability changes, which is based on protein fragment reconstitution and yeast surface display. Our method exploits the thermodynamic linkage between protein stability and fragment reconstitution and the ability of the yeast surface display technique to quantitatively evaluate protein-protein interactions. The method was applied to a fibronectin type III (FN3) domain. Characterization of fragment reconstitution was facilitated by the co-expression of two FN3 fragments, thus establishing a "yeast surface two-hybrid" method. Importantly, our method does not rely on competition between clones and thus eliminates a common limitation of high-throughput selection methods in which the most stable variants are predominantly recovered. Thus, it allows for the isolation of sequences that exhibits a desired level of stability. We identified over one hundred unique sequences for a β-bulge motif, which was significantly more informative than natural sequences of the FN3 family in revealing the sequence determinants for the β-bulge. Our method provides a powerful means to rapidly assess stability of many variants, to systematically assess contribution of different factors to protein stability and to enhance protein stability. PMID:18674545
Shinozuka, Hiroshi; Forster, John W
2016-01-01
Background. Multiplexed sequencing is commonly performed on massively parallel short-read sequencing platforms such as Illumina, and the efficiency of library normalisation can affect the quality of the output dataset. Although several library normalisation approaches have been established, none are ideal for highly multiplexed sequencing due to issues of cost and/or processing time. Methods. An inexpensive and high-throughput library quantification method has been developed, based on an adaptation of the melting curve assay. Sequencing libraries were subjected to the assay using the Bio-Rad Laboratories CFX Connect(TM) Real-Time PCR Detection System. The library quantity was calculated through summation of reduction of relative fluorescence units between 86 and 95 °C. Results.PCR-enriched sequencing libraries are suitable for this quantification without pre-purification of DNA. Short DNA molecules, which ideally should be eliminated from the library for subsequent processing, were differentiated from the target DNA in a mixture on the basis of differences in melting temperature. Quantification results for long sequences targeted using the melting curve assay were correlated with those from existing methods (R (2) > 0.77), and that observed from MiSeq sequencing (R (2) = 0.82). Discussion.The results of multiplexed sequencing suggested that the normalisation performance of the described method is equivalent to that of another recently reported high-throughput bead-based method, BeNUS. However, costs for the melting curve assay are considerably lower and processing times shorter than those of other existing methods, suggesting greater suitability for highly multiplexed sequencing applications.
"First generation" automated DNA sequencing technology.
Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M
2011-10-01
Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
False positives complicate ancient pathogen identifications using high-throughput shotgun sequencing
2014-01-01
Background Identification of historic pathogens is challenging since false positives and negatives are a serious risk. Environmental non-pathogenic contaminants are ubiquitous. Furthermore, public genetic databases contain limited information regarding these species. High-throughput sequencing may help reliably detect and identify historic pathogens. Results We shotgun-sequenced 8 16th-century Mixtec individuals from the site of Teposcolula Yucundaa (Oaxaca, Mexico) who are reported to have died from the huey cocoliztli (‘Great Pestilence’ in Nahautl), an unknown disease that decimated native Mexican populations during the Spanish colonial period, in order to identify the pathogen. Comparison of these sequences with those deriving from the surrounding soil and from 4 precontact individuals from the site found a wide variety of contaminant organisms that confounded analyses. Without the comparative sequence data from the precontact individuals and soil, false positives for Yersinia pestis and rickettsiosis could have been reported. Conclusions False positives and negatives remain problematic in ancient DNA analyses despite the application of high-throughput sequencing. Our results suggest that several studies claiming the discovery of ancient pathogens may need further verification. Additionally, true single molecule sequencing’s short read lengths, inability to sequence through DNA lesions, and limited ancient-DNA-specific technical development hinder its application to palaeopathology. PMID:24568097
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
Evaluation of sequencing approaches for high-throughput toxicogenomics (SOT)
Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platfo...
Jordan, Scott
2018-01-24
Scott Jordan on "Advances in high-throughput speed, low-latency communication for embedded instrumentation" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Madanecki, Piotr; Bałut, Magdalena; Buckley, Patrick G; Ochocka, J Renata; Bartoszewski, Rafał; Crossman, David K; Messiaen, Ludwine M; Piotrowski, Arkadiusz
2018-01-01
High-throughput technologies generate considerable amount of data which often requires bioinformatic expertise to analyze. Here we present High-Throughput Tabular Data Processor (HTDP), a platform independent Java program. HTDP works on any character-delimited column data (e.g. BED, GFF, GTF, PSL, WIG, VCF) from multiple text files and supports merging, filtering and converting of data that is produced in the course of high-throughput experiments. HTDP can also utilize itemized sets of conditions from external files for complex or repetitive filtering/merging tasks. The program is intended to aid global, real-time processing of large data sets using a graphical user interface (GUI). Therefore, no prior expertise in programming, regular expression, or command line usage is required of the user. Additionally, no a priori assumptions are imposed on the internal file composition. We demonstrate the flexibility and potential of HTDP in real-life research tasks including microarray and massively parallel sequencing, i.e. identification of disease predisposing variants in the next generation sequencing data as well as comprehensive concurrent analysis of microarray and sequencing results. We also show the utility of HTDP in technical tasks including data merge, reduction and filtering with external criteria files. HTDP was developed to address functionality that is missing or rudimentary in other GUI software for processing character-delimited column data from high-throughput technologies. Flexibility, in terms of input file handling, provides long term potential functionality in high-throughput analysis pipelines, as the program is not limited by the currently existing applications and data formats. HTDP is available as the Open Source software (https://github.com/pmadanecki/htdp).
Bałut, Magdalena; Buckley, Patrick G.; Ochocka, J. Renata; Bartoszewski, Rafał; Crossman, David K.; Messiaen, Ludwine M.; Piotrowski, Arkadiusz
2018-01-01
High-throughput technologies generate considerable amount of data which often requires bioinformatic expertise to analyze. Here we present High-Throughput Tabular Data Processor (HTDP), a platform independent Java program. HTDP works on any character-delimited column data (e.g. BED, GFF, GTF, PSL, WIG, VCF) from multiple text files and supports merging, filtering and converting of data that is produced in the course of high-throughput experiments. HTDP can also utilize itemized sets of conditions from external files for complex or repetitive filtering/merging tasks. The program is intended to aid global, real-time processing of large data sets using a graphical user interface (GUI). Therefore, no prior expertise in programming, regular expression, or command line usage is required of the user. Additionally, no a priori assumptions are imposed on the internal file composition. We demonstrate the flexibility and potential of HTDP in real-life research tasks including microarray and massively parallel sequencing, i.e. identification of disease predisposing variants in the next generation sequencing data as well as comprehensive concurrent analysis of microarray and sequencing results. We also show the utility of HTDP in technical tasks including data merge, reduction and filtering with external criteria files. HTDP was developed to address functionality that is missing or rudimentary in other GUI software for processing character-delimited column data from high-throughput technologies. Flexibility, in terms of input file handling, provides long term potential functionality in high-throughput analysis pipelines, as the program is not limited by the currently existing applications and data formats. HTDP is available as the Open Source software (https://github.com/pmadanecki/htdp). PMID:29432475
Buxbaum, Joseph D; Daly, Mark J; Devlin, Bernie; Lehner, Thomas; Roeder, Kathryn; State, Matthew W
2012-12-20
Research during the past decade has seen significant progress in the understanding of the genetic architecture of autism spectrum disorders (ASDs), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time, this research has highlighted ongoing challenges. Here we address the enormous impact of high-throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multisite collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly. Copyright © 2012 Elsevier Inc. All rights reserved.
Christiansen, Anders; Kringelum, Jens V; Hansen, Christian S; Bøgh, Katrine L; Sullivan, Eric; Patel, Jigar; Rigby, Neil M; Eiwegger, Thomas; Szépfalusi, Zsolt; de Masi, Federico; Nielsen, Morten; Lund, Ole; Dufva, Martin
2015-08-06
Phage display is a prominent screening technique with a multitude of applications including therapeutic antibody development and mapping of antigen epitopes. In this study, phages were selected based on their interaction with patient serum and exhaustively characterised by high-throughput sequencing. A bioinformatics approach was developed in order to identify peptide motifs of interest based on clustering and contrasting to control samples. Comparison of patient and control samples confirmed a major issue in phage display, namely the selection of unspecific peptides. The potential of the bioinformatic approach was demonstrated by identifying epitopes of a prominent peanut allergen, Ara h 1, in sera from patients with severe peanut allergy. The identified epitopes were confirmed by high-density peptide micro-arrays. The present study demonstrates that high-throughput sequencing can empower phage display by (i) enabling the analysis of complex biological samples, (ii) circumventing the traditional laborious picking and functional testing of individual phage clones and (iii) reducing the number of selection rounds.
ISRNA: an integrative online toolkit for short reads from high-throughput sequencing data.
Luo, Guan-Zheng; Yang, Wei; Ma, Ying-Ke; Wang, Xiu-Jie
2014-02-01
Integrative Short Reads NAvigator (ISRNA) is an online toolkit for analyzing high-throughput small RNA sequencing data. Besides the high-speed genome mapping function, ISRNA provides statistics for genomic location, length distribution and nucleotide composition bias analysis of sequence reads. Number of reads mapped to known microRNAs and other classes of short non-coding RNAs, coverage of short reads on genes, expression abundance of sequence reads as well as some other analysis functions are also supported. The versatile search functions enable users to select sequence reads according to their sub-sequences, expression abundance, genomic location, relationship to genes, etc. A specialized genome browser is integrated to visualize the genomic distribution of short reads. ISRNA also supports management and comparison among multiple datasets. ISRNA is implemented in Java/C++/Perl/MySQL and can be freely accessed at http://omicslab.genetics.ac.cn/ISRNA/.
SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.
Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver
2012-07-15
In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1 accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license.
Fast multiclonal clusterization of V(D)J recombinations from high-throughput sequencing.
Giraud, Mathieu; Salson, Mikaël; Duez, Marc; Villenet, Céline; Quief, Sabine; Caillault, Aurélie; Grardel, Nathalie; Roumier, Christophe; Preudhomme, Claude; Figeac, Martin
2014-05-28
V(D)J recombinations in lymphocytes are essential for immunological diversity. They are also useful markers of pathologies. In leukemia, they are used to quantify the minimal residual disease during patient follow-up. However, the full breadth of lymphocyte diversity is not fully understood. We propose new algorithms that process high-throughput sequencing (HTS) data to extract unnamed V(D)J junctions and gather them into clones for quantification. This analysis is based on a seed heuristic and is fast and scalable because in the first phase, no alignment is performed with germline database sequences. The algorithms were applied to TR γ HTS data from a patient with acute lymphoblastic leukemia, and also on data simulating hypermutations. Our methods identified the main clone, as well as additional clones that were not identified with standard protocols. The proposed algorithms provide new insight into the analysis of high-throughput sequencing data for leukemia, and also to the quantitative assessment of any immunological profile. The methods described here are implemented in a C++ open-source program called Vidjil.
REDItools: high-throughput RNA editing detection made easy.
Picardi, Ernesto; Pesole, Graziano
2013-07-15
The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools a suite of python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data. REDItools are in python programming language and freely available at http://code.google.com/p/reditools/. ernesto.picardi@uniba.it or graziano.pesole@uniba.it Supplementary data are available at Bioinformatics online.
High throughput protein production screening
Beernink, Peter T [Walnut Creek, CA; Coleman, Matthew A [Oakland, CA; Segelke, Brent W [San Ramon, CA
2009-09-08
Methods, compositions, and kits for the cell-free production and analysis of proteins are provided. The invention allows for the production of proteins from prokaryotic sequences or eukaryotic sequences, including human cDNAs using PCR and IVT methods and detecting the proteins through fluorescence or immunoblot techniques. This invention can be used to identify optimized PCR and WT conditions, codon usages and mutations. The methods are readily automated and can be used for high throughput analysis of protein expression levels, interactions, and functional states.
High-Throughput Sequencing: A Roadmap Toward Community Ecology
Poisot, Timothée; Péquin, Bérangère; Gravel, Dominique
2013-01-01
High-throughput sequencing is becoming increasingly important in microbial ecology, yet it is surprisingly under-used to generate or test biogeographic hypotheses. In this contribution, we highlight how adding these methods to the ecologist toolbox will allow the detection of new patterns, and will help our understanding of the structure and dynamics of diversity. Starting with a review of ecological questions that can be addressed, we move on to the technical and analytical issues that will benefit from an increased collaboration between different disciplines. PMID:23610649
Morgan, Martin; Anders, Simon; Lawrence, Michael; Aboyoun, Patrick; Pagès, Hervé; Gentleman, Robert
2009-01-01
Summary: ShortRead is a package for input, quality assessment, manipulation and output of high-throughput sequencing data. ShortRead is provided in the R and Bioconductor environments, allowing ready access to additional facilities for advanced statistical analysis, data transformation, visualization and integration with diverse genomic resources. Availability and Implementation: This package is implemented in R and available at the Bioconductor web site; the package contains a ‘vignette’ outlining typical work flows. Contact: mtmorgan@fhcrc.org PMID:19654119
Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng
2015-01-09
The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Finely-set small-scale experiments are not only very expensive but also inefficient to identify numerous interactomes despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized-recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data into an interactome weight matrix, where the feature-vectors of involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among involved proteins, for taking the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly.
Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng
2015-01-01
The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Finely-set small-scale experiments are not only very expensive but also inefficient to identify numerous interactomes despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized-recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data into an interactome weight matrix, where the feature-vectors of involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among involved proteins, for taking the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly. PMID:25572661
NASA Astrophysics Data System (ADS)
Luo, Xin; You, Zhuhong; Zhou, Mengchu; Li, Shuai; Leung, Hareton; Xia, Yunni; Zhu, Qingsheng
2015-01-01
The comprehensive mapping of protein-protein interactions (PPIs) is highly desired for one to gain deep insights into both fundamental cell biology processes and the pathology of diseases. Finely-set small-scale experiments are not only very expensive but also inefficient to identify numerous interactomes despite their high accuracy. High-throughput screening techniques enable efficient identification of PPIs; yet the desire to further extract useful knowledge from these data leads to the problem of binary interactome mapping. Network topology-based approaches prove to be highly efficient in addressing this problem; however, their performance deteriorates significantly on sparse putative PPI networks. Motivated by the success of collaborative filtering (CF)-based approaches to the problem of personalized-recommendation on large, sparse rating matrices, this work aims at implementing a highly efficient CF-based approach to binary interactome mapping. To achieve this, we first propose a CF framework for it. Under this framework, we model the given data into an interactome weight matrix, where the feature-vectors of involved proteins are extracted. With them, we design the rescaled cosine coefficient to model the inter-neighborhood similarity among involved proteins, for taking the mapping process. Experimental results on three large, sparse datasets demonstrate that the proposed approach outperforms several sophisticated topology-based approaches significantly.
Forecasting Ecological Genomics: High-Tech Animal Instrumentation Meets High-Throughput Sequencing
Shafer, Aaron B. A.; Northrup, Joseph M.; Wikelski, Martin; Wittemyer, George; Wolf, Jochen B. W.
2016-01-01
Recent advancements in animal tracking technology and high-throughput sequencing are rapidly changing the questions and scope of research in the biological sciences. The integration of genomic data with high-tech animal instrumentation comes as a natural progression of traditional work in ecological genetics, and we provide a framework for linking the separate data streams from these technologies. Such a merger will elucidate the genetic basis of adaptive behaviors like migration and hibernation and advance our understanding of fundamental ecological and evolutionary processes such as pathogen transmission, population responses to environmental change, and communication in natural populations. PMID:26745372
Multiplexed fragaria chloroplast genome sequencing
W. Njuguna; A. Liston; R. Cronn; N.V. Bassil
2010-01-01
A method to sequence multiple chloroplast genomes using ultra high throughput sequencing technologies was recently described. Complete chloroplast genome sequences can resolve phylogenetic relationships at low taxonomic levels and identify informative point mutations and indels. The objective of this research was to sequence multiple Fragaria...
Binladen, Jonas; Gilbert, M Thomas P; Bollback, Jonathan P; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske
2007-02-14
The invention of the Genome Sequence 20 DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. We use conventional PCR with 5'-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20 DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5'tag-analysis. We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (miss-assignment rate<0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5' nucleotide of the tag. In particular, primers 5' labelled with a cytosine are heavily overrepresented among the final sequences, while those 5' labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5'primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of comparative genomics, complete mitochondrial analyses, population genetics, and phylogenetics.
Caboche, Ségolène; Audebert, Christophe; Hot, David
2014-01-01
The recent progresses of high-throughput sequencing (HTS) technologies enable easy and cost-reduced access to whole genome sequencing (WGS) or re-sequencing. HTS associated with adapted, automatic and fast bioinformatics solutions for sequencing applications promises an accurate and timely identification and characterization of pathogenic agents. Many studies have demonstrated that data obtained from HTS analysis have allowed genome-based diagnosis, which has been consistent with phenotypic observations. These proofs of concept are probably the first steps toward the future of clinical microbiology. From concept to routine use, many parameters need to be considered to promote HTS as a powerful tool to help physicians and clinicians in microbiological investigations. This review highlights the milestones to be completed toward this purpose. PMID:25437800
[Genetic analysis of two children patients affected with CHARGE syndrome].
Li, Guoqiang; Li, Niu; Xu, Yufei; Li, Juan; Ding, Yu; Shen, Yiping; Wang, Xiumin; Wang, Jian
2018-04-10
To analyze two Chinese pediatric patients with multiple malformations and growth and development delay. Both patients were subjected to targeted gene sequencing, and the results were analyzed with Ingenuity Variant Analysis software. Suspected pathogenic variations were verified by Sanger sequencing. High-throughput sequencing showed that both patients have carried heterozygous variants of the CHD7 gene. Patient 1 carried a nonsense mutation in exon 36 (c.7957C>T, p.Arg2653*), while patient 2 carried a nonsense mutation of exon 2 (c.718C>T, p.Gln240*). Sanger sequencing confirmed the above mutations in both patients, while their parents were of wild-type for the corresponding sites, indicating that the two mutations have happened de novo. Two patients were diagnosed with CHARGE syndrome by high-throughput sequencing.
High-throughput gene mapping in Caenorhabditis elegans.
Swan, Kathryn A; Curtis, Damian E; McKusick, Kathleen B; Voinov, Alexander V; Mapa, Felipa A; Cancilla, Michael R
2002-07-01
Positional cloning of mutations in model genetic systems is a powerful method for the identification of targets of medical and agricultural importance. To facilitate the high-throughput mapping of mutations in Caenorhabditis elegans, we have identified a further 9602 putative new single nucleotide polymorphisms (SNPs) between two C. elegans strains, Bristol N2 and the Hawaiian mapping strain CB4856, by sequencing inserts from a CB4856 genomic DNA library and using an informatics pipeline to compare sequences with the canonical N2 genomic sequence. When combined with data from other laboratories, our marker set of 17,189 SNPs provides even coverage of the complete worm genome. To date, we have confirmed >1099 evenly spaced SNPs (one every 91 +/- 56 kb) across the six chromosomes and validated the utility of our SNP marker set and new fluorescence polarization-based genotyping methods for systematic and high-throughput identification of genes in C. elegans by cloning several proprietary genes. We illustrate our approach by recombination mapping and confirmation of the mutation in the cloned gene, dpy-18.
Burdick, David B; Cavnor, Chris C; Handcock, Jeremy; Killcoyne, Sarah; Lin, Jake; Marzolf, Bruz; Ramsey, Stephen A; Rovira, Hector; Bressler, Ryan; Shmulevich, Ilya; Boyle, John
2010-07-14
High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Existing software solutions provide static and well-established algorithms in a restrictive package. However as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services.
2010-01-01
Background High throughput sequencing has become an increasingly important tool for biological research. However, the existing software systems for managing and processing these data have not provided the flexible infrastructure that research requires. Results Existing software solutions provide static and well-established algorithms in a restrictive package. However as high throughput sequencing is a rapidly evolving field, such static approaches lack the ability to readily adopt the latest advances and techniques which are often required by researchers. We have used a loosely coupled, service-oriented infrastructure to develop SeqAdapt. This system streamlines data management and allows for rapid integration of novel algorithms. Our approach also allows computational biologists to focus on developing and applying new methods instead of writing boilerplate infrastructure code. Conclusion The system is based around the Addama service architecture and is available at our website as a demonstration web application, an installable single download and as a collection of individual customizable services. PMID:20630057
Library Design-Facilitated High-Throughput Sequencing of Synthetic Peptide Libraries.
Vinogradov, Alexander A; Gates, Zachary P; Zhang, Chi; Quartararo, Anthony J; Halloran, Kathryn H; Pentelute, Bradley L
2017-11-13
A methodology to achieve high-throughput de novo sequencing of synthetic peptide mixtures is reported. The approach leverages shotgun nanoliquid chromatography coupled with tandem mass spectrometry-based de novo sequencing of library mixtures (up to 2000 peptides) as well as automated data analysis protocols to filter away incorrect assignments, noise, and synthetic side-products. For increasing the confidence in the sequencing results, mass spectrometry-friendly library designs were developed that enabled unambiguous decoding of up to 600 peptide sequences per hour while maintaining greater than 85% sequence identification rates in most cases. The reliability of the reported decoding strategy was additionally confirmed by matching fragmentation spectra for select authentic peptides identified from library sequencing samples. The methods reported here are directly applicable to screening techniques that yield mixtures of active compounds, including particle sorting of one-bead one-compound libraries and affinity enrichment of synthetic library mixtures performed in solution.
Mapping DNA polymerase errors by single-molecule sequencing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, David F.; Lu, Jenny; Chang, Seungwoo
Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replicationmore » product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.« less
Mapping DNA polymerase errors by single-molecule sequencing
Lee, David F.; Lu, Jenny; Chang, Seungwoo; ...
2016-05-16
Genomic integrity is compromised by DNA polymerase replication errors, which occur in a sequence-dependent manner across the genome. Accurate and complete quantification of a DNA polymerase's error spectrum is challenging because errors are rare and difficult to detect. We report a high-throughput sequencing assay to map in vitro DNA replication errors at the single-molecule level. Unlike previous methods, our assay is able to rapidly detect a large number of polymerase errors at base resolution over any template substrate without quantification bias. To overcome the high error rate of high-throughput sequencing, our assay uses a barcoding strategy in which each replicationmore » product is tagged with a unique nucleotide sequence before amplification. Here, this allows multiple sequencing reads of the same product to be compared so that sequencing errors can be found and removed. We demonstrate the ability of our assay to characterize the average error rate, error hotspots and lesion bypass fidelity of several DNA polymerases.« less
Deep sequencing in library selection projects: what insight does it bring?
Glanville, J; D'Angelo, S; Khan, T A; Reddy, S T; Naranjo, L; Ferrara, F; Bradbury, A R M
2015-08-01
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. Copyright © 2015 Elsevier Ltd. All rights reserved.
Deep sequencing in library selection projects: what insight does it bring?
Glanville, J; D’Angelo, S; Khan, T.A.; Reddy, S. T.; Naranjo, L.; Ferrara, F.; Bradbury, A.R.M.
2015-01-01
High throughput sequencing is poised to change all aspects of the way antibodies and other binders are discovered and engineered. Millions of available sequence reads provide an unprecedented sampling depth able to guide the design and construction of effective, high quality naïve libraries containing tens of billions of unique molecules. Furthermore, during selections, high throughput sequencing enables quantitative tracing of enriched clones and position-specific guidance to amino acid variation under positive selection during antibody engineering. Successful application of the technologies relies on specific PCR reagent design, correct sequencing platform selection, and effective use of computational tools and statistical measures to remove error, identify antibodies, estimate diversity, and extract signatures of selection from the clone down to individual structural positions. Here we review these considerations and discuss some of the remaining challenges to the widespread adoption of the technology. PMID:26451649
NASA Astrophysics Data System (ADS)
Moreland, Blythe; Oman, Kenji; Curfman, John; Yan, Pearlly; Bundschuh, Ralf
Methyl-binding domain (MBD) protein pulldown experiments have been a valuable tool in measuring the levels of methylated CpG dinucleotides. Due to the frequent use of this technique, high-throughput sequencing data sets are available that allow a detailed quantitative characterization of the underlying interaction between methylated DNA and MBD proteins. Analyzing such data sets, we first found that two such proteins cannot bind closer to each other than 2 bp, consistent with structural models of the DNA-protein interaction. Second, the large amount of sequencing data allowed us to find rather weak but nevertheless clearly statistically significant sequence preferences for several bases around the required CpG. These results demonstrate that pulldown sequencing is a high-precision tool in characterizing DNA-protein interactions. This material is based upon work supported by the National Science Foundation under Grant No. DMR-1410172.
Schönberg, Anna; Theunert, Christoph; Li, Mingkun; Stoneking, Mark; Nasidze, Ivan
2011-09-01
To investigate the demographic history of human populations from the Caucasus and surrounding regions, we used high-throughput sequencing to generate 147 complete mtDNA genome sequences from random samples of individuals from three groups from the Caucasus (Armenians, Azeri and Georgians), and one group each from Iran and Turkey. Overall diversity is very high, with 144 different sequences that fall into 97 different haplogroups found among the 147 individuals. Bayesian skyline plots (BSPs) of population size change through time show a population expansion around 40-50 kya, followed by a constant population size, and then another expansion around 15-18 kya for the groups from the Caucasus and Iran. The BSP for Turkey differs the most from the others, with an increase from 35 to 50 kya followed by a prolonged period of constant population size, and no indication of a second period of growth. An approximate Bayesian computation approach was used to estimate divergence times between each pair of populations; the oldest divergence times were between Turkey and the other four groups from the South Caucasus and Iran (~400-600 generations), while the divergence time of the three Caucasus groups from each other was comparable to their divergence time from Iran (average of ~360 generations). These results illustrate the value of random sampling of complete mtDNA genome sequences that can be obtained with high-throughput sequencing platforms.
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-02-09
BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. The PLAN web interface is platform-independent, easily configurable and capable of comprehensive expansion, and user-intuitive. PLAN is freely available to academic users at http://bioinfo.noble.org/plan/. The source code for local deployment is provided under free license. Full support on system utilization, installation, configuration and customization are provided to academic users.
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-01-01
Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. The PLAN web interface is platform-independent, easily configurable and capable of comprehensive expansion, and user-intuitive. PLAN is freely available to academic users at . The source code for local deployment is provided under free license. Full support on system utilization, installation, configuration and customization are provided to academic users. PMID:17291345
High-Throughput Next-Generation Sequencing of Polioviruses
Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.
2016-01-01
ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies
Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim
2007-01-01
While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
A new arenavirus in a cluster of fatal transplant-associated diseases.
Palacios, Gustavo; Druce, Julian; Du, Lei; Tran, Thomas; Birch, Chris; Briese, Thomas; Conlan, Sean; Quan, Phenix-Lan; Hui, Jeffrey; Marshall, John; Simons, Jan Fredrik; Egholm, Michael; Paddock, Christopher D; Shieh, Wun-Ju; Goldsmith, Cynthia S; Zaki, Sherif R; Catton, Mike; Lipkin, W Ian
2008-03-06
Three patients who received visceral-organ transplants from a single donor on the same day died of a febrile illness 4 to 6 weeks after transplantation. Culture, polymerase-chain-reaction (PCR) and serologic assays, and oligonucleotide microarray analysis for a wide range of infectious agents were not informative. We evaluated RNA obtained from the liver and kidney transplant recipients. Unbiased high-throughput sequencing was used to identify microbial sequences not found by means of other methods. The specificity of sequences for a new candidate pathogen was confirmed by means of culture and by means of PCR, immunohistochemical, and serologic analyses. High-throughput sequencing yielded 103,632 sequences, of which 14 represented an Old World arenavirus. Additional sequence analysis showed that this new arenavirus was related to lymphocytic choriomeningitis viruses. Specific PCR assays based on a unique sequence confirmed the presence of the virus in the kidneys, liver, blood, and cerebrospinal fluid of the recipients. Immunohistochemical analysis revealed arenavirus antigen in the liver and kidney transplants in the recipients. IgM and IgG antiviral antibodies were detected in the serum of the donor. Seroconversion was evident in serum specimens obtained from one recipient at two time points. Unbiased high-throughput sequencing is a powerful tool for the discovery of pathogens. The use of this method during an outbreak of disease facilitated the identification of a new arenavirus transmitted through solid-organ transplantation. Copyright 2008 Massachusetts Medical Society.
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Yong; Wang, Guang-Hua; Xu, Xin-Ya; Nong, Xu-Hua; Wang, Jie; Amin, Muhammad; Qi, Shu-Hua
2016-10-01
The present study investigated the fungal diversity in four different deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing of the nuclear ribosomal internal transcribed spacer-1 (ITS1). A total of 40,297 fungal ITS1 sequences clustered into 420 operational taxonomic units (OTUs) with 97% sequence similarity and 170 taxa were recovered from these sediments. Most ITS1 sequences (78%) belonged to the phylum Ascomycota, followed by Basidiomycota (17.3%), Zygomycota (1.5%) and Chytridiomycota (0.8%), and a small proportion (2.4%) belonged to unassigned fungal phyla. Compared with previous studies on fungal diversity of sediments from deep-sea environments by culture-dependent approach and clone library analysis, the present result suggested that Illumina sequencing had been dramatically accelerating the discovery of fungal community of deep-sea sediments. Furthermore, our results revealed that Sordariomycetes was the most diverse and abundant fungal class in this study, challenging the traditional view that the diversity of Sordariomycetes phylotypes was low in the deep-sea environments. In addition, more than 12 taxa accounted for 21.5% sequences were found to be rarely reported as deep-sea fungi, suggesting the deep-sea sediments from Okinawa Trough harbored a plethora of different fungal communities compared with other deep-sea environments. To our knowledge, this study is the first exploration of the fungal diversity in deep-sea sediments from Okinawa Trough using high-throughput Illumina sequencing.
Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu
2013-08-01
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/.
Nagasaki, Hideki; Mochizuki, Takako; Kodama, Yuichi; Saruhashi, Satoshi; Morizaki, Shota; Sugawara, Hideaki; Ohyanagi, Hajime; Kurata, Nori; Okubo, Kousaku; Takagi, Toshihisa; Kaminuma, Eli; Nakamura, Yasukazu
2013-01-01
High-performance next-generation sequencing (NGS) technologies are advancing genomics and molecular biological research. However, the immense amount of sequence data requires computational skills and suitable hardware resources that are a challenge to molecular biologists. The DNA Data Bank of Japan (DDBJ) of the National Institute of Genetics (NIG) has initiated a cloud computing-based analytical pipeline, the DDBJ Read Annotation Pipeline (DDBJ Pipeline), for a high-throughput annotation of NGS reads. The DDBJ Pipeline offers a user-friendly graphical web interface and processes massive NGS datasets using decentralized processing by NIG supercomputers currently free of charge. The proposed pipeline consists of two analysis components: basic analysis for reference genome mapping and de novo assembly and subsequent high-level analysis of structural and functional annotations. Users may smoothly switch between the two components in the pipeline, facilitating web-based operations on a supercomputer for high-throughput data analysis. Moreover, public NGS reads of the DDBJ Sequence Read Archive located on the same supercomputer can be imported into the pipeline through the input of only an accession number. This proposed pipeline will facilitate research by utilizing unified analytical workflows applied to the NGS data. The DDBJ Pipeline is accessible at http://p.ddbj.nig.ac.jp/. PMID:23657089
Johnson, Matthew G.; Gardner, Elliot M.; Liu, Yang; Medina, Rafael; Goffinet, Bernard; Shaw, A. Jonathan; Zerega, Nyree J. C.; Wickett, Norman J.
2016-01-01
Premise of the study: Using sequence data generated via target enrichment for phylogenetics requires reassembly of high-throughput sequence reads into loci, presenting a number of bioinformatics challenges. We developed HybPiper as a user-friendly platform for assembly of gene regions, extraction of exon and intron sequences, and identification of paralogous gene copies. We test HybPiper using baits designed to target 333 phylogenetic markers and 125 genes of functional significance in Artocarpus (Moraceae). Methods and Results: HybPiper implements parallel execution of sequence assembly in three phases: read mapping, contig assembly, and target sequence extraction. The pipeline was able to recover nearly complete gene sequences for all genes in 22 species of Artocarpus. HybPiper also recovered more than 500 bp of nontargeted intron sequence in over half of the phylogenetic markers and identified paralogous gene copies in Artocarpus. Conclusions: HybPiper was designed for Linux and Mac OS X and is freely available at https://github.com/mossmatters/HybPiper. PMID:27437175
High-throughput sequencing reveals unprecedented diversities of Aspergillus species in outdoor air.
Lee, S; An, C; Xu, S; Lee, S; Yamamoto, N
2016-09-01
This study used the Illumina MiSeq to analyse compositions and diversities of Aspergillus species in outdoor air. The seasonal air samplings were performed at two locations in Seoul, South Korea. The results showed the relative abundances of all Aspergillus species combined ranging from 0·20 to 18% and from 0·19 to 21% based on the number of the internal transcribed spacer 1 (ITS1) and β-tubulin (BenA) gene sequences respectively. Aspergillus fumigatus was the most dominant species with the mean relative abundances of 1·2 and 5·5% based on the number of the ITS1 and BenA sequences respectively. A total of 29 Aspergillus species were detected and identified down to the species rank, among which nine species were known opportunistic pathogens. Remarkably, eight of the nine pathogenic species were detected by either one of the two markers, suggesting the need of using multiple markers and/or primer pairs when the assessments are made based on the high-throughput sequencing. Due to diversity of species within the genus Aspergillus, the high-throughput sequencing was useful to characterize their compositions and diversities in outdoor air, which are thought to be difficult to be accurately characterized by conventional culture and/or Sanger sequencing-based techniques. Aspergillus is a diverse genus of fungi with more than 300 species reported in literature. Aspergillus is important since some species are known allergens and opportunistic human pathogens. Traditionally, growth-dependent methods have been used to detect Aspergillus species in air. However, these methods are limited in the number of isolates that can be analysed for their identities, resulting in inaccurate characterizations of Aspergillus diversities. This study used the high-throughput sequencing to explore Aspergillus diversities in outdoor, which are thought to be difficult to be accurately characterized by traditional growth-dependent techniques. © 2016 The Society for Applied Microbiology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Athavale, Ajay
Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
Logares, Ramiro; Haverkamp, Thomas H A; Kumar, Surendra; Lanzén, Anders; Nederbragt, Alexander J; Quince, Christopher; Kauserud, Håvard
2012-10-01
The incursion of High-Throughput Sequencing (HTS) in environmental microbiology brings unique opportunities and challenges. HTS now allows a high-resolution exploration of the vast taxonomic and metabolic diversity present in the microbial world, which can provide an exceptional insight on global ecosystem functioning, ecological processes and evolution. This exploration has also economic potential, as we will have access to the evolutionary innovation present in microbial metabolisms, which could be used for biotechnological development. HTS is also challenging the research community, and the current bottleneck is present in the data analysis side. At the moment, researchers are in a sequence data deluge, with sequencing throughput advancing faster than the computer power needed for data analysis. However, new tools and approaches are being developed constantly and the whole process could be depicted as a fast co-evolution between sequencing technology, informatics and microbiologists. In this work, we examine the most popular and recently commercialized HTS platforms as well as bioinformatics methods for data handling and analysis used in microbial metagenomics. This non-exhaustive review is intended to serve as a broad state-of-the-art guide to researchers expanding into this rapidly evolving field. Copyright © 2012 Elsevier B.V. All rights reserved.
Lu, Xin; Zhang, Xu-Xiang; Wang, Zhu; Huang, Kailong; Wang, Yuan; Liang, Weigang; Tan, Yunfei; Liu, Bo; Tang, Junying
2015-01-01
This study used 454 pyrosequencing, Illumina high-throughput sequencing and metagenomic analysis to investigate bacterial pathogens and their potential virulence in a sewage treatment plant (STP) applying both conventional and advanced treatment processes. Pyrosequencing and Illumina sequencing consistently demonstrated that Arcobacter genus occupied over 43.42% of total abundance of potential pathogens in the STP. At species level, potential pathogens Arcobacter butzleri, Aeromonas hydrophila and Klebsiella pneumonia dominated in raw sewage, which was also confirmed by quantitative real time PCR. Illumina sequencing also revealed prevalence of various types of pathogenicity islands and virulence proteins in the STP. Most of the potential pathogens and virulence factors were eliminated in the STP, and the removal efficiency mainly depended on oxidation ditch. Compared with sand filtration, magnetic resin seemed to have higher removals in most of the potential pathogens and virulence factors. However, presence of the residual A. butzleri in the final effluent still deserves more concerns. The findings indicate that sewage acts as an important source of environmental pathogens, but STPs can effectively control their spread in the environment. Joint use of the high-throughput sequencing technologies is considered a reliable method for deep and comprehensive overview of environmental bacterial virulence. PMID:25938416
Bokulich, Nicholas A.
2013-01-01
Ultra-high-throughput sequencing (HTS) of fungal communities has been restricted by short read lengths and primer amplification bias, slowing the adoption of newer sequencing technologies to fungal community profiling. To address these issues, we evaluated the performance of several common internal transcribed spacer (ITS) primers and designed a novel primer set and work flow for simultaneous quantification and species-level interrogation of fungal consortia. Primer comparison and validation were predicted in silico and by sequencing a “mock community” of mixed yeast species to explore the challenges of amplicon length and amplification bias for reconstructing defined yeast community structures. The amplicon size and distribution of this primer set are smaller than for all preexisting ITS primer sets, maximizing sequencing coverage of hypervariable ITS domains by very-short-amplicon, high-throughput sequencing platforms. This feature also enables the optional integration of quantitative PCR (qPCR) directly into the HTS preparatory work flow by substituting qPCR with these primers for standard PCR, yielding quantification of individual community members. The complete work flow described here, utilizing any of the qualified primer sets evaluated, can rapidly profile mixed fungal communities and capably reconstructed well-characterized beer and wine fermentation fungal communities. PMID:23377949
Hu, Jiazhi; Meyers, Robin M; Dong, Junchao; Panchakshari, Rohit A; Alt, Frederick W; Frock, Richard L
2016-05-01
Unbiased, high-throughput assays for detecting and quantifying DNA double-stranded breaks (DSBs) across the genome in mammalian cells will facilitate basic studies of the mechanisms that generate and repair endogenous DSBs. They will also enable more applied studies, such as those to evaluate the on- and off-target activities of engineered nucleases. Here we describe a linear amplification-mediated high-throughput genome-wide sequencing (LAM-HTGTS) method for the detection of genome-wide 'prey' DSBs via their translocation in cultured mammalian cells to a fixed 'bait' DSB. Bait-prey junctions are cloned directly from isolated genomic DNA using LAM-PCR and unidirectionally ligated to bridge adapters; subsequent PCR steps amplify the single-stranded DNA junction library in preparation for Illumina Miseq paired-end sequencing. A custom bioinformatics pipeline identifies prey sequences that contribute to junctions and maps them across the genome. LAM-HTGTS differs from related approaches because it detects a wide range of broken end structures with nucleotide-level resolution. Familiarity with nucleic acid methods and next-generation sequencing analysis is necessary for library generation and data interpretation. LAM-HTGTS assays are sensitive, reproducible, relatively inexpensive, scalable and straightforward to implement with a turnaround time of <1 week.
High-Throughput Mapping of Single-Neuron Projections by Sequencing of Barcoded RNA.
Kebschull, Justus M; Garcia da Silva, Pedro; Reid, Ashlan P; Peikon, Ian D; Albeanu, Dinu F; Zador, Anthony M
2016-09-07
Neurons transmit information to distant brain regions via long-range axonal projections. In the mouse, area-to-area connections have only been systematically mapped using bulk labeling techniques, which obscure the diverse projections of intermingled single neurons. Here we describe MAPseq (Multiplexed Analysis of Projections by Sequencing), a technique that can map the projections of thousands or even millions of single neurons by labeling large sets of neurons with random RNA sequences ("barcodes"). Axons are filled with barcode mRNA, each putative projection area is dissected, and the barcode mRNA is extracted and sequenced. Applying MAPseq to the locus coeruleus (LC), we find that individual LC neurons have preferred cortical targets. By recasting neuroanatomy, which is traditionally viewed as a problem of microscopy, as a problem of sequencing, MAPseq harnesses advances in sequencing technology to permit high-throughput interrogation of brain circuits. Copyright © 2016 Elsevier Inc. All rights reserved.
Holst-Jensen, Arne; Spilsberg, Bjørn; Arulandhu, Alfred J; Kok, Esther; Shi, Jianxin; Zel, Jana
2016-07-01
The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.
USDA-ARS?s Scientific Manuscript database
The rapid advancement in high-throughput SNP genotyping technologies along with next generation sequencing (NGS) platforms has decreased the cost, improved the quality of large-scale genome surveys, and allowed specialty crops with limited genomic resources such as carrot (Daucus carota) to access t...
High-throughput tetrad analysis.
Ludlow, Catherine L; Scott, Adrian C; Cromie, Gareth A; Jeffery, Eric W; Sirr, Amy; May, Patrick; Lin, Jake; Gilbert, Teresa L; Hays, Michelle; Dudley, Aimée M
2013-07-01
Tetrad analysis has been a gold-standard genetic technique for several decades. Unfortunately, the need to manually isolate, disrupt and space tetrads has relegated its application to small-scale studies and limited its integration with high-throughput DNA sequencing technologies. We have developed a rapid, high-throughput method, called barcode-enabled sequencing of tetrads (BEST), that uses (i) a meiosis-specific GFP fusion protein to isolate tetrads by FACS and (ii) molecular barcodes that are read during genotyping to identify spores derived from the same tetrad. Maintaining tetrad information allows accurate inference of missing genetic markers and full genotypes of missing (and presumably nonviable) individuals. An individual researcher was able to isolate over 3,000 yeast tetrads in 3 h, an output equivalent to that of almost 1 month of manual dissection. BEST is transferable to other microorganisms for which meiotic mapping is significantly more laborious.
A Robust Framework for Microbial Archaeology
Warinner, Christina; Herbig, Alexander; Mann, Allison; Yates, James A. Fellows; Weiβ, Clemens L.; Burbano, Hernán A.; Orlando, Ludovic; Krause, Johannes
2017-01-01
Microbial archaeology is flourishing in the era of high-throughput sequencing, revealing the agents behind devastating historical plagues, identifying the cryptic movements of pathogens in prehistory, and reconstructing the ancestral microbiota of humans. Here, we introduce the fundamental concepts and theoretical framework of the discipline, then discuss applied methodologies for pathogen identification and microbiome characterization from archaeological samples. We give special attention to the process of identifying, validating, and authenticating ancient microbes using high-throughput DNA sequencing data. Finally, we outline standards and precautions to guide future research in the field. PMID:28460196
Zhao, Meng-Meng; Du, Shan-Shan; Li, Qiu-Hong; Chen, Tao; Qiu, Hui; Wu, Qin; Chen, Shan-Shan; Zhou, Ying; Zhang, Yuan; Hu, Yang; Su, Yi-Liang; Shen, Li; Zhang, Fen; Weng, Dong; Li, Hui-Ping
2017-02-01
This study aims to use high throughput 16SrRNA gene sequencing to examine the bacterial profile of lymph node biopsy samples of patients with sarcoidosis and to further verify the association between Propionibacterium acnes (P. acnes) and sarcoidosis. A total of 36 mediastinal lymph node biopsy specimens were collected from 17 cases of sarcoidosis, 8 tuberculosis (TB group), and 11 non-infectious lung diseases (control group). The V4 region of the bacterial 16SrRNA gene in the specimens was amplified and sequenced using the high throughput sequencing platform MiSeq, and bacterial profile was established. The data analysis software QIIME and Metastats were used to compare bacterial relative abundance in the three patient groups. Overall, 545 genera were identified; 38 showed significantly lower and 29 had significantly higher relative abundance in the sarcoidosis group than in the TB and control groups (P < 0.01). P. acnes 16SrRNA was exclusively found in all the 17 samples of the sarcoidosis group, whereas was not detected in the TB and control groups. The relative abundance of P. acnes in the sarcoidosis group (0.16% ± 0. 11%) was significantly higher than that in the TB (Metastats analysis: P = 0.0010, q = 0.0044) and control groups (Metastats analysis: P = 0.0010, q = 0.0038). The relative abundance of P. granulosum was only 0.0022% ± 0. 0044% in the sarcoidosis group. P. granulosum 16SrRNA was not detected in the other two groups. High throughput 16SrRNA gene sequencing appears to be a useful tool to investigate the bacterial profile of sarcoidosis specimens. The results suggest that P. acnes may be involved in sarcoidosis development.
High-throughput sequencing in veterinary infection biology and diagnostics.
Belák, S; Karlsson, O E; Leijon, M; Granberg, F
2013-12-01
Sequencing methods have improved rapidly since the first versions of the Sanger techniques, facilitating the development of very powerful tools for detecting and identifying various pathogens, such as viruses, bacteria and other microbes. The ongoing development of high-throughput sequencing (HTS; also known as next-generation sequencing) technologies has resulted in a dramatic reduction in DNA sequencing costs, making the technology more accessible to the average laboratory. In this White Paper of the World Organisation for Animal Health (OIE) Collaborating Centre for the Biotechnology-based Diagnosis of Infectious Diseases in Veterinary Medicine (Uppsala, Sweden), several approaches and examples of HTS are summarised, and their diagnostic applicability is briefly discussed. Selected future aspects of HTS are outlined, including the need for bioinformatic resources, with a focus on improving the diagnosis and control of infectious diseases in veterinary medicine.
DnaSAM: Software to perform neutrality testing for large datasets with complex null models.
Eckert, Andrew J; Liechty, John D; Tearse, Brandon R; Pande, Barnaly; Neale, David B
2010-05-01
Patterns of DNA sequence polymorphisms can be used to understand the processes of demography and adaptation within natural populations. High-throughput generation of DNA sequence data has historically been the bottleneck with respect to data processing and experimental inference. Advances in marker technologies have largely solved this problem. Currently, the limiting step is computational, with most molecular population genetic software allowing a gene-by-gene analysis through a graphical user interface. An easy-to-use analysis program that allows both high-throughput processing of multiple sequence alignments along with the flexibility to simulate data under complex demographic scenarios is currently lacking. We introduce a new program, named DnaSAM, which allows high-throughput estimation of DNA sequence diversity and neutrality statistics from experimental data along with the ability to test those statistics via Monte Carlo coalescent simulations. These simulations are conducted using the ms program, which is able to incorporate several genetic parameters (e.g. recombination) and demographic scenarios (e.g. population bottlenecks). The output is a set of diversity and neutrality statistics with associated probability values under a user-specified null model that are stored in easy to manipulate text file. © 2009 Blackwell Publishing Ltd.
Huang, Kailong; Zhang, Xu-Xiang; Shi, Peng; Wu, Bing; Ren, Hongqiang
2014-11-01
In order to comprehensively investigate bacterial virulence in drinking water, 454 pyrosequencing and Illumina high-throughput sequencing were used to detect potential pathogenic bacteria and virulence factors (VFs) in a full-scale drinking water treatment and distribution system. 16S rRNA gene pyrosequencing revealed high bacterial diversity in the drinking water (441-586 operational taxonomic units). Bacterial diversity decreased after chlorine disinfection, but increased after pipeline distribution. α-Proteobacteria was the most dominant taxonomic class. Alignment against the established pathogen database showed that several types of putative pathogens were present in the drinking water and Pseudomonas aeruginosa had the highest abundance (over 11‰ of total sequencing reads). Many pathogens disappeared after chlorine disinfection, but P. aeruginosa and Leptospira interrogans were still detected in the tap water. High-throughput sequencing revealed prevalence of various pathogenicity islands and virulence proteins in the drinking water, and translocases, transposons, Clp proteases and flagellar motor switch proteins were the predominant VFs. Both diversity and abundance of the detectable VFs increased after the chlorination, and decreased after the pipeline distribution. This study indicates that joint use of 454 pyrosequencing and Illumina sequencing can comprehensively characterize environmental pathogenesis, and several types of putative pathogens and various VFs are prevalent in drinking water. Copyright © 2014 Elsevier Inc. All rights reserved.
Assembly and diploid architecture of an individual human genome via single-molecule technologies
Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali
2015-01-01
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality. PMID:26121404
Assembly and diploid architecture of an individual human genome via single-molecule technologies.
Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali
2015-08-01
We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.
David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion
2014-01-01
The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.
HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis
David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion
2014-01-01
The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057
Vinícius de Melo, Gilberto
2018-01-01
Summary Coffee bean fermentation is a spontaneous, on-farm process involving the action of different microbial groups, including bacteria and fungi. In this study, high-throughput sequencing approach was employed to study the diversity and dynamics of bacteria associated with Brazilian coffee bean fermentation. The total DNA from fermenting coffee samples was extracted at different time points, and the 16S rRNA gene with segments around the V4 variable region was sequenced by Illumina high-throughput platform. Using this approach, the presence of over eighty bacterial genera was determined, many of which have been detected for the first time during coffee bean fermentation, including Fructobacillus, Pseudonocardia, Pedobacter, Sphingomonas and Hymenobacter. The presence of Fructobacillus suggests an influence of these bacteria on fructose metabolism during coffee fermentation. Temporal analysis showed a strong dominance of lactic acid bacteria with over 97% of read sequences at the end of fermentation, mainly represented by the Leuconostoc and Lactococcus. Metabolism of lactic acid bacteria was associated with the high formation of lactic acid during fermentation, as determined by HPLC analysis. The results reported in this study confirm the underestimation of bacterial diversity associated with coffee fermentation. New microbial groups reported in this study may be explored as functional starter cultures for on-farm coffee processing.
Devailly, Guillaume; Mantsoki, Anna; Joshi, Anagha
2016-11-01
Better protocols and decreasing costs have made high-throughput sequencing experiments now accessible even to small experimental laboratories. However, comparing one or few experiments generated by an individual lab to the vast amount of relevant data freely available in the public domain might be limited due to lack of bioinformatics expertise. Though several tools, including genome browsers, allow such comparison at a single gene level, they do not provide a genome-wide view. We developed Heat*seq, a web-tool that allows genome scale comparison of high throughput experiments chromatin immuno-precipitation followed by sequencing, RNA-sequencing and Cap Analysis of Gene Expression) provided by a user, to the data in the public domain. Heat*seq currently contains over 12 000 experiments across diverse tissues and cell types in human, mouse and drosophila. Heat*seq displays interactive correlation heatmaps, with an ability to dynamically subset datasets to contextualize user experiments. High quality figures and tables are produced and can be downloaded in multiple formats. Web application: http://www.heatstarseq.roslin.ed.ac.uk/ Source code: https://github.com/gdevailly CONTACT: Guillaume.Devailly@roslin.ed.ac.uk or Anagha.Joshi@roslin.ed.ac.ukSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Pooled assembly of marine metagenomic datasets: enriching annotation through chimerism.
Magasin, Jonathan D; Gerloff, Dietlind L
2015-02-01
Despite advances in high-throughput sequencing, marine metagenomic samples remain largely opaque. A typical sample contains billions of microbial organisms from thousands of genomes and quadrillions of DNA base pairs. Its derived metagenomic dataset underrepresents this complexity by orders of magnitude because of the sparseness and shortness of sequencing reads. Read shortness and sequencing errors pose a major challenge to accurate species and functional annotation. This includes distinguishing known from novel species. Often the majority of reads cannot be annotated and thus cannot help our interpretation of the sample. Here, we demonstrate quantitatively how careful assembly of marine metagenomic reads within, but also across, datasets can alleviate this problem. For 10 simulated datasets, each with species complexity modeled on a real counterpart, chimerism remained within the same species for most contigs (97%). For 42 real pyrosequencing ('454') datasets, assembly increased the proportion of annotated reads, and even more so when datasets were pooled, by on average 1.6% (max 6.6%) for species, 9.0% (max 28.7%) for Pfam protein domains and 9.4% (max 22.9%) for PANTHER gene families. Our results outline exciting prospects for data sharing in the metagenomics community. While chimeric sequences should be avoided in other areas of metagenomics (e.g. biodiversity analyses), conservative pooled assembly is advantageous for annotation specificity and sensitivity. Intriguingly, our experiment also found potential prospects for (low-cost) discovery of new species in 'old' data. dgerloff@ffame.org Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
USDA-ARS?s Scientific Manuscript database
High-throughput next-generation sequencing was used to scan the genome and generate reliable sequence of high copy number regions. Using this method, we examined whole plastid genomes as well as nearly 6000 bases of nuclear ribosomal DNA sequences for nine genotypes of Theobroma cacao and an indivi...
Wright, Imogen A.; Travers, Simon A.
2014-01-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
Extensive Local Gene Duplication and Functional Divergence among Paralogs in Atlantic Salmon
Warren, Ian A.; Ciborowski, Kate L.; Casadei, Elisa; Hazlerigg, David G.; Martin, Sam; Jordan, William C.; Sumner, Seirian
2014-01-01
Many organisms can generate alternative phenotypes from the same genome, enabling individuals to exploit diverse and variable environments. A prevailing hypothesis is that such adaptation has been favored by gene duplication events, which generate redundant genomic material that may evolve divergent functions. Vertebrate examples of recent whole-genome duplications are sparse although one example is the salmonids, which have undergone a whole-genome duplication event within the last 100 Myr. The life-cycle of the Atlantic salmon, Salmo salar, depends on the ability to produce alternating phenotypes from the same genome, to facilitate migration and maintain its anadromous life history. Here, we investigate the hypothesis that genome-wide and local gene duplication events have contributed to the salmonid adaptation. We used high-throughput sequencing to characterize the transcriptomes of three key organs involved in regulating migration in S. salar: Brain, pituitary, and olfactory epithelium. We identified over 10,000 undescribed S. salar sequences and designed an analytic workflow to distinguish between paralogs originating from local gene duplication events or from whole-genome duplication events. These data reveal that substantial local gene duplications took place shortly after the whole-genome duplication event. Many of the identified paralog pairs have either diverged in function or become noncoding. Future functional genomics studies will reveal to what extent this rich source of divergence in genetic sequence is likely to have facilitated the evolution of extreme phenotypic plasticity required for an anadromous life-cycle. PMID:24951567
USDA-ARS?s Scientific Manuscript database
We assessed the genetic diversity and population structure among 148 cultivated lettuce (Lactuca sativa L.) accessions using the high-throughput GoldenGate assay and 384 EST (Expressed Sequence Tag)-derived SNP (single nucleotide polymorphism) markers. A custom OPA (Oligo Pool All), LSGermOPA was fo...
Lê Cao, Kim-Anh; Boitard, Simon; Besse, Philippe
2011-06-22
Variable selection on high throughput biological data, such as gene expression or single nucleotide polymorphisms (SNPs), becomes inevitable to select relevant information and, therefore, to better characterize diseases or assess genetic structure. There are different ways to perform variable selection in large data sets. Statistical tests are commonly used to identify differentially expressed features for explanatory purposes, whereas Machine Learning wrapper approaches can be used for predictive purposes. In the case of multiple highly correlated variables, another option is to use multivariate exploratory approaches to give more insight into cell biology, biological pathways or complex traits. A simple extension of a sparse PLS exploratory approach is proposed to perform variable selection in a multiclass classification framework. sPLS-DA has a classification performance similar to other wrapper or sparse discriminant analysis approaches on public microarray and SNP data sets. More importantly, sPLS-DA is clearly competitive in terms of computational efficiency and superior in terms of interpretability of the results via valuable graphical outputs. sPLS-DA is available in the R package mixOmics, which is dedicated to the analysis of large biological data sets.
Dissecting enzyme function with microfluidic-based deep mutational scanning.
Romero, Philip A; Tran, Tuan M; Abate, Adam R
2015-06-09
Natural enzymes are incredibly proficient catalysts, but engineering them to have new or improved functions is challenging due to the complexity of how an enzyme's sequence relates to its biochemical properties. Here, we present an ultrahigh-throughput method for mapping enzyme sequence-function relationships that combines droplet microfluidic screening with next-generation DNA sequencing. We apply our method to map the activity of millions of glycosidase sequence variants. Microfluidic-based deep mutational scanning provides a comprehensive and unbiased view of the enzyme function landscape. The mapping displays expected patterns of mutational tolerance and a strong correspondence to sequence variation within the enzyme family, but also reveals previously unreported sites that are crucial for glycosidase function. We modified the screening protocol to include a high-temperature incubation step, and the resulting thermotolerance landscape allowed the discovery of mutations that enhance enzyme thermostability. Droplet microfluidics provides a general platform for enzyme screening that, when combined with DNA-sequencing technologies, enables high-throughput mapping of enzyme sequence space.
Yamamoto, Toshio; Nagasaki, Hideki; Yonemaru, Jun-ichi; Ebana, Kaworu; Nakajima, Maiko; Shibaya, Taeko; Yano, Masahiro
2010-04-27
To create useful gene combinations in crop breeding, it is necessary to clarify the dynamics of the genome composition created by breeding practices. A large quantity of single-nucleotide polymorphism (SNP) data is required to permit discrimination of chromosome segments among modern cultivars, which are genetically related. Here, we used a high-throughput sequencer to conduct whole-genome sequencing of an elite Japanese rice cultivar, Koshihikari, which is closely related to Nipponbare, whose genome sequencing has been completed. Then we designed a high-throughput typing array based on the SNP information by comparison of the two sequences. Finally, we applied this array to analyze historical representative rice cultivars to understand the dynamics of their genome composition. The total 5.89-Gb sequence for Koshihikari, equivalent to 15.7 x the entire rice genome, was mapped using the Pseudomolecules 4.0 database for Nipponbare. The resultant Koshihikari genome sequence corresponded to 80.1% of the Nipponbare sequence and led to the identification of 67,051 SNPs. A high-throughput typing array consisting of 1917 SNP sites distributed throughout the genome was designed to genotype 151 representative Japanese cultivars that have been grown during the past 150 years. We could identify the ancestral origin of the pedigree haplotypes in 60.9% of the Koshihikari genome and 18 consensus haplotype blocks which are inherited from traditional landraces to current improved varieties. Moreover, it was predicted that modern breeding practices have generally decreased genetic diversity Detection of genome-wide SNPs by both high-throughput sequencer and typing array made it possible to evaluate genomic composition of genetically related rice varieties. With the aid of their pedigree information, we clarified the dynamics of chromosome recombination during the historical rice breeding process. We also found several genomic regions decreasing genetic diversity which might be caused by a recent human selection in rice breeding. The definition of pedigree haplotypes by means of genome-wide SNPs will facilitate next-generation breeding of rice and other crops.
SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.
Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao
2014-08-08
Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We develop a software SUGAR (subtile-based GUI-assisted refiner) which can handle ultra-high-throughput data with user-friendly graphical user interface (GUI) and interactive analysis capability. The SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in the SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to a public whole human genome sequencing data and we proved the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analysis that require high-quality sequence data and mapping results. Therefore, the software will be especially useful to control the quality of variant calls to the low population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.
TriageTools: tools for partitioning and prioritizing analysis of high-throughput sequencing data.
Fimereli, Danai; Detours, Vincent; Konopka, Tomasz
2013-04-01
High-throughput sequencing is becoming a popular research tool but carries with it considerable costs in terms of computation time, data storage and bandwidth. Meanwhile, some research applications focusing on individual genes or pathways do not necessitate processing of a full sequencing dataset. Thus, it is desirable to partition a large dataset into smaller, manageable, but relevant pieces. We present a toolkit for partitioning raw sequencing data that includes a method for extracting reads that are likely to map onto pre-defined regions of interest. We show the method can be used to extract information about genes of interest from DNA or RNA sequencing samples in a fraction of the time and disk space required to process and store a full dataset. We report speedup factors between 2.6 and 96, depending on settings and samples used. The software is available at http://www.sourceforge.net/projects/triagetools/.
Measuring Sister Chromatid Cohesion Protein Genome Occupancy in Drosophila melanogaster by ChIP-seq.
Dorsett, Dale; Misulovin, Ziva
2017-01-01
This chapter presents methods to conduct and analyze genome-wide chromatin immunoprecipitation of the cohesin complex and the Nipped-B cohesin loading factor in Drosophila cells using high-throughput DNA sequencing (ChIP-seq). Procedures for isolation of chromatin, immunoprecipitation, and construction of sequencing libraries for the Ion Torrent Proton high throughput sequencer are detailed, and computational methods to calculate occupancy as input-normalized fold-enrichment are described. The results obtained by ChIP-seq are compared to those obtained by ChIP-chip (genomic ChIP using tiling microarrays), and the effects of sequencing depth on the accuracy are analyzed. ChIP-seq provides similar sensitivity and reproducibility as ChIP-chip, and identifies the same broad regions of occupancy. The locations of enrichment peaks, however, can differ between ChIP-chip and ChIP-seq, and low sequencing depth can splinter broad regions of occupancy into distinct peaks.
New Tools For Understanding Microbial Diversity Using High-throughput Sequence Data
NASA Astrophysics Data System (ADS)
Knight, R.; Hamady, M.; Liu, Z.; Lozupone, C.
2007-12-01
High-throughput sequencing techniques such as 454 are straining the limits of tools traditionally used to build trees, choose OTUs, and perform other essential sequencing tasks. We have developed a workflow for phylogenetic analysis of large-scale sequence data sets that combines existing tools, such as the Arb phylogeny package and the NAST multiple sequence alignment tool, with new methods for choosing and clustering OTUs and for performing phylogenetic community analysis with UniFrac. This talk discusses the cyberinfrastructure we are developing to support the human microbiome project, and the application of these workflows to analyze very large data sets that contrast the gut microbiota with a range of physical environments. These tools will ultimately help to define core and peripheral microbiomes in a range of environments, and will allow us to understand the physical and biotic factors that contribute most to differences in microbial diversity.
A Formal Messaging Notation for Alaskan Aviation Data
NASA Technical Reports Server (NTRS)
Rios, Joseph L.
2015-01-01
Data exchange is an increasingly important aspect of the National Airspace System. While many data communication channels have become more capable of sending and receiving data at higher throughput rates, there is still a need to use communication channels efficiently with limited throughput. The limitation can be based on technological issues, financial considerations, or both. This paper provides a complete description of several important aviation weather data in Abstract Syntax Notation format. By doing so, data providers can take advantage of Abstract Syntax Notation's ability to encode data in a highly compressed format. When data such as pilot weather reports, surface weather observations, and various weather predictions are compressed in such a manner, it allows for the efficient use of throughput-limited communication channels. This paper provides details on the Abstract Syntax Notation One (ASN.1) implementation for Alaskan aviation data, and demonstrates its use on real-world aviation weather data samples as Alaska has sparse terrestrial data infrastructure and data are often sent via relatively costly satellite channels.
USDA-ARS?s Scientific Manuscript database
Next-generation sequencing technologies are able to produce high-throughput short sequence reads in a cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. Here I survey their major applications ranging...
Recent Applications of DNA Sequencing Technologies in Food, Nutrition and Agriculture
USDA-ARS?s Scientific Manuscript database
Next-generation DNA sequencing technologies are able to produce millions of short sequence reads in a high-throughput, cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. This review surveys their rec...
Next generation sequencers: methods and applications in food-borne pathogens
USDA-ARS?s Scientific Manuscript database
Next generation sequencers are able to produce millions of short sequence reads in a high-throughput, low-cost way. The emergence of these technologies has not only facilitated genome sequencing but also started to change the landscape of life sciences. This chapter will survey their methods and app...
Zhang, Fantao; Luo, Xiangdong; Zhou, Yi; Xie, Jiankun
2016-04-01
To identify drought stress-responsive conserved microRNA (miRNA) from Dongxiang wild rice (Oryza rufipogon Griff., DXWR) on a genome-wide scale, high-throughput sequencing technology was used to sequence libraries of DXWR samples, treated with and without drought stress. 505 conserved miRNAs corresponding to 215 families were identified. 17 were significantly down-regulated and 16 were up-regulated under drought stress. Stem-loop qRT-PCR revealed the same expression patterns as high-throughput sequencing, suggesting the accuracy of the sequencing result was high. Potential target genes of the drought-responsive miRNA were predicted to be involved in diverse biological processes. Furthermore, 16 miRNA families were first identified to be involved in drought stress response from plants. These results present a comprehensive view of the conserved miRNA and their expression patterns under drought stress for DXWR, which will provide valuable information and sequence resources for future basis studies.
Dai, Hongying; Wu, Guodong; Wu, Michael; Zhi, Degui
2016-01-01
Next-generation sequencing data pose a severe curse of dimensionality, complicating traditional "single marker-single trait" analysis. We propose a two-stage combined p-value method for pathway analysis. The first stage is at the gene level, where we integrate effects within a gene using the Sequence Kernel Association Test (SKAT). The second stage is at the pathway level, where we perform a correlated Lancaster procedure to detect joint effects from multiple genes within a pathway. We show that the Lancaster procedure is optimal in Bahadur efficiency among all combined p-value methods. The Bahadur efficiency,[Formula: see text], compares sample sizes among different statistical tests when signals become sparse in sequencing data, i.e. ε →0. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires a minimal sample size to detect sparse signals ([Formula: see text]). The Lancaster procedure can also be applied to meta-analysis. Extensive empirical assessments of exome sequencing data show that the proposed method outperforms Gene Set Enrichment Analysis (GSEA). We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium to identify pathways significantly associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, and total cholesterol.
Pietiainen, Vilja; Saarela, Jani; von Schantz, Carina; Turunen, Laura; Ostling, Paivi; Wennerberg, Krister
2014-05-01
The High Throughput Biomedicine (HTB) unit at the Institute for Molecular Medicine Finland FIMM was established in 2010 to serve as a national and international academic screening unit providing access to state of the art instrumentation for chemical and RNAi-based high throughput screening. The initial focus of the unit was multiwell plate based chemical screening and high content microarray-based siRNA screening. However, over the first four years of operation, the unit has moved to a more flexible service platform where both chemical and siRNA screening is performed at different scales primarily in multiwell plate-based assays with a wide range of readout possibilities with a focus on ultraminiaturization to allow for affordable screening for the academic users. In addition to high throughput screening, the equipment of the unit is also used to support miniaturized, multiplexed and high throughput applications for other types of research such as genomics, sequencing and biobanking operations. Importantly, with the translational research goals at FIMM, an increasing part of the operations at the HTB unit is being focused on high throughput systems biological platforms for functional profiling of patient cells in personalized and precision medicine projects.
Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart
2010-07-01
High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.
Massouras, Andreas; Decouttere, Frederik; Hens, Korneel; Deplancke, Bart
2010-01-01
High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from the incorporation of such sequencing features in their pipeline. Consequently, software tools that facilitate such an incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first is WebPrInSeS-C which performs automated sequence verification of user-defined open-reading frame (ORF) clone libraries. The second is WebPrInSeS-E, which identifies positive hits in cDNA or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users. PMID:20501601
USDA-ARS?s Scientific Manuscript database
Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker d...
A technological update of molecular diagnostics for infectious diseases
Liu, Yu-Tsueng
2008-01-01
Identification of a causative pathogen is essential for the choice of treatment for most infectious diseases. Many FDA approved molecular assays; usually more sensitive and specific compared to traditional tests, have been developed in the last decade. A new trend of high throughput and multiplexing assays are emerging thanks to technological developments for the human genome sequencing project. The applications of microarray and ultra high throughput sequencing technologies for diagnostic microbiology are reviewed. The race for the $1000 genome technology by 2014 will have a profound impact in diagnosis and treatment of infectious diseases in the near future. PMID:18782035
GENETIC-BASED ANALYTICAL METHODS FOR BACTERIA AND FUNGI
In the past two decades, advances in high-throughput sequencing technologies have lead to a veritable explosion in the generation of nucleic acid sequence information (1). While these advances are illustrated most prominently by the successful sequencing of the human genome, they...
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare . However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes.
Chwialkowska, Karolina; Korotko, Urszula; Kosinska, Joanna; Szarejko, Iwona; Kwasniewski, Miroslaw
2017-01-01
Epigenetic mechanisms, including histone modifications and DNA methylation, mutually regulate chromatin structure, maintain genome integrity, and affect gene expression and transposon mobility. Variations in DNA methylation within plant populations, as well as methylation in response to internal and external factors, are of increasing interest, especially in the crop research field. Methylation Sensitive Amplification Polymorphism (MSAP) is one of the most commonly used methods for assessing DNA methylation changes in plants. This method involves gel-based visualization of PCR fragments from selectively amplified DNA that are cleaved using methylation-sensitive restriction enzymes. In this study, we developed and validated a new method based on the conventional MSAP approach called Methylation Sensitive Amplification Polymorphism Sequencing (MSAP-Seq). We improved the MSAP-based approach by replacing the conventional separation of amplicons on polyacrylamide gels with direct, high-throughput sequencing using Next Generation Sequencing (NGS) and automated data analysis. MSAP-Seq allows for global sequence-based identification of changes in DNA methylation. This technique was validated in Hordeum vulgare. However, MSAP-Seq can be straightforwardly implemented in different plant species, including crops with large, complex and highly repetitive genomes. The incorporation of high-throughput sequencing into MSAP-Seq enables parallel and direct analysis of DNA methylation in hundreds of thousands of sites across the genome. MSAP-Seq provides direct genomic localization of changes and enables quantitative evaluation. We have shown that the MSAP-Seq method specifically targets gene-containing regions and that a single analysis can cover three-quarters of all genes in large genomes. Moreover, MSAP-Seq's simplicity, cost effectiveness, and high-multiplexing capability make this method highly affordable. Therefore, MSAP-Seq can be used for DNA methylation analysis in crop plants with large and complex genomes. PMID:29250096
NASA Astrophysics Data System (ADS)
Cannon, M. V.; Hester, J.; Shalkhauser, A.; Chan, E. R.; Logue, K.; Small, S. T.; Serre, D.
2016-03-01
Analysis of environmental DNA (eDNA) enables the detection of species of interest from water and soil samples, typically using species-specific PCR. Here, we describe a method to characterize the biodiversity of a given environment by amplifying eDNA using primer pairs targeting a wide range of taxa and high-throughput sequencing for species identification. We tested this approach on 91 water samples of 40 mL collected along the Cuyahoga River (Ohio, USA). We amplified eDNA using 12 primer pairs targeting mammals, fish, amphibians, birds, bryophytes, arthropods, copepods, plants and several microorganism taxa and sequenced all PCR products simultaneously by high-throughput sequencing. Overall, we identified DNA sequences from 15 species of fish, 17 species of mammals, 8 species of birds, 15 species of arthropods, one turtle and one salamander. Interestingly, in addition to aquatic and semi-aquatic animals, we identified DNA from terrestrial species that live near the Cuyahoga River. We also identified DNA from one Asian carp species invasive to the Great Lakes but that had not been previously reported in the Cuyahoga River. Our study shows that analysis of eDNA extracted from small water samples using wide-range PCR amplification combined with high-throughput sequencing can provide a broad perspective on biological diversity.
Cannon, M. V.; Hester, J.; Shalkhauser, A.; Chan, E. R.; Logue, K.; Small, S. T.; Serre, D.
2016-01-01
Analysis of environmental DNA (eDNA) enables the detection of species of interest from water and soil samples, typically using species-specific PCR. Here, we describe a method to characterize the biodiversity of a given environment by amplifying eDNA using primer pairs targeting a wide range of taxa and high-throughput sequencing for species identification. We tested this approach on 91 water samples of 40 mL collected along the Cuyahoga River (Ohio, USA). We amplified eDNA using 12 primer pairs targeting mammals, fish, amphibians, birds, bryophytes, arthropods, copepods, plants and several microorganism taxa and sequenced all PCR products simultaneously by high-throughput sequencing. Overall, we identified DNA sequences from 15 species of fish, 17 species of mammals, 8 species of birds, 15 species of arthropods, one turtle and one salamander. Interestingly, in addition to aquatic and semi-aquatic animals, we identified DNA from terrestrial species that live near the Cuyahoga River. We also identified DNA from one Asian carp species invasive to the Great Lakes but that had not been previously reported in the Cuyahoga River. Our study shows that analysis of eDNA extracted from small water samples using wide-range PCR amplification combined with high-throughput sequencing can provide a broad perspective on biological diversity. PMID:26965911
Calvo, Sarah E; Tucker, Elena J; Compton, Alison G; Kirby, Denise M; Crawford, Gabriel; Burtt, Noel P; Rivas, Manuel A; Guiducci, Candace; Bruno, Damien L; Goldberger, Olga A; Redman, Michelle C; Wiltshire, Esko; Wilson, Callum J; Altshuler, David; Gabriel, Stacey B; Daly, Mark J; Thorburn, David R; Mootha, Vamsi K
2010-01-01
Discovering the molecular basis of mitochondrial respiratory chain disease is challenging given the large number of both mitochondrial and nuclear genes involved. We report a strategy of focused candidate gene prediction, high-throughput sequencing, and experimental validation to uncover the molecular basis of mitochondrial complex I (CI) disorders. We created five pools of DNA from a cohort of 103 patients and then performed deep sequencing of 103 candidate genes to spotlight 151 rare variants predicted to impact protein function. We used confirmatory experiments to establish genetic diagnoses in 22% of previously unsolved cases, and discovered that defects in NUBPL and FOXRED1 can cause CI deficiency. Our study illustrates how large-scale sequencing, coupled with functional prediction and experimental validation, can reveal novel disease-causing mutations in individual patients. PMID:20818383
Novel genetic tools for studying food-borne Salmonella.
Andrews-Polymenis, Helene L; Santiviago, Carlos A; McClelland, Michael
2009-04-01
Nontyphoidal Salmonellae are highly prevalent food-borne pathogens. High-throughput sequencing of Salmonella genomes is expanding our knowledge of the evolution of serovars and epidemic isolates. Genome sequences have also allowed the creation of complete microarrays. Microarrays have improved the throughput of in vivo expression technology (IVET) used to uncover promoters active during infection. In another method, signature tagged mutagenesis (STM), pools of mutants are subjected to selection. Changes in the population are monitored on a microarray, revealing genes under selection. Complete genome sequences permit the construction of pools of targeted in-frame deletions that have improved STM by minimizing the number of clones and the polarity of each mutant. Together, genome sequences and the continuing development of new tools for functional genomics will drive a revolution in the understanding of Salmonellae in many different niches that are critical for food safety.
Personalized Oncology Through Integrative High-Throughput Sequencing: A Pilot Study
Roychowdhury, Sameek; Iyer, Matthew K.; Robinson, Dan R.; Lonigro, Robert J.; Wu, Yi-Mi; Cao, Xuhong; Kalyana-Sundaram, Shanker; Sam, Lee; Balbin, O. Alejandro; Quist, Michael J.; Barrette, Terrence; Everett, Jessica; Siddiqui, Javed; Kunju, Lakshmi P.; Navone, Nora; Araujo, John C.; Troncoso, Patricia; Logothetis, Christopher J.; Innis, Jeffrey W.; Smith, David C.; Lao, Christopher D.; Kim, Scott Y.; Roberts, J. Scott; Gruber, Stephen B.; Pienta, Kenneth J.; Talpaz, Moshe; Chinnaiyan, Arul M.
2012-01-01
Individual cancers harbor a set of genetic aberrations that can be informative for identifying rational therapies currently available or in clinical trials. We implemented a pilot study to explore the practical challenges of applying high-throughput sequencing in clinical oncology. We enrolled patients with advanced or refractory cancer who were eligible for clinical trials. For each patient, we performed whole-genome sequencing of the tumor, targeted whole-exome sequencing of tumor and normal DNA, and transcriptome sequencing (RNA-Seq) of the tumor to identify potentially informative mutations in a clinically relevant time frame of 3 to 4 weeks. With this approach, we detected several classes of cancer mutations including structural rearrangements, copy number alterations, point mutations, and gene expression alterations. A multidisciplinary Sequencing Tumor Board (STB) deliberated on the clinical interpretation of the sequencing results obtained. We tested our sequencing strategy on human prostate cancer xenografts. Next, we enrolled two patients into the clinical protocol and were able to review the results at our STB within 24 days of biopsy. The first patient had metastatic colorectal cancer in which we identified somatic point mutations in NRAS, TP53, AURKA, FAS, and MYH11, plus amplification and overexpression of cyclin-dependent kinase 8 (CDK8). The second patient had malignant melanoma, in which we identified a somatic point mutation in HRAS and a structural rearrangement affecting CDKN2C. The STB identified the CDK8 amplification and Ras mutation as providing a rationale for clinical trials with CDK inhibitors or MEK (mitogenactivated or extracellular signal–regulated protein kinase kinase) and PI3K (phosphatidylinositol 3-kinase) inhibitors, respectively. Integrative high-throughput sequencing of patients with advanced cancer generates a comprehensive, individual mutational landscape to facilitate biomarker-driven clinical trials in oncology. PMID:22133722
Restructuring of the Aquatic Bacterial Community by Hydric Dynamics Associated with Superstorm Sandy
Ulrich, Nikea; Rosenberger, Abigail; Brislawn, Colin; Wright, Justin; Kessler, Collin; Toole, David; Solomon, Caroline; Strutt, Steven; McClure, Erin
2016-01-01
ABSTRACT Bacterial community composition and longitudinal fluctuations were monitored in a riverine system during and after Superstorm Sandy to better characterize inter- and intracommunity responses associated with the disturbance associated with a 100-year storm event. High-throughput sequencing of the 16S rRNA gene was used to assess microbial community structure within water samples from Muddy Creek Run, a second-order stream in Huntingdon, PA, at 12 different time points during the storm event (29 October to 3 November 2012) and under seasonally matched baseline conditions. High-throughput sequencing of the 16S rRNA gene was used to track changes in bacterial community structure and divergence during and after Superstorm Sandy. Bacterial community dynamics were correlated to measured physicochemical parameters and fecal indicator bacteria (FIB) concentrations. Bioinformatics analyses of 2.1 million 16S rRNA gene sequences revealed a significant increase in bacterial diversity in samples taken during peak discharge of the storm. Beta-diversity analyses revealed longitudinal shifts in the bacterial community structure. Successional changes were observed, in which Betaproteobacteria and Gammaproteobacteria decreased in 16S rRNA gene relative abundance, while the relative abundance of members of the Firmicutes increased. Furthermore, 16S rRNA gene sequences matching pathogenic bacteria, including strains of Legionella, Campylobacter, Arcobacter, and Helicobacter, as well as bacteria of fecal origin (e.g., Bacteroides), exhibited an increase in abundance after peak discharge of the storm. This study revealed a significant restructuring of in-stream bacterial community structure associated with hydric dynamics of a storm event. IMPORTANCE In order to better understand the microbial risks associated with freshwater environments during a storm event, a more comprehensive understanding of the variations in aquatic bacterial diversity is warranted. This study investigated the bacterial communities during and after Superstorm Sandy to provide fine time point resolution of dynamic changes in bacterial composition. This study adds to the current literature by revealing the variation in bacterial community structure during the course of a storm. This study employed high-throughput DNA sequencing, which generated a deep analysis of inter- and intracommunity responses during a significant storm event. This study has highlighted the utility of applying high-throughput sequencing for water quality monitoring purposes, as this approach enabled a more comprehensive investigation of the bacterial community structure. Altogether, these data suggest a drastic restructuring of the stream bacterial community during a storm event and highlight the potential of high-throughput sequencing approaches for assessing the microbiological quality of our environment. PMID:27060115
Lyons, Eli; Sheridan, Paul; Tremmel, Georg; Miyano, Satoru; Sugano, Sumio
2017-10-24
High-throughput screens allow for the identification of specific biomolecules with characteristics of interest. In barcoded screens, DNA barcodes are linked to target biomolecules in a manner allowing for the target molecules making up a library to be identified by sequencing the DNA barcodes using Next Generation Sequencing. To be useful in experimental settings, the DNA barcodes in a library must satisfy certain constraints related to GC content, homopolymer length, Hamming distance, and blacklisted subsequences. Here we report a novel framework to quickly generate large-scale libraries of DNA barcodes for use in high-throughput screens. We show that our framework dramatically reduces the computation time required to generate large-scale DNA barcode libraries, compared with a naїve approach to DNA barcode library generation. As a proof of concept, we demonstrate that our framework is able to generate a library consisting of one million DNA barcodes for use in a fragment antibody phage display screening experiment. We also report generating a general purpose one billion DNA barcode library, the largest such library yet reported in literature. Our results demonstrate the value of our novel large-scale DNA barcode library generation framework for use in high-throughput screening applications.
High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.
Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca
2015-01-01
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
Improved Protocols for Illumina Sequencing
Bronner, Iraad F.; Quail, Michael A.; Turner, Daniel J.; Swerdlow, Harold
2013-01-01
In this unit, we describe a set of improvements we have made to the standard Illumina protocols to make the sequencing process more reliable in a high-throughput environment, reduce amplification bias, narrow the distribution of insert sizes, and reliably obtain high yields of data. PMID:19582764
Appliation of rad-sequencing to linkage mapping in citrus
USDA-ARS?s Scientific Manuscript database
High density linkage maps can be developed for modest cost using high-throughput DNA sequencing to genotype a defined fraction (representation) of the genome. We developed linkage maps in two citrus populations using the RAD (Restriction site Associated DNA) genotyping method which involves restrict...
Diroma, Maria Angela; Santorsola, Mariangela; Guttà, Cristiano; Gasparre, Giuseppe; Picardi, Ernesto; Pesole, Graziano; Attimonelli, Marcella
2014-01-01
Motivation: The increasing availability of mitochondria-targeted and off-target sequencing data in whole-exome and whole-genome sequencing studies (WXS and WGS) has risen the demand of effective pipelines to accurately measure heteroplasmy and to easily recognize the most functionally important mitochondrial variants among a huge number of candidates. To this purpose, we developed MToolBox, a highly automated pipeline to reconstruct and analyze human mitochondrial DNA from high-throughput sequencing data. Results: MToolBox implements an effective computational strategy for mitochondrial genomes assembling and haplogroup assignment also including a prioritization analysis of detected variants. MToolBox provides a Variant Call Format file featuring, for the first time, allele-specific heteroplasmy and annotation files with prioritized variants. MToolBox was tested on simulated samples and applied on 1000 Genomes WXS datasets. Availability and implementation: MToolBox package is available at https://sourceforge.net/projects/mtoolbox/. Contact: marcella.attimonelli@uniba.it Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25028726
Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike
2018-01-01
ABSTRACT Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2, with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publically available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection. PMID:29564396
Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S
2018-01-01
Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2, with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publically available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank. IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection.
Evaluating imputation algorithms for low-depth genotyping-by-sequencing (GBS) data
USDA-ARS?s Scientific Manuscript database
Well-powered genomic studies require genome-wide marker coverage across many individuals. For non-model species with few genomic resources, high-throughput sequencing (HTS) methods, such as Genotyping-By-Sequencing (GBS), offer an inexpensive alternative to array-based genotyping. Although affordabl...
Karas, Vlad O; Sinnott-Armstrong, Nicholas A; Varghese, Vici; Shafer, Robert W; Greenleaf, William J; Sherlock, Gavin
2018-01-01
Abstract Much of the within species genetic variation is in the form of single nucleotide polymorphisms (SNPs), typically detected by whole genome sequencing (WGS) or microarray-based technologies. However, WGS produces mostly uninformative reads that perfectly match the reference, while microarrays require genome-specific reagents. We have developed Diff-seq, a sequencing-based mismatch detection assay for SNP discovery without the requirement for specialized nucleic-acid reagents. Diff-seq leverages the Surveyor endonuclease to cleave mismatched DNA molecules that are generated after cross-annealing of a complex pool of DNA fragments. Sequencing libraries enriched for Surveyor-cleaved molecules result in increased coverage at the variant sites. Diff-seq detected all mismatches present in an initial test substrate, with specific enrichment dependent on the identity and context of the variation. Application to viral sequences resulted in increased observation of variant alleles in a biologically relevant context. Diff-Seq has the potential to increase the sensitivity and efficiency of high-throughput sequencing in the detection of variation. PMID:29361139
FMLRC: Hybrid long read error correction using an FM-index.
Wang, Jeremy R; Holt, James; McMillan, Leonard; Jones, Corbin D
2018-02-09
Long read sequencing is changing the landscape of genomic research, especially de novo assembly. Despite the high error rate inherent to long read technologies, increased read lengths dramatically improve the continuity and accuracy of genome assemblies. However, the cost and throughput of these technologies limits their application to complex genomes. One solution is to decrease the cost and time to assemble novel genomes by leveraging "hybrid" assemblies that use long reads for scaffolding and short reads for accuracy. We describe a novel method leveraging a multi-string Burrows-Wheeler Transform with auxiliary FM-index to correct errors in long read sequences using a set of complementary short reads. We demonstrate that our method efficiently produces significantly more high quality corrected sequence than existing hybrid error-correction methods. We also show that our method produces more contiguous assemblies, in many cases, than existing state-of-the-art hybrid and long-read only de novo assembly methods. Our method accurately corrects long read sequence data using complementary short reads. We demonstrate higher total throughput of corrected long reads and a corresponding increase in contiguity of the resulting de novo assemblies. Improved throughput and computational efficiency than existing methods will help better economically utilize emerging long read sequencing technologies.
Analysis of Illumina Microbial Assemblies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clum, Alicia; Foster, Brian; Froula, Jeff
2010-05-28
Since the emerging of second generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaken. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as wellmore » as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.« less
Next Generation Sequencing at the University of Chicago Genomics Core
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faber, Pieter
2013-04-24
The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.
Mutation detection using automated fluorescence-based sequencing.
Montgomery, Kate T; Iartchouck, Oleg; Li, Li; Perera, Anoja; Yassin, Yosuf; Tamburino, Alex; Loomis, Stephanie; Kucherlapati, Raju
2008-04-01
The development of high-throughput DNA sequencing techniques has made direct DNA sequencing of PCR-amplified genomic DNA a rapid and economical approach to the identification of polymorphisms that may play a role in disease. Point mutations as well as small insertions or deletions are readily identified by DNA sequencing. The mutations may be heterozygous (occurring in one allele while the other allele retains the normal sequence) or homozygous (occurring in both alleles). Sequencing alone cannot discriminate between true homozygosity and apparent homozygosity due to the loss of one allele due to a large deletion. In this unit, strategies are presented for using PCR amplification and automated fluorescence-based sequencing to identify sequence variation. The size of the project and laboratory preference and experience will dictate how the data is managed and which software tools are used for analysis. A high-throughput protocol is given that has been used to search for mutations in over 200 different genes at the Harvard Medical School - Partners Center for Genetics and Genomics (HPCGG, http://www.hpcgg.org/). Copyright 2008 by John Wiley & Sons, Inc.
A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.
Bansal, Vikas
2017-03-14
PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore, over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments. In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression and identified outlier samples with a 2-fold greater PCR duplication rate than other samples. The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates .
Xiao, Yongli; Sheng, Zong-Mei; Taubenberger, Jeffery K.
2015-01-01
The vast majority of surgical biopsy and post-mortem tissue samples are formalin-fixed and paraffin-embedded (FFPE), but this process leads to RNA degradation that limits gene expression analysis. As an example, the viral RNA genome of the 1918 pandemic influenza A virus was previously determined in a 9-year effort by overlapping RT-PCR from post-mortem samples. Using the protocols described here, the full genome of the 1918 virus at high coverage was determined in one high-throughput sequencing run of a cDNA library derived from total RNA of a 1918 FFPE sample after duplex-specific nuclease treatments. This basic methodological approach should assist in the analysis of FFPE tissue samples isolated over the past century from a variety of infectious diseases. PMID:26344216
Efficient Computation of Anharmonic Force Constants via q-space, with Application to Graphene
NASA Astrophysics Data System (ADS)
Kornbluth, Mordechai; Marianetti, Chris
We present a new approach for extracting anharmonic force constants from a sparse sampling of the anharmonic dynamical tensor. We calculate the derivative of the energy with respect to q-space displacements (phonons) and strain, which guarantees the absence of supercell image errors. Central finite differences provide a well-converged quadratic error tail for each derivative, separating the contribution of each anharmonic order. These derivatives populate the anharmonic dynamical tensor in a sparse mesh that bounds the Brillouin Zone, which ensures comprehensive sampling of q-space while exploiting small-cell calculations for efficient, high-throughput computation. This produces a well-converged and precisely-defined dataset, suitable for big-data approaches. We transform this sparsely-sampled anharmonic dynamical tensor to real-space anharmonic force constants that obey full space-group symmetries by construction. Machine-learning techniques identify the range of real-space interactions. We show the entire process executed for graphene, up to and including the fifth-order anharmonic force constants. This method successfully calculates strain-based phonon renormalization in graphene, even under large strains, which solves a major shortcoming of previous potentials.
Wright, Imogen A; Travers, Simon A
2014-07-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Wu, Nicholas C.; Young, Arthur P.; Al-Mawsawi, Laith Q.; Olson, C. Anders; Feng, Jun; Qi, Hangfei; Luan, Harding H.; Li, Xinmin; Wu, Ting-Ting
2014-01-01
ABSTRACT Viral proteins often display several functions which require multiple assays to dissect their genetic basis. Here, we describe a systematic approach to screen for loss-of-function mutations that confer a fitness disadvantage under a specified growth condition. Our methodology was achieved by genetically monitoring a mutant library under two growth conditions, with and without interferon, by deep sequencing. We employed a molecular tagging technique to distinguish true mutations from sequencing error. This approach enabled us to identify mutations that were negatively selected against, in addition to those that were positively selected for. Using this technique, we identified loss-of-function mutations in the influenza A virus NS segment that were sensitive to type I interferon in a high-throughput fashion. Mechanistic characterization further showed that a single substitution, D92Y, resulted in the inability of NS to inhibit RIG-I ubiquitination. The approach described in this study can be applied under any specified condition for any virus that can be genetically manipulated. IMPORTANCE Traditional genetics focuses on a single genotype-phenotype relationship, whereas high-throughput genetics permits phenotypic characterization of numerous mutants in parallel. High-throughput genetics often involves monitoring of a mutant library with deep sequencing. However, deep sequencing suffers from a high error rate (∼0.1 to 1%), which is usually higher than the occurrence frequency for individual point mutations within a mutant library. Therefore, only mutations that confer a fitness advantage can be identified with confidence due to an enrichment in the occurrence frequency. In contrast, it is impossible to identify deleterious mutations using most next-generation sequencing techniques. In this study, we have applied a molecular tagging technique to distinguish true mutations from sequencing errors. It enabled us to identify mutations that underwent negative selection, in addition to mutations that experienced positive selection. This study provides a proof of concept by screening for loss-of-function mutations on the influenza A virus NS segment that are involved in its anti-interferon activity. PMID:24965464
First report of Beet western yellows virus infecting Epiphyllum spp
USDA-ARS?s Scientific Manuscript database
Beet western yellow virus (BWYV) was identified from an orchid cactus (Epiphyllum spp.) hybrid without obvious symptoms by high-throughput sequencing. The nearly complete genomic sequence of 5,458 nucleotides of the virus was determined. The isolate has the highest nucleotide sequence identity (93%)...
Xu, Chen; Zhang, Nan; Huo, Qianyu; Chen, Minghui; Wang, Rengfeng; Liu, Zhili; Li, Xue; Liu, Yunde; Bao, Huijing
2016-04-15
In this article, we discuss the polymerase chain reaction (PCR)-hybridization assay that we developed for high-throughput simultaneous detection and differentiation of Ureaplasma urealyticum and Ureaplasma parvum using one set of primers and two specific DNA probes based on urease gene nucleotide sequence differences. First, U. urealyticum and U. parvum DNA samples were specifically amplified using one set of biotin-labeled primers. Furthermore, amine-modified DNA probes, which can specifically react with U. urealyticum or U. parvum DNA, were covalently immobilized to a DNA-BIND plate surface. The plate was then incubated with the PCR products to facilitate sequence-specific DNA binding. Horseradish peroxidase-streptavidin conjugation and a colorimetric assay were used. Based on the results, the PCR-hybridization assay we developed can specifically differentiate U. urealyticum and U. parvum with high sensitivity (95%) compared with cultivation (72.5%). Hence, this study demonstrates a new method for high-throughput simultaneous differentiation and detection of U. urealyticum and U. parvum with high sensitivity. Based on these observations, the PCR-hybridization assay developed in this study is ideal for detecting and discriminating U. urealyticum and U. parvum in clinical applications. Copyright © 2016 Elsevier Inc. All rights reserved.
Téllez-Sosa, Juan; Rodríguez, Mario Henry; Gómez-Barreto, Rosa E.; Valdovinos-Torres, Humberto; Hidalgo, Ana Cecilia; Cruz-Hervert, Pablo; Luna, René Santos; Carrillo-Valenzo, Erik; Ramos, Celso; García-García, Lourdes; Martínez-Barnetche, Jesús
2013-01-01
Background Influenza viruses display a high mutation rate and complex evolutionary patterns. Next-generation sequencing (NGS) has been widely used for qualitative and semi-quantitative assessment of genetic diversity in complex biological samples. The “deep sequencing” approach, enabled by the enormous throughput of current NGS platforms, allows the identification of rare genetic viral variants in targeted genetic regions, but is usually limited to a small number of samples. Methodology and Principal Findings We designed a proof-of-principle study to test whether redistributing sequencing throughput from a high depth-small sample number towards a low depth-large sample number approach is feasible and contributes to influenza epidemiological surveillance. Using 454-Roche sequencing, we sequenced at a rather low depth, a 307 bp amplicon of the neuraminidase gene of the Influenza A(H1N1) pandemic (A(H1N1)pdm) virus from cDNA amplicons pooled in 48 barcoded libraries obtained from nasal swab samples of infected patients (n = 299) taken from May to November, 2009 pandemic period in Mexico. This approach revealed that during the transition from the first (May-July) to second wave (September-November) of the pandemic, the initial genetic variants were replaced by the N248D mutation in the NA gene, and enabled the establishment of temporal and geographic associations with genetic diversity and the identification of mutations associated with oseltamivir resistance. Conclusions NGS sequencing of a short amplicon from the NA gene at low sequencing depth allowed genetic screening of a large number of samples, providing insights to viral genetic diversity dynamics and the identification of genetic variants associated with oseltamivir resistance. Further research is needed to explain the observed replacement of the genetic variants seen during the second wave. As sequencing throughput rises and library multiplexing and automation improves, we foresee that the approach presented here can be scaled up for global genetic surveillance of influenza and other infectious diseases. PMID:23843978
Du, Hongqi; Sun, Xiaoge; Shi, Yinghua; Wang, Chengzhang
2014-01-01
Background MicroRNAs (miRNAs) are a class of regulatory small RNAs (sRNAs) that regulate gene post-transcriptional expression in plants and animals. High-throughput sequencing technology is capable of identifying small RNAs in plant species. Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide, and fall dormancy is an adaptive characteristic related to the biomass production and winter survival in alfalfa. Here, we applied high-throughput sRNA sequencing to identify some miRNAs that were responsive to fall dormancy in standard variety (Maverick and CUF101) of alfalfa. Results Four sRNA libraries were generated and sequenced from alfalfa leaves in two typical varieties at distinct seasons. Through integrative analysis, we identified 51 novel miRNA candidates of 206 families. Additionally, we identified 28 miRNAs associated with fall dormancy in standard variety (Maverick and CUF101), including 20 known miRNAs and eight novel miRNAs. Both high-throughput sequencing and RT-qPCR confirmed that eight known miRNA members were up-regulated and six known miRNA members were down-regulated in response to fall dormancy in standard variety (Maverick and CUF101). Among the 51 novel miRNA candidates, five miRNAs were up-regulated and three miRNAs were down-regulated in response to fall dormancy in standard variety (Maverick and CUF101), and five of them were confirmed by Northern blot analysis. Conclusion We identified 20 known miRNAs and eight new miRNA candidates that were responsive to fall dormancy in standard variety (Maverick and CUF101) by high-throughput sequencing of small RNAs from Medicago sativa. Our data provide a useful resource for investigating miRNA-mediated regulatory mechanisms of fall dormancy in alfalfa, and these findings are important for our understanding of the roles played by miRNAs in the response of plants to abiotic stress in general and fall dormancy in alfalfa. PMID:25473944
Fan, Wenna; Zhang, Senhao; Du, Hongqi; Sun, Xiaoge; Shi, Yinghua; Wang, Chengzhang
2014-01-01
MicroRNAs (miRNAs) are a class of regulatory small RNAs (sRNAs) that regulate gene post-transcriptional expression in plants and animals. High-throughput sequencing technology is capable of identifying small RNAs in plant species. Alfalfa (Medicago sativa L.) is one of the most widely cultivated perennial forage legumes worldwide, and fall dormancy is an adaptive characteristic related to the biomass production and winter survival in alfalfa. Here, we applied high-throughput sRNA sequencing to identify some miRNAs that were responsive to fall dormancy in standard variety (Maverick and CUF101) of alfalfa. Four sRNA libraries were generated and sequenced from alfalfa leaves in two typical varieties at distinct seasons. Through integrative analysis, we identified 51 novel miRNA candidates of 206 families. Additionally, we identified 28 miRNAs associated with fall dormancy in standard variety (Maverick and CUF101), including 20 known miRNAs and eight novel miRNAs. Both high-throughput sequencing and RT-qPCR confirmed that eight known miRNA members were up-regulated and six known miRNA members were down-regulated in response to fall dormancy in standard variety (Maverick and CUF101). Among the 51 novel miRNA candidates, five miRNAs were up-regulated and three miRNAs were down-regulated in response to fall dormancy in standard variety (Maverick and CUF101), and five of them were confirmed by Northern blot analysis. We identified 20 known miRNAs and eight new miRNA candidates that were responsive to fall dormancy in standard variety (Maverick and CUF101) by high-throughput sequencing of small RNAs from Medicago sativa. Our data provide a useful resource for investigating miRNA-mediated regulatory mechanisms of fall dormancy in alfalfa, and these findings are important for our understanding of the roles played by miRNAs in the response of plants to abiotic stress in general and fall dormancy in alfalfa.
Paul, Fiona; Otte, Jürgen; Schmitt, Imke; Dal Grande, Francesco
2018-06-05
The implementation of HTS (high-throughput sequencing) approaches is rapidly changing our understanding of the lichen symbiosis, by uncovering high bacterial and fungal diversity, which is often host-specific. Recently, HTS methods revealed the presence of multiple photobionts inside a single thallus in several lichen species. This differs from Sanger technology, which typically yields a single, unambiguous algal sequence per individual. Here we compared HTS and Sanger methods for estimating the diversity of green algal symbionts within lichen thalli using 240 lichen individuals belonging to two species of lichen-forming fungi. According to HTS data, Sanger technology consistently yielded the most abundant photobiont sequence in the sample. However, if the second most abundant photobiont exceeded 30% of the total HTS reads in a sample, Sanger sequencing generally failed. Our results suggest that most lichen individuals in the two analyzed species, Lasallia hispanica and L. pustulata, indeed contain a single, predominant green algal photobiont. We conclude that Sanger sequencing is a valid approach to detect the dominant photobionts in lichen individuals and populations. We discuss which research areas in lichen ecology and evolution will continue to benefit from Sanger sequencing, and which areas will profit from HTS approaches to assessing symbiont diversity.
De Groot, Anne S; Rappuoli, Rino
2004-02-01
Vaccine research entered a new era when the complete genome of a pathogenic bacterium was published in 1995. Since then, more than 97 bacterial pathogens have been sequenced and at least 110 additional projects are now in progress. Genome sequencing has also dramatically accelerated: high-throughput facilities can draft the sequence of an entire microbe (two to four megabases) in 1 to 2 days. Vaccine developers are using microarrays, immunoinformatics, proteomics and high-throughput immunology assays to reduce the truly unmanageable volume of information available in genome databases to a manageable size. Vaccines composed by novel antigens discovered from genome mining are already in clinical trials. Within 5 years we can expect to see a novel class of vaccines composed by genome-predicted, assembled and engineered T- and Bcell epitopes. This article addresses the convergence of three forces--microbial genome sequencing, computational immunology and new vaccine technologies--that are shifting genome mining for vaccines onto the forefront of immunology research.
Droplet barcoding for single cell transcriptomics applied to embryonic stem cells
Klein, Allon M; Mazutis, Linas; Akartuna, Ilke; Tallapragada, Naren; Veres, Adrian; Li, Victor; Peshkin, Leonid; Weitz, David A; Kirschner, Marc W
2015-01-01
Summary It has long been the dream of biologists to map gene expression at the single cell level. With such data one might track heterogeneous cell sub-populations, and infer regulatory relationships between genes and pathways. Recently, RNA sequencing has achieved single cell resolution. What is limiting is an effective way to routinely isolate and process large numbers of individual cells for quantitative in-depth sequencing. We have developed a high-throughput droplet-microfluidic approach for barcoding the RNA from thousands of individual cells for subsequent analysis by next-generation sequencing. The method shows a surprisingly low noise profile and is readily adaptable to other sequencing-based assays. We analyzed mouse embryonic stem cells, revealing in detail the population structure and the heterogeneous onset of differentiation after LIF withdrawal. The reproducibility of these high-throughput single cell data allowed us to deconstruct cell populations and infer gene expression relationships. PMID:26000487
A High-Throughput Process for the Solid-Phase Purification of Synthetic DNA Sequences
Grajkowski, Andrzej; Cieślak, Jacek; Beaucage, Serge L.
2017-01-01
An efficient process for the purification of synthetic phosphorothioate and native DNA sequences is presented. The process is based on the use of an aminopropylated silica gel support functionalized with aminooxyalkyl functions to enable capture of DNA sequences through an oximation reaction with the keto function of a linker conjugated to the 5′-terminus of DNA sequences. Deoxyribonucleoside phosphoramidites carrying this linker, as a 5′-hydroxyl protecting group, have been synthesized for incorporation into DNA sequences during the last coupling step of a standard solid-phase synthesis protocol executed on a controlled pore glass (CPG) support. Solid-phase capture of the nucleobase- and phosphate-deprotected DNA sequences released from the CPG support is demonstrated to proceed near quantitatively. Shorter than full-length DNA sequences are first washed away from the capture support; the solid-phase purified DNA sequences are then released from this support upon reaction with tetra-n-butylammonium fluoride in dry dimethylsulfoxide (DMSO) and precipitated in tetrahydrofuran (THF). The purity of solid-phase-purified DNA sequences exceeds 98%. The simulated high-throughput and scalability features of the solid-phase purification process are demonstrated without sacrificing purity of the DNA sequences. PMID:28628204
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daum, Christopher; Zane, Matthew; Han, James
2011-01-31
The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sesequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increases ample throughput, and improving the overall quality of the sequence generated. A sequence QC analysismore » pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.« less
USDA-ARS?s Scientific Manuscript database
Genetic diversity is an essential resource for breeders to improve new cultivars with desirable characteristics. Recently genotyping-by-sequencing (GBS), a next generation sequencing (NGS) based technology that can simplify complex genomes, has been used as a high-throughput and cost-effective molec...
High-Throughput resequencing of maize landraces at genomic regions associated with flowering time
USDA-ARS?s Scientific Manuscript database
Despite the reduction in the price of sequencing, it remains expensive to sequence and assemble whole, complex genomes of multiple samples for population studies, particularly for large genomes like those of many crop species. Enrichment of target genome regions coupled with next generation sequenci...
High-Throughput Gene Mapping in Caenorhabditis elegans
Swan, Kathryn A.; Curtis, Damian E.; McKusick, Kathleen B.; Voinov, Alexander V.; Mapa, Felipa A.; Cancilla, Michael R.
2002-01-01
Positional cloning of mutations in model genetic systems is a powerful method for the identification of targets of medical and agricultural importance. To facilitate the high-throughput mapping of mutations in Caenorhabditis elegans, we have identified a further 9602 putative new single nucleotide polymorphisms (SNPs) between two C. elegans strains, Bristol N2 and the Hawaiian mapping strain CB4856, by sequencing inserts from a CB4856 genomic DNA library and using an informatics pipeline to compare sequences with the canonical N2 genomic sequence. When combined with data from other laboratories, our marker set of 17,189 SNPs provides even coverage of the complete worm genome. To date, we have confirmed >1099 evenly spaced SNPs (one every 91 ± 56 kb) across the six chromosomes and validated the utility of our SNP marker set and new fluorescence polarization-based genotyping methods for systematic and high-throughput identification of genes in C. elegans by cloning several proprietary genes. We illustrate our approach by recombination mapping and confirmation of the mutation in the cloned gene, dpy-18. [The sequence data described in this paper have been submitted to the NCBI dbSNP data library under accession nos. 4388625–4389689 and GenBank dbSTS under accession nos. 973810–974874. The following individuals and institutions kindly provided reagents, samples, or unpublished information as indicated in the paper: The C. elegans Sequencing Consortium and The Caenorhabditis Genetics Center.] PMID:12097347
Asha, Srinivasan; Sreekumar, Sweda; Soniya, E V
2016-01-01
Analysis of high-throughput small RNA deep sequencing data, in combination with black pepper transcriptome sequences revealed microRNA-mediated gene regulation in black pepper ( Piper nigrum L.). Black pepper is an important spice crop and its berries are used worldwide as a natural food additive that contributes unique flavour to foods. In the present study to characterize microRNAs from black pepper, we generated a small RNA library from black pepper leaf and sequenced it by Illumina high-throughput sequencing technology. MicroRNAs belonging to a total of 303 conserved miRNA families were identified from the sRNAome data. Subsequent analysis from recently sequenced black pepper transcriptome confirmed precursor sequences of 50 conserved miRNAs and four potential novel miRNA candidates. Stem-loop qRT-PCR experiments demonstrated differential expression of eight conserved miRNAs in black pepper. Computational analysis of targets of the miRNAs showed 223 potential black pepper unigene targets that encode diverse transcription factors and enzymes involved in plant development, disease resistance, metabolic and signalling pathways. RLM-RACE experiments further mapped miRNA-mediated cleavage at five of the mRNA targets. In addition, miRNA isoforms corresponding to 18 miRNA families were also identified from black pepper. This study presents the first large-scale identification of microRNAs from black pepper and provides the foundation for the future studies of miRNA-mediated gene regulation of stress responses and diverse metabolic processes in black pepper.
LOCATE: a mouse protein subcellular localization database
Fink, J. Lynn; Aturaliya, Rajith N.; Davis, Melissa J.; Zhang, Fasheng; Hanson, Kelly; Teasdale, Melvena S.; Kai, Chikatoshi; Kawai, Jun; Carninci, Piero; Hayashizaki, Yoshihide; Teasdale, Rohan D.
2006-01-01
We present here LOCATE, a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of proteins from the FANTOM3 Isoform Protein Sequence set. Membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations of selected proteins from this set were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing >1700 peer-reviewed publications. LOCATE represents the first effort to catalogue the experimentally verified subcellular location and membrane organization of mammalian proteins using a high-throughput approach and provides localization data for ∼40% of the mouse proteome. It is available at . PMID:16381849
Wang, Jingwen; Skoog, Tiina; Einarsdottir, Elisabet; Kaartokallio, Tea; Laivuori, Hannele; Grauers, Anna; Gerdhem, Paul; Hytönen, Marjo; Lohi, Hannes; Kere, Juha; Jiao, Hong
2016-01-01
High-throughput sequencing using pooled DNA samples can facilitate genome-wide studies on rare and low-frequency variants in a large population. Some major questions concerning the pooling sequencing strategy are whether rare and low-frequency variants can be detected reliably, and whether estimated minor allele frequencies (MAFs) can represent the actual values obtained from individually genotyped samples. In this study, we evaluated MAF estimates using three variant detection tools with two sets of pooled whole exome sequencing (WES) and one set of pooled whole genome sequencing (WGS) data. Both GATK and Freebayes displayed high sensitivity, specificity and accuracy when detecting rare or low-frequency variants. For the WGS study, 56% of the low-frequency variants in Illumina array have identical MAFs and 26% have one allele difference between sequencing and individual genotyping data. The MAF estimates from WGS correlated well (r = 0.94) with those from Illumina arrays. The MAFs from the pooled WES data also showed high concordance (r = 0.88) with those from the individual genotyping data. In conclusion, the MAFs estimated from pooled DNA sequencing data reflect the MAFs in individually genotyped samples well. The pooling strategy can thus be a rapid and cost-effective approach for the initial screening in large-scale association studies. PMID:27633116
Vergani, Stefano; Korsunsky, Ilya; Mazzarello, Andrea Nicola; Ferrer, Gerardo; Chiorazzi, Nicholas; Bagnara, Davide
2017-01-01
Efficient and accurate high-throughput DNA sequencing of the adaptive immune receptor repertoire (AIRR) is necessary to study immune diversity in healthy subjects and disease-related conditions. The high complexity and diversity of the AIRR coupled with the limited amount of starting material, which can compromise identification of the full biological diversity makes such sequencing particularly challenging. AIRR sequencing protocols often fail to fully capture the sampled AIRR diversity, especially for samples containing restricted numbers of B lymphocytes. Here, we describe a library preparation method for immunoglobulin sequencing that results in an exhaustive full-length repertoire where virtually every sampled B-cell is sequenced. This maximizes the likelihood of identifying and quantifying the entire IGHV-D-J repertoire of a sample, including the detection of rearrangements present in only one cell in the starting population. The methodology establishes the importance of circumventing genetic material dilution in the preamplification phases and incorporates the use of certain described concepts: (1) balancing the starting material amount and depth of sequencing, (2) avoiding IGHV gene-specific amplification, and (3) using Unique Molecular Identifier. Together, this methodology is highly efficient, in particular for detecting rare rearrangements in the sampled population and when only a limited amount of starting material is available.
High-Throughput, Data-Rich Cellular RNA Device Engineering
Townshend, Brent; Kennedy, Andrew B.; Xiang, Joy S.; Smolke, Christina D.
2015-01-01
Methods for rapidly assessing sequence-structure-function landscapes and developing conditional gene-regulatory devices are critical to our ability to manipulate and interface with biology. We describe a framework for engineering RNA devices from preexisting aptamers that exhibit ligand-responsive ribozyme tertiary interactions. Our methodology utilizes cell sorting, high-throughput sequencing, and statistical data analyses to enable parallel measurements of the activities of hundreds of thousands of sequences from RNA device libraries in the absence and presence of ligands. Our tertiary interaction RNA devices exhibit improved performance in terms of gene silencing, activation ratio, and ligand sensitivity as compared to optimized RNA devices that rely on secondary structure changes. We apply our method to building biosensors for diverse ligands and determine consensus sequences that enable ligand-responsive tertiary interactions. These methods advance our ability to develop broadly applicable genetic tools and to elucidate understanding of the underlying sequence-structure-function relationships that empower rational design of complex biomolecules. PMID:26258292
Nanowire-nanopore transistor sensor for DNA detection during translocation
NASA Astrophysics Data System (ADS)
Xie, Ping; Xiong, Qihua; Fang, Ying; Qing, Quan; Lieber, Charles
2011-03-01
Nanopore sequencing, as a promising low cost, high throughput sequencing technique, has been proposed more than a decade ago. Due to the incompatibility between small ionic current signal and fast translocation speed and the technical difficulties on large scale integration of nanopore for direct ionic current sequencing, alternative methods rely on integrated DNA sensors have been proposed, such as using capacitive coupling or tunnelling current etc. But none of them have been experimentally demonstrated yet. Here we show that for the first time an amplified sensor signal has been experimentally recorded from a nanowire-nanopore field effect transistor sensor during DNA translocation. Independent multi-channel recording was also demonstrated for the first time. Our results suggest that the signal is from highly localized potential change caused by DNA translocation in none-balanced buffer condition. Given this method may produce larger signal for smaller nanopores, we hope our experiment can be a starting point for a new generation of nanopore sequencing devices with larger signal, higher bandwidth and large-scale multiplexing capability and finally realize the ultimate goal of low cost high throughput sequencing.
High-throughput assays for DNA gyrase and other topoisomerases
Maxwell, Anthony; Burton, Nicolas P.; O'Hagan, Natasha
2006-01-01
We have developed high-throughput microtitre plate-based assays for DNA gyrase and other DNA topoisomerases. These assays exploit the fact that negatively supercoiled plasmids form intermolecular triplexes more efficiently than when they are relaxed. Two assays are presented, one using capture of a plasmid containing a single triplex-forming sequence by an oligonucleotide tethered to the surface of a microtitre plate and subsequent detection by staining with a DNA-specific fluorescent dye. The other uses capture of a plasmid containing two triplex-forming sequences by an oligonucleotide tethered to the surface of a microtitre plate and subsequent detection by a second oligonucleotide that is radiolabelled. The assays are shown to be appropriate for assaying DNA supercoiling by Escherichia coli DNA gyrase and DNA relaxation by eukaryotic topoisomerases I and II, and E.coli topoisomerase IV. The assays are readily adaptable to other enzymes that change DNA supercoiling (e.g. restriction enzymes) and are suitable for use in a high-throughput format. PMID:16936317
High-throughput assays for DNA gyrase and other topoisomerases.
Maxwell, Anthony; Burton, Nicolas P; O'Hagan, Natasha
2006-01-01
We have developed high-throughput microtitre plate-based assays for DNA gyrase and other DNA topoisomerases. These assays exploit the fact that negatively supercoiled plasmids form intermolecular triplexes more efficiently than when they are relaxed. Two assays are presented, one using capture of a plasmid containing a single triplex-forming sequence by an oligonucleotide tethered to the surface of a microtitre plate and subsequent detection by staining with a DNA-specific fluorescent dye. The other uses capture of a plasmid containing two triplex-forming sequences by an oligonucleotide tethered to the surface of a microtitre plate and subsequent detection by a second oligonucleotide that is radiolabelled. The assays are shown to be appropriate for assaying DNA supercoiling by Escherichia coli DNA gyrase and DNA relaxation by eukaryotic topoisomerases I and II, and E.coli topoisomerase IV. The assays are readily adaptable to other enzymes that change DNA supercoiling (e.g. restriction enzymes) and are suitable for use in a high-throughput format.
Yang, Wanneng; Guo, Zilong; Huang, Chenglong; Duan, Lingfeng; Chen, Guoxing; Jiang, Ni; Fang, Wei; Feng, Hui; Xie, Weibo; Lian, Xingming; Wang, Gongwei; Luo, Qingming; Zhang, Qifa; Liu, Qian; Xiong, Lizhong
2014-01-01
Even as the study of plant genomics rapidly develops through the use of high-throughput sequencing techniques, traditional plant phenotyping lags far behind. Here we develop a high-throughput rice phenotyping facility (HRPF) to monitor 13 traditional agronomic traits and 2 newly defined traits during the rice growth period. Using genome-wide association studies (GWAS) of the 15 traits, we identify 141 associated loci, 25 of which contain known genes such as the Green Revolution semi-dwarf gene, SD1. Based on a performance evaluation of the HRPF and GWAS results, we demonstrate that high-throughput phenotyping has the potential to replace traditional phenotyping techniques and can provide valuable gene identification information. The combination of the multifunctional phenotyping tools HRPF and GWAS provides deep insights into the genetic architecture of important traits. PMID:25295980
High-throughput analysis of T-DNA location and structure using sequence capture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously,more » using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.« less
High-throughput analysis of T-DNA location and structure using sequence capture
Inagaki, Soichi; Henry, Isabelle M.; Lieberman, Meric C.; ...
2015-10-07
Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA—genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously,more » using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. As a result, our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.« less
Human genetics and genomics a decade after the release of the draft sequence of the human genome.
Naidoo, Nasheen; Pawitan, Yudi; Soong, Richie; Cooper, David N; Ku, Chee-Seng
2011-10-01
Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade.
Human genetics and genomics a decade after the release of the draft sequence of the human genome
2011-01-01
Substantial progress has been made in human genetics and genomics research over the past ten years since the publication of the draft sequence of the human genome in 2001. Findings emanating directly from the Human Genome Project, together with those from follow-on studies, have had an enormous impact on our understanding of the architecture and function of the human genome. Major developments have been made in cataloguing genetic variation, the International HapMap Project, and with respect to advances in genotyping technologies. These developments are vital for the emergence of genome-wide association studies in the investigation of complex diseases and traits. In parallel, the advent of high-throughput sequencing technologies has ushered in the 'personal genome sequencing' era for both normal and cancer genomes, and made possible large-scale genome sequencing studies such as the 1000 Genomes Project and the International Cancer Genome Consortium. The high-throughput sequencing and sequence-capture technologies are also providing new opportunities to study Mendelian disorders through exome sequencing and whole-genome sequencing. This paper reviews these major developments in human genetics and genomics over the past decade. PMID:22155605
Re-engineering adenovirus vector systems to enable high-throughput analyses of gene function.
Stanton, Richard J; McSharry, Brian P; Armstrong, Melanie; Tomasec, Peter; Wilkinson, Gavin W G
2008-12-01
With the enhanced capacity of bioinformatics to interrogate extensive banks of sequence data, more efficient technologies are needed to test gene function predictions. Replication-deficient recombinant adenovirus (Ad) vectors are widely used in expression analysis since they provide for extremely efficient expression of transgenes in a wide range of cell types. To facilitate rapid, high-throughput generation of recombinant viruses, we have re-engineered an adenovirus vector (designated AdZ) to allow single-step, directional gene insertion using recombineering technology. Recombineering allows for direct insertion into the Ad vector of PCR products, synthesized sequences, or oligonucleotides encoding shRNAs without requirement for a transfer vector Vectors were optimized for high-throughput applications by making them "self-excising" through incorporating the I-SceI homing endonuclease into the vector removing the need to linearize vectors prior to transfection into packaging cells. AdZ vectors allow genes to be expressed in their native form or with strep, V5, or GFP tags. Insertion of tetracycline operators downstream of the human cytomegalovirus major immediate early (HCMV MIE) promoter permits silencing of transgenes in helper cells expressing the tet repressor thus making the vector compatible with the cloning of toxic gene products. The AdZ vector system is robust, straightforward, and suited to both sporadic and high-throughput applications.
Barbi, Florian; Bragalini, Claudia; Vallon, Laurent; Prudent, Elsa; Dubost, Audrey; Fraissinet-Tachet, Laurence; Marmeisse, Roland; Luis, Patricia
2014-01-01
Plant biomass degradation in soil is one of the key steps of carbon cycling in terrestrial ecosystems. Fungal saprotrophic communities play an essential role in this process by producing hydrolytic enzymes active on the main components of plant organic matter. Open questions in this field regard the diversity of the species involved, the major biochemical pathways implicated and how these are affected by external factors such as litter quality or climate changes. This can be tackled by environmental genomic approaches involving the systematic sequencing of key enzyme-coding gene families using soil-extracted RNA as material. Such an approach necessitates the design and evaluation of gene family-specific PCR primers producing sequence fragments compatible with high-throughput sequencing approaches. In the present study, we developed and evaluated PCR primers for the specific amplification of fungal CAZy Glycoside Hydrolase gene families GH5 (subfamily 5) and GH11 encoding endo-β-1,4-glucanases and endo-β-1,4-xylanases respectively as well as Basidiomycota class II peroxidases, corresponding to the CAZy Auxiliary Activity family 2 (AA2), active on lignin. These primers were experimentally validated using DNA extracted from a wide range of Ascomycota and Basidiomycota species including 27 with sequenced genomes. Along with the published primers for Glycoside Hydrolase GH7 encoding enzymes active on cellulose, the newly design primers were shown to be compatible with the Illumina MiSeq sequencing technology. Sequences obtained from RNA extracted from beech or spruce forest soils showed a high diversity and were uniformly distributed in gene trees featuring the global diversity of these gene families. This high-throughput sequencing approach using several degenerate primers constitutes a robust method, which allows the simultaneous characterization of the diversity of different fungal transcripts involved in plant organic matter degradation and may lead to the discovery of complex patterns in gene expression of soil fungal communities. PMID:25545363
Discovery of DNA viruses in wild-caught mosquitoes using small RNA high throughput sequencing.
Ma, Maijuan; Huang, Yong; Gong, Zhengda; Zhuang, Lu; Li, Cun; Yang, Hong; Tong, Yigang; Liu, Wei; Cao, Wuchun
2011-01-01
Mosquito-borne infectious diseases pose a severe threat to public health in many areas of the world. Current methods for pathogen detection and surveillance are usually dependent on prior knowledge of the etiologic agents involved. Hence, efficient approaches are required for screening wild mosquito populations to detect known and unknown pathogens. In this study, we explored the use of Next Generation Sequencing to identify viral agents in wild-caught mosquitoes. We extracted total RNA from different mosquito species from South China. Small 18-30 bp length RNA molecules were purified, reverse-transcribed into cDNA and sequenced using Illumina GAIIx instrumentation. Bioinformatic analyses to identify putative viral agents were conducted and the results confirmed by PCR. We identified a non-enveloped single-stranded DNA densovirus in the wild-caught Culex pipiens molestus mosquitoes. The majority of the viral transcripts (.>80% of the region) were covered by the small viral RNAs, with a few peaks of very high coverage obtained. The +/- strand sequence ratio of the small RNAs was approximately 7∶1, indicating that the molecules were mainly derived from the viral RNA transcripts. The small viral RNAs overlapped, enabling contig assembly of the viral genome sequence. We identified some small RNAs in the reverse repeat regions of the viral 5'- and 3' -untranslated regions where no transcripts were expected. Our results demonstrate for the first time that high throughput sequencing of small RNA is feasible for identifying viral agents in wild-caught mosquitoes. Our results show that it is possible to detect DNA viruses by sequencing the small RNAs obtained from insects, although the underlying mechanism of small viral RNA biogenesis is unclear. Our data and those of other researchers show that high throughput small RNA sequencing can be used for pathogen surveillance in wild mosquito vectors.
Jacob, Jacob H; Hussein, Emad I; Shakhatreh, Muhamad Ali K; Cornelison, Christopher T
2017-10-01
Amplicon sequencing using next-generation technology (bTEFAP ® ) has been utilized in describing the diversity of Dead Sea microbiota. The investigated area is a well-known salt lake in the western part of Jordan found in the lowest geographical location in the world (more than 420 m below sea level) and characterized by extreme salinity (approximately, 34%) in addition to other extreme conditions (low pH, unique ionic composition different from sea water). DNA was extracted from Dead Sea water. A total of 314,310 small subunit RNA (SSU rRNA) sequences were parsed, and 288,452 sequences were then clustered. For alpha diversity analysis, sample was rarefied to 3,000 sequences. The Shannon-Wiener index curve plot reached a plateau at approximately 3,000 sequences indicating that sequencing depth was sufficient to capture the full scope of microbial diversity. Archaea was found to be dominating the sequences (52%), whereas Bacteria constitute 45% of the sequences. Altogether, prokaryotic sequences (which constitute 97% of all sequences) were found to predominate. The findings expand on previous studies by using high-throughput amplicon sequencing to describe the microbial community in an environment which in recent years has been shown to hide some interesting diversity. © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
HYPOTHESIS TESTING FOR HIGH-DIMENSIONAL SPARSE BINARY REGRESSION
Mukherjee, Rajarshi; Pillai, Natesh S.; Lin, Xihong
2015-01-01
In this paper, we study the detection boundary for minimax hypothesis testing in the context of high-dimensional, sparse binary regression models. Motivated by genetic sequencing association studies for rare variant effects, we investigate the complexity of the hypothesis testing problem when the design matrix is sparse. We observe a new phenomenon in the behavior of detection boundary which does not occur in the case of Gaussian linear regression. We derive the detection boundary as a function of two components: a design matrix sparsity index and signal strength, each of which is a function of the sparsity of the alternative. For any alternative, if the design matrix sparsity index is too high, any test is asymptotically powerless irrespective of the magnitude of signal strength. For binary design matrices with the sparsity index that is not too high, our results are parallel to those in the Gaussian case. In this context, we derive detection boundaries for both dense and sparse regimes. For the dense regime, we show that the generalized likelihood ratio is rate optimal; for the sparse regime, we propose an extended Higher Criticism Test and show it is rate optimal and sharp. We illustrate the finite sample properties of the theoretical results using simulation studies. PMID:26246645
Senerchia, Natacha; Wicker, Thomas; Felber, François; Parisod, Christian
2013-01-01
Transposable elements (TEs) represent a major fraction of plant genomes and drive their evolution. An improved understanding of genome evolution requires the dynamics of a large number of TE families to be considered. We put forward an approach bypassing the required step of a complete reference genome to assess the evolutionary trajectories of high copy number TE families from genome snapshot with high-throughput sequencing. Low coverage sequencing of the complex genomes of Aegilops cylindrica and Ae. geniculata using 454 identified more than 70% of the sequences as known TEs, mainly long terminal repeat (LTR) retrotransposons. Comparing the abundance of reads as well as patterns of sequence diversity and divergence within and among genomes assessed the dynamics of 44 major LTR retrotransposon families of the 165 identified. In particular, molecular population genetics on individual TE copies distinguished recently active from quiescent families and highlighted different evolutionary trajectories of retrotransposons among related species. This work presents a suite of tools suitable for current sequencing data, allowing to address the genome-wide evolutionary dynamics of TEs at the family level and advancing our understanding of the evolution of nonmodel genomes.
Characterizing ncRNAs in Human Pathogenic Protists Using High-Throughput Sequencing Technology
Collins, Lesley Joan
2011-01-01
ncRNAs are key genes in many human diseases including cancer and viral infection, as well as providing critical functions in pathogenic organisms such as fungi, bacteria, viruses, and protists. Until now the identification and characterization of ncRNAs associated with disease has been slow or inaccurate requiring many years of testing to understand complicated RNA and protein gene relationships. High-throughput sequencing now offers the opportunity to characterize miRNAs, siRNAs, small nucleolar RNAs (snoRNAs), and long ncRNAs on a genomic scale, making it faster and easier to clarify how these ncRNAs contribute to the disease state. However, this technology is still relatively new, and ncRNA discovery is not an application of high priority for streamlined bioinformatics. Here we summarize background concepts and practical approaches for ncRNA analysis using high-throughput sequencing, and how it relates to understanding human disease. As a case study, we focus on the parasitic protists Giardia lamblia and Trichomonas vaginalis, where large evolutionary distance has meant difficulties in comparing ncRNAs with those from model eukaryotes. A combination of biological, computational, and sequencing approaches has enabled easier classification of ncRNA classes such as snoRNAs, but has also aided the identification of novel classes. It is hoped that a higher level of understanding of ncRNA expression and interaction may aid in the development of less harsh treatment for protist-based diseases. PMID:22303390
USDA-ARS?s Scientific Manuscript database
High-throughput sequencing of reduced representation genomic libraries has ushered in an era of genotyping-by-sequencing (GBS), where genome-wide genotype data can be obtained for nearly any species. However, there remains a need for imputation-free GBS methods for genotyping large samples taken fr...
Nguyen, Nguyen H; Rastas, Pasi M A; Premachandra, H K A; Knibb, Wayne
2018-01-01
The genetic resources available for the commercially important fish species Yellowtail kingfish (YTK) ( Seriola lalandi) are relative sparse. To overcome this, we aimed (1) to develop a linkage map for this species, and (2) to identify markers/variants associated with economically important traits in kingfish (with an emphasis on body weight). Genetic and genomic analyses were conducted using 13,898 single nucleotide polymorphisms (SNPs) generated from a new high-throughput genotyping by sequencing platform, Diversity Arrays Technology (DArTseq TM ) in a pedigreed population comprising 752 animals. The linkage analysis enabled to map about 4,000 markers to 24 linkage groups (LGs), with an average density of 3.4 SNPs per cM. The linkage map was integrated into a genome-wide association study (GWAS) and identified six variants/SNPs associated with body weight ( P < 5e -8 ) when a multi-locus mixed model was used. Two out of the six significant markers were mapped to LGs 17 and 23, and collectively they explained 5.8% of the total genetic variance. It is concluded that the newly developed linkage map and the significantly associated markers with body weight provide fundamental information to characterize genetic architecture of growth-related traits in this population of YTK S. lalandi .
OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid.
Poehlman, William L; Rynge, Mats; Branton, Chris; Balamurugan, D; Feltus, Frank A
2016-01-01
High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments.
OSG-GEM: Gene Expression Matrix Construction Using the Open Science Grid
Poehlman, William L.; Rynge, Mats; Branton, Chris; Balamurugan, D.; Feltus, Frank A.
2016-01-01
High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments. PMID:27499617
Turetschek, Reinhard; Lyon, David; Desalegn, Getinet; Kaul, Hans-Peter; Wienkoop, Stefanie
2016-01-01
The proteomic study of non-model organisms, such as many crop plants, is challenging due to the lack of comprehensive genome information. Changing environmental conditions require the study and selection of adapted cultivars. Mutations, inherent to cultivars, hamper protein identification and thus considerably complicate the qualitative and quantitative comparison in large-scale systems biology approaches. With this workflow, cultivar-specific mutations are detected from high-throughput comparative MS analyses, by extracting sequence polymorphisms with de novo sequencing. Stringent criteria are suggested to filter for confidential mutations. Subsequently, these polymorphisms complement the initially used database, which is ready to use with any preferred database search algorithm. In our example, we thereby identified 26 specific mutations in two cultivars of Pisum sativum and achieved an increased number (17 %) of peptide spectrum matches.
Liu, Ju; Li, Ruihua; Liu, Kun; Li, Liangliang; Zai, Xiaodong; Chi, Xiangyang; Fu, Ling; Xu, Junjie; Chen, Wei
2016-04-22
High-throughput sequencing of the antibody repertoire provides a large number of antibody variable region sequences that can be used to generate human monoclonal antibodies. However, current screening methods for identifying antigen-specific antibodies are inefficient. In the present study, we developed an antibody clone screening strategy based on clone dynamics and relative frequency, and used it to identify antigen-specific human monoclonal antibodies. Enzyme-linked immunosorbent assay showed that at least 52% of putative positive immunoglobulin heavy chains composed antigen-specific antibodies. Combining information on dynamics and relative frequency improved identification of positive clones and elimination of negative clones. and increase the credibility of putative positive clones. Therefore the screening strategy could simplify the subsequent experimental screening and may facilitate the generation of antigen-specific antibodies. Copyright © 2016 Elsevier Inc. All rights reserved.
Budavari, Tamas; Langmead, Ben; Wheelan, Sarah J.; Salzberg, Steven L.; Szalay, Alexander S.
2015-01-01
When computing alignments of DNA sequences to a large genome, a key element in achieving high processing throughput is to prioritize locations in the genome where high-scoring mappings might be expected. We formulated this task as a series of list-processing operations that can be efficiently performed on graphics processing unit (GPU) hardware.We followed this approach in implementing a read aligner called Arioc that uses GPU-based parallel sort and reduction techniques to identify high-priority locations where potential alignments may be found. We then carried out a read-by-read comparison of Arioc’s reported alignments with the alignments found by several leading read aligners. With simulated reads, Arioc has comparable or better accuracy than the other read aligners we tested. With human sequencing reads, Arioc demonstrates significantly greater throughput than the other aligners we evaluated across a wide range of sensitivity settings. The Arioc software is available at https://github.com/RWilton/Arioc. It is released under a BSD open-source license. PMID:25780763
Process in manufacturing high efficiency AlGaAs/GaAs solar cells by MO-CVD
NASA Technical Reports Server (NTRS)
Yeh, Y. C. M.; Chang, K. I.; Tandon, J.
1984-01-01
Manufacturing technology for mass producing high efficiency GaAs solar cells is discussed. A progress using a high throughput MO-CVD reactor to produce high efficiency GaAs solar cells is discussed. Thickness and doping concentration uniformity of metal oxide chemical vapor deposition (MO-CVD) GaAs and AlGaAs layer growth are discussed. In addition, new tooling designs are given which increase the throughput of solar cell processing. To date, 2cm x 2cm AlGaAs/GaAs solar cells with efficiency up to 16.5% were produced. In order to meet throughput goals for mass producing GaAs solar cells, a large MO-CVD system (Cambridge Instrument Model MR-200) with a susceptor which was initially capable of processing 20 wafers (up to 75 mm diameter) during a single growth run was installed. In the MR-200, the sequencing of the gases and the heating power are controlled by a microprocessor-based programmable control console. Hence, operator errors can be reduced, leading to a more reproducible production sequence.
Developing High-Throughput HIV Incidence Assay with Pyrosequencing Platform
Park, Sung Yong; Goeken, Nolan; Lee, Hyo Jin; Bolan, Robert; Dubé, Michael P.
2014-01-01
ABSTRACT Human immunodeficiency virus (HIV) incidence is an important measure for monitoring the epidemic and evaluating the efficacy of intervention and prevention trials. This study developed a high-throughput, single-measure incidence assay by implementing a pyrosequencing platform. We devised a signal-masking bioinformatics pipeline, which yielded a process error rate of 5.8 × 10−4 per base. The pipeline was then applied to analyze 18,434 envelope gene segments (HXB2 7212 to 7601) obtained from 12 incident and 24 chronic patients who had documented HIV-negative and/or -positive tests. The pyrosequencing data were cross-checked by using the single-genome-amplification (SGA) method to independently obtain 302 sequences from 13 patients. Using two genomic biomarkers that probe for the presence of similar sequences, the pyrosequencing platform correctly classified all 12 incident subjects (100% sensitivity) and 23 of 24 chronic subjects (96% specificity). One misclassified subject's chronic infection was correctly classified by conducting the same analysis with SGA data. The biomarkers were statistically associated across the two platforms, suggesting the assay's reproducibility and robustness. Sampling simulations showed that the biomarkers were tolerant of sequencing errors and template resampling, two factors most likely to affect the accuracy of pyrosequencing results. We observed comparable biomarker scores between AIDS and non-AIDS chronic patients (multivariate analysis of variance [MANOVA], P = 0.12), indicating that the stage of HIV disease itself does not affect the classification scheme. The high-throughput genomic HIV incidence marks a significant step toward determining incidence from a single measure in cross-sectional surveys. IMPORTANCE Annual HIV incidence, the number of newly infected individuals within a year, is the key measure of monitoring the epidemic's rise and decline. Developing reliable assays differentiating recent from chronic infections has been a long-standing quest in the HIV community. Over the past 15 years, these assays have traditionally measured various HIV-specific antibodies, but recent technological advancements have expanded the diversity of proposed accurate, user-friendly, and financially viable tools. Here we designed a high-throughput genomic HIV incidence assay based on the signature imprinted in the HIV gene sequence population. By combining next-generation sequencing techniques with bioinformatics analysis, we demonstrated that genomic fingerprints are capable of distinguishing recently infected patients from chronically infected patients with high precision. Our high-throughput platform is expected to allow us to process many patients' samples from a single experiment, permitting the assay to be cost-effective for routine surveillance. PMID:24371062
SeqTrim: a high-throughput pipeline for pre-processing any type of sequence read
2010-01-01
Background High-throughput automated sequencing has enabled an exponential growth rate of sequencing data. This requires increasing sequence quality and reliability in order to avoid database contamination with artefactual sequences. The arrival of pyrosequencing enhances this problem and necessitates customisable pre-processing algorithms. Results SeqTrim has been implemented both as a Web and as a standalone command line application. Already-published and newly-designed algorithms have been included to identify sequence inserts, to remove low quality, vector, adaptor, low complexity and contaminant sequences, and to detect chimeric reads. The availability of several input and output formats allows its inclusion in sequence processing workflows. Due to its specific algorithms, SeqTrim outperforms other pre-processors implemented as Web services or standalone applications. It performs equally well with sequences from EST libraries, SSH libraries, genomic DNA libraries and pyrosequencing reads and does not lead to over-trimming. Conclusions SeqTrim is an efficient pipeline designed for pre-processing of any type of sequence read, including next-generation sequencing. It is easily configurable and provides a friendly interface that allows users to know what happened with sequences at every pre-processing stage, and to verify pre-processing of an individual sequence if desired. The recommended pipeline reveals more information about each sequence than previously described pre-processors and can discard more sequencing or experimental artefacts. PMID:20089148
Stephen J. Amish,; Paul A. Hohenlohe,; Sally Painter,; Robb F. Leary,; Muhlfeld, Clint C.; Fred W. Allendorf,; Luikart, Gordon
2012-01-01
Hybridization with introduced rainbow trout threatens most native westslope cutthroat trout populations. Understanding the genetic effects of hybridization and introgression requires a large set of high-throughput, diagnostic genetic markers to inform conservation and management. Recently, we identified several thousand candidate single-nucleotide polymorphism (SNP) markers based on RAD sequencing of 11 westslope cutthroat trout and 13 rainbow trout individuals. Here, we used flanking sequence for 56 of these candidate SNP markers to design high-throughput genotyping assays. We validated the assays on a total of 92 individuals from 22 populations and seven hatchery strains. Forty-six assays (82%) amplified consistently and allowed easy identification of westslope cutthroat and rainbow trout alleles as well as heterozygote controls. The 46 SNPs will provide high power for early detection of population admixture and improved identification of hybrid and nonhybridized individuals. This technique shows promise as a very low-cost, reliable and relatively rapid method for developing and testing SNP markers for nonmodel organisms with limited genomic resources.
Nakazato, Takeru; Bono, Hidemasa
2017-01-01
Abstract It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information of all of the archived sequencing data, which enable users to obtain sufficient quality sequencing data for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. PMID:28449062
cDREM: inferring dynamic combinatorial gene regulation.
Wise, Aaron; Bar-Joseph, Ziv
2015-04-01
Genes are often combinatorially regulated by multiple transcription factors (TFs). Such combinatorial regulation plays an important role in development and facilitates the ability of cells to respond to different stresses. While a number of approaches have utilized sequence and ChIP-based datasets to study combinational regulation, these have often ignored the combinational logic and the dynamics associated with such regulation. Here we present cDREM, a new method for reconstructing dynamic models of combinatorial regulation. cDREM integrates time series gene expression data with (static) protein interaction data. The method is based on a hidden Markov model and utilizes the sparse group Lasso to identify small subsets of combinatorially active TFs, their time of activation, and the logical function they implement. We tested cDREM on yeast and human data sets. Using yeast we show that the predicted combinatorial sets agree with other high throughput genomic datasets and improve upon prior methods developed to infer combinatorial regulation. Applying cDREM to study human response to flu, we were able to identify several combinatorial TF sets, some of which were known to regulate immune response while others represent novel combinations of important TFs.
An algorithm for extraction of periodic signals from sparse, irregularly sampled data
NASA Technical Reports Server (NTRS)
Wilcox, J. Z.
1994-01-01
Temporal gaps in discrete sampling sequences produce spurious Fourier components at the intermodulation frequencies of an oscillatory signal and the temporal gaps, thus significantly complicating spectral analysis of such sparsely sampled data. A new fast Fourier transform (FFT)-based algorithm has been developed, suitable for spectral analysis of sparsely sampled data with a relatively small number of oscillatory components buried in background noise. The algorithm's principal idea has its origin in the so-called 'clean' algorithm used to sharpen images of scenes corrupted by atmospheric and sensor aperture effects. It identifies as the signal's 'true' frequency that oscillatory component which, when passed through the same sampling sequence as the original data, produces a Fourier image that is the best match to the original Fourier space. The algorithm has generally met with succession trials with simulated data with a low signal-to-noise ratio, including those of a type similar to hourly residuals for Earth orientation parameters extracted from VLBI data. For eight oscillatory components in the diurnal and semidiurnal bands, all components with an amplitude-noise ratio greater than 0.2 were successfully extracted for all sequences and duty cycles (greater than 0.1) tested; the amplitude-noise ratios of the extracted signals were as low as 0.05 for high duty cycles and long sampling sequences. When, in addition to these high frequencies, strong low-frequency components are present in the data, the low-frequency components are generally eliminated first, by employing a version of the algorithm that searches for non-integer multiples of the discrete FET minimum frequency.
Application of Genomic Technologies to the Breeding of Trees
Badenes, Maria L.; Fernández i Martí, Angel; Ríos, Gabino; Rubio-Cabetas, María J.
2016-01-01
The recent introduction of next generation sequencing (NGS) technologies represents a major revolution in providing new tools for identifying the genes and/or genomic intervals controlling important traits for selection in breeding programs. In perennial fruit trees with long generation times and large sizes of adult plants, the impact of these techniques is even more important. High-throughput DNA sequencing technologies have provided complete annotated sequences in many important tree species. Most of the high-throughput genotyping platforms described are being used for studies of genetic diversity and population structure. Dissection of complex traits became possible through the availability of genome sequences along with phenotypic variation data, which allow to elucidate the causative genetic differences that give rise to observed phenotypic variation. Association mapping facilitates the association between genetic markers and phenotype in unstructured and complex populations, identifying molecular markers for assisted selection and breeding. Also, genomic data provide in silico identification and characterization of genes and gene families related to important traits, enabling new tools for molecular marker assisted selection in tree breeding. Deep sequencing of transcriptomes is also a powerful tool for the analysis of precise expression levels of each gene in a sample. It consists in quantifying short cDNA reads, obtained by NGS technologies, in order to compare the entire transcriptomes between genotypes and environmental conditions. The miRNAs are non-coding short RNAs involved in the regulation of different physiological processes, which can be identified by high-throughput sequencing of RNA libraries obtained by reverse transcription of purified short RNAs, and by in silico comparison with known miRNAs from other species. All together, NGS techniques and their applications have increased the resources for plant breeding in tree species, closing the former gap of genetic tools between trees and annual species. PMID:27895664
Using high throughput sequencing to explore the biodiversity in oral bacterial communities.
Diaz, P I; Dupuy, A K; Abusleme, L; Reese, B; Obergfell, C; Choquette, L; Dongari-Bagtzoglou, A; Peterson, D E; Terzi, E; Strausbaugh, L D
2012-06-01
High throughput sequencing of 16S ribosomal RNA gene amplicons is a cost-effective method for characterization of oral bacterial communities. However, before undertaking large-scale studies, it is necessary to understand the technique-associated limitations and intrinsic variability of the oral ecosystem. In this work we evaluated bias in species representation using an in vitro-assembled mock community of oral bacteria. We then characterized the bacterial communities in saliva and buccal mucosa of five healthy subjects to investigate the power of high throughput sequencing in revealing their diversity and biogeography patterns. Mock community analysis showed primer and DNA isolation biases and an overestimation of diversity that was reduced after eliminating singleton operational taxonomic units (OTUs). Sequencing of salivary and mucosal communities found a total of 455 OTUs (0.3% dissimilarity) with only 78 of these present in all subjects. We demonstrate that this variability was partly the result of incomplete richness coverage even at great sequencing depths, and so comparing communities by their structure was more effective than comparisons based solely on membership. With respect to oral biogeography, we found inter-subject variability in community structure was lower than site differences between salivary and mucosal communities within subjects. These differences were evident at very low sequencing depths and were mostly caused by the abundance of Streptococcus mitis and Gemella haemolysans in mucosa. In summary, we present an experimental and data analysis framework that will facilitate design and interpretation of pyrosequencing-based studies. Despite challenges associated with this technique, we demonstrate its power for evaluation of oral diversity and biogeography patterns. © 2012 John Wiley & Sons A/S.
Application of Genomic Technologies to the Breeding of Trees.
Badenes, Maria L; Fernández I Martí, Angel; Ríos, Gabino; Rubio-Cabetas, María J
2016-01-01
The recent introduction of next generation sequencing (NGS) technologies represents a major revolution in providing new tools for identifying the genes and/or genomic intervals controlling important traits for selection in breeding programs. In perennial fruit trees with long generation times and large sizes of adult plants, the impact of these techniques is even more important. High-throughput DNA sequencing technologies have provided complete annotated sequences in many important tree species. Most of the high-throughput genotyping platforms described are being used for studies of genetic diversity and population structure. Dissection of complex traits became possible through the availability of genome sequences along with phenotypic variation data, which allow to elucidate the causative genetic differences that give rise to observed phenotypic variation. Association mapping facilitates the association between genetic markers and phenotype in unstructured and complex populations, identifying molecular markers for assisted selection and breeding. Also, genomic data provide in silico identification and characterization of genes and gene families related to important traits, enabling new tools for molecular marker assisted selection in tree breeding. Deep sequencing of transcriptomes is also a powerful tool for the analysis of precise expression levels of each gene in a sample. It consists in quantifying short cDNA reads, obtained by NGS technologies, in order to compare the entire transcriptomes between genotypes and environmental conditions. The miRNAs are non-coding short RNAs involved in the regulation of different physiological processes, which can be identified by high-throughput sequencing of RNA libraries obtained by reverse transcription of purified short RNAs, and by in silico comparison with known miRNAs from other species. All together, NGS techniques and their applications have increased the resources for plant breeding in tree species, closing the former gap of genetic tools between trees and annual species.
del Campo, Javier; Mallo, Diego; Massana, Ramon; de Vargas, Colomban; Richards, Thomas A.; Ruiz-Trillo, Iñaki
2015-01-01
Summary The opisthokonts are one of the major super-groups of eukaryotes. It comprises two major clades: 1) the Metazoa and their unicellular relatives and 2) the Fungi and their unicellular relatives. There is, however, little knowledge of the role of opisthokont microbes in many natural environments, especially among non-metazoan and non-fungal opisthokonts. Here we begin to address this gap by analyzing high throughput 18S rDNA and 18S rRNA sequencing data from different European coastal sites, sampled at different size fractions and depths. In particular, we analyze the diversity and abundance of choanoflagellates, filastereans, ichthyosporeans, nucleariids, corallochytreans and their related lineages. Our results show the great diversity of choanoflagellates in coastal waters as well as a relevant role of the ichthyosporeans and the uncultured marine opisthokonts (MAOP). Furthermore, we describe a new lineage of marine fonticulids (MAFO) that appears to be abundant in sediments. Therefore, our work points to a greater potential ecological role for unicellular opisthokonts than previously appreciated in marine environments, both in water column and sediments, and also provides evidence of novel opisthokont phylogenetic lineages. This study highlights the importance of high throughput sequencing approaches to unravel the diversity and distribution of both known and novel eukaryotic lineages. PMID:25556908
Networking Omic Data to Envisage Systems Biological Regulation.
Kalapanulak, Saowalak; Saithong, Treenut; Thammarongtham, Chinae
To understand how biological processes work, it is necessary to explore the systematic regulation governing the behaviour of the processes. Not only driving the normal behavior of organisms, the systematic regulation evidently underlies the temporal responses to surrounding environments (dynamics) and long-term phenotypic adaptation (evolution). The systematic regulation is, in effect, formulated from the regulatory components which collaboratively work together as a network. In the drive to decipher such a code of lives, a spectrum of technologies has continuously been developed in the post-genomic era. With current advances, high-throughput sequencing technologies are tremendously powerful for facilitating genomics and systems biology studies in the attempt to understand system regulation inside the cells. The ability to explore relevant regulatory components which infer transcriptional and signaling regulation, driving core cellular processes, is thus enhanced. This chapter reviews high-throughput sequencing technologies, including second and third generation sequencing technologies, which support the investigation of genomics and transcriptomics data. Utilization of this high-throughput data to form the virtual network of systems regulation is explained, particularly transcriptional regulatory networks. Analysis of the resulting regulatory networks could lead to an understanding of cellular systems regulation at the mechanistic and dynamics levels. The great contribution of the biological networking approach to envisage systems regulation is finally demonstrated by a broad range of examples.
Lossius, Andreas; Johansen, Jorunn N; Vartdal, Frode; Robins, Harlan; Jūratė Šaltytė, Benth; Holmøy, Trygve; Olweus, Johanna
2014-11-01
Epstein-Barr virus (EBV) has long been suggested as a pathogen in multiple sclerosis (MS). Here, we used high-throughput sequencing to determine the diversity, compartmentalization, persistence, and EBV-reactivity of the T-cell receptor (TCR) repertoires in MS. TCR-β genes were sequenced in paired samples of cerebrospinal fluid (CSF) and blood from patients with MS and controls with other inflammatory neurological diseases. The TCR repertoires were highly diverse in both compartments and patient groups. Expanded T-cell clones, represented by TCR-β sequences >0.1%, were of different identity in CSF and blood of MS patients, and persisted for more than a year. Reference TCR-β libraries generated from peripheral blood T cells reactive against autologous EBV-transformed B cells were highly enriched for public EBV-specific sequences and were used to quantify EBV-reactive TCR-β sequences in CSF. TCR-β sequences of EBV-reactive CD8+ T cells, including several public EBV-specific sequences, were intrathecally enriched in MS patients only, whereas those of EBV-reactive CD4+ T cells were also enriched in CSF of controls. These data provide evidence for a clonally diverse, yet compartmentalized and persistent, intrathecal T-cell response in MS. The presented strategy links TCR sequence to intrathecal T-cell specificity, demonstrating enrichment of EBV-reactive CD8+ T cells in MS. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Non-biological synthetic spike-in controls and the AMPtk software pipeline improve mycobiome data
Jonathan M. Palmer; Michelle A. Jusino; Mark T. Banik; Daniel L. Lindner
2018-01-01
High-throughput amplicon sequencing (HTAS) of conserved DNA regions is a powerful technique to characterize microbial communities. Recently, spike-in mock communities have been used to measure accuracy of sequencing platforms and data analysis pipelines. To assess the ability of sequencing platforms and data processing pipelines using fungal internal transcribed spacer...
USDA-ARS?s Scientific Manuscript database
A small fast neutron mutant population has been established from Phaseolus vulgaris cv. Red Hawk. We leveraged the available P. vulgaris genome sequence and high throughput next generation DNA sequencing to examine the genomic structure of five Phaseolus vulgaris cv. Red Hawk fast neutron mutants wi...
A fungal mock community control for amplicon sequencing experiments
USDA-ARS?s Scientific Manuscript database
The field of microbial ecology has been profoundly advanced by the ability to profile the composition of complex microbial communities by means of high throughput amplicon sequencing of marker genes amplified directly from environmental genomic DNA extracts. However, it has become increasingly clear...
CT Image Sequence Restoration Based on Sparse and Low-Rank Decomposition
Gou, Shuiping; Wang, Yueyue; Wang, Zhilong; Peng, Yong; Zhang, Xiaopeng; Jiao, Licheng; Wu, Jianshe
2013-01-01
Blurry organ boundaries and soft tissue structures present a major challenge in biomedical image restoration. In this paper, we propose a low-rank decomposition-based method for computed tomography (CT) image sequence restoration, where the CT image sequence is decomposed into a sparse component and a low-rank component. A new point spread function of Weiner filter is employed to efficiently remove blur in the sparse component; a wiener filtering with the Gaussian PSF is used to recover the average image of the low-rank component. And then we get the recovered CT image sequence by combining the recovery low-rank image with all recovery sparse image sequence. Our method achieves restoration results with higher contrast, sharper organ boundaries and richer soft tissue structure information, compared with existing CT image restoration methods. The robustness of our method was assessed with numerical experiments using three different low-rank models: Robust Principle Component Analysis (RPCA), Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) and Go Decomposition (GoDec). Experimental results demonstrated that the RPCA model was the most suitable for the small noise CT images whereas the GoDec model was the best for the large noisy CT images. PMID:24023764
HTSeq--a Python framework to work with high-throughput sequencing data.
Anders, Simon; Pyl, Paul Theodor; Huber, Wolfgang
2015-01-15
A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. © The Author 2014. Published by Oxford University Press.
Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells.
Klein, Allon M; Mazutis, Linas; Akartuna, Ilke; Tallapragada, Naren; Veres, Adrian; Li, Victor; Peshkin, Leonid; Weitz, David A; Kirschner, Marc W
2015-05-21
It has long been the dream of biologists to map gene expression at the single-cell level. With such data one might track heterogeneous cell sub-populations, and infer regulatory relationships between genes and pathways. Recently, RNA sequencing has achieved single-cell resolution. What is limiting is an effective way to routinely isolate and process large numbers of individual cells for quantitative in-depth sequencing. We have developed a high-throughput droplet-microfluidic approach for barcoding the RNA from thousands of individual cells for subsequent analysis by next-generation sequencing. The method shows a surprisingly low noise profile and is readily adaptable to other sequencing-based assays. We analyzed mouse embryonic stem cells, revealing in detail the population structure and the heterogeneous onset of differentiation after leukemia inhibitory factor (LIF) withdrawal. The reproducibility of these high-throughput single-cell data allowed us to deconstruct cell populations and infer gene expression relationships. VIDEO ABSTRACT. Copyright © 2015 Elsevier Inc. All rights reserved.
Hayden, Eric J
2016-08-15
RNA molecules provide a realistic but tractable model of a genotype to phenotype relationship. This relationship has been extensively investigated computationally using secondary structure prediction algorithms. Enzymatic RNA molecules, or ribozymes, offer access to genotypic and phenotypic information in the laboratory. Advancements in high-throughput sequencing technologies have enabled the analysis of sequences in the lab that now rivals what can be accomplished computationally. This has motivated a resurgence of in vitro selection experiments and opened new doors for the analysis of the distribution of RNA functions in genotype space. A body of computational experiments has investigated the persistence of specific RNA structures despite changes in the primary sequence, and how this mutational robustness can promote adaptations. This article summarizes recent approaches that were designed to investigate the role of mutational robustness during the evolution of RNA molecules in the laboratory, and presents theoretical motivations, experimental methods and approaches to data analysis. Copyright © 2016 Elsevier Inc. All rights reserved.
Smeele, Zoe E; Ainley, David G; Varsani, Arvind
2018-01-02
The Antarctic, sub-Antarctic islands and surrounding sea-ice provide a unique environment for the existence of organisms. Nonetheless, birds and seals of a variety of species inhabit them, particularly during their breeding seasons. Early research on Antarctic wildlife health, using serology-based assays, showed exposure to viruses in the families Birnaviridae, Flaviviridae, Herpesviridae, Orthomyxoviridae and Paramyxoviridae circulating in seals (Phocidae), penguins (Spheniscidae), petrels (Procellariidae) and skuas (Stercorariidae). It is only during the last decade or so that polymerase chain reaction-based assays have been used to characterize viruses associated with Antarctic animals. Furthermore, it is only during the last five years that full/whole genomes of viruses (adenoviruses, anelloviruses, orthomyxoviruses, a papillomavirus, paramyoviruses, polyomaviruses and a togavirus) have been sequenced using Sanger sequencing or high throughput sequencing (HTS) approaches. This review summaries the knowledge of animal Antarctic virology and discusses potential future directions with the advent of HTS in virus discovery and ecology. Copyright © 2017 Elsevier B.V. All rights reserved.
High-purity circular RNA isolation method (RPAD) reveals vast collection of intronic circRNAs.
Panda, Amaresh C; De, Supriyo; Grammatikakis, Ioannis; Munk, Rachel; Yang, Xiaoling; Piao, Yulan; Dudekula, Dawood B; Abdelmohsen, Kotb; Gorospe, Myriam
2017-07-07
High-throughput RNA sequencing methods coupled with specialized bioinformatic analyses have recently uncovered tens of thousands of unique circular (circ)RNAs, but their complete sequences, genes of origin and functions are largely unknown. Given that circRNAs lack free ends and are thus relatively stable, their association with microRNAs (miRNAs) and RNA-binding proteins (RBPs) can influence gene expression programs. While exoribonuclease treatment is widely used to degrade linear RNAs and enrich circRNAs in RNA samples, it does not efficiently eliminate all linear RNAs. Here, we describe a novel method for the isolation of highly pure circRNA populations involving RNase R treatment followed by Polyadenylation and poly(A)+ RNA Depletion (RPAD), which removes linear RNA to near completion. High-throughput sequencing of RNA prepared using RPAD from human cervical carcinoma HeLa cells and mouse C2C12 myoblasts led to two surprising discoveries: (i) many exonic circRNA (EcircRNA) isoforms share an identical backsplice sequence but have different body sizes and sequences, and (ii) thousands of novel intronic circular RNAs (IcircRNAs) are expressed in cells. In sum, isolating high-purity circRNAs using the RPAD method can enable quantitative and qualitative analyses of circRNA types and sequence composition, paving the way for the elucidation of circRNA functions. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
High-purity circular RNA isolation method (RPAD) reveals vast collection of intronic circRNAs
De, Supriyo; Grammatikakis, Ioannis; Munk, Rachel; Yang, Xiaoling; Piao, Yulan; Dudekula, Dawood B.; Gorospe, Myriam
2017-01-01
Abstract High-throughput RNA sequencing methods coupled with specialized bioinformatic analyses have recently uncovered tens of thousands of unique circular (circ)RNAs, but their complete sequences, genes of origin and functions are largely unknown. Given that circRNAs lack free ends and are thus relatively stable, their association with microRNAs (miRNAs) and RNA-binding proteins (RBPs) can influence gene expression programs. While exoribonuclease treatment is widely used to degrade linear RNAs and enrich circRNAs in RNA samples, it does not efficiently eliminate all linear RNAs. Here, we describe a novel method for the isolation of highly pure circRNA populations involving RNase R treatment followed by Polyadenylation and poly(A)+ RNA Depletion (RPAD), which removes linear RNA to near completion. High-throughput sequencing of RNA prepared using RPAD from human cervical carcinoma HeLa cells and mouse C2C12 myoblasts led to two surprising discoveries: (i) many exonic circRNA (EcircRNA) isoforms share an identical backsplice sequence but have different body sizes and sequences, and (ii) thousands of novel intronic circular RNAs (IcircRNAs) are expressed in cells. In sum, isolating high-purity circRNAs using the RPAD method can enable quantitative and qualitative analyses of circRNA types and sequence composition, paving the way for the elucidation of circRNA functions. PMID:28444238
The draft genome sequence of cork oak
Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M.; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B.; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J. M.; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M.; Oliveira, M. Margarida; Ricardo, Cândido P.; Gonçalves, Sónia
2018-01-01
Cork oak (Quercus suber) is native to southwest Europe and northwest Africa where it plays a crucial environmental and economical role. To tackle the cork oak production and industrial challenges, advanced research is imperative but dependent on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a draft genome comprising 23,347 scaffolds and 953.3 Mb in size. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the genome assembly and annotation completeness and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All data generated is available through the public databases where it was deposited, being therefore ready to use by the academic and industry communities working on cork oak and/or related species. PMID:29786699
The draft genome sequence of cork oak.
Ramos, António Marcos; Usié, Ana; Barbosa, Pedro; Barros, Pedro M; Capote, Tiago; Chaves, Inês; Simões, Fernanda; Abreu, Isabl; Carrasquinho, Isabel; Faro, Carlos; Guimarães, Joana B; Mendonça, Diogo; Nóbrega, Filomena; Rodrigues, Leandra; Saibo, Nelson J M; Varela, Maria Carolina; Egas, Conceição; Matos, José; Miguel, Célia M; Oliveira, M Margarida; Ricardo, Cândido P; Gonçalves, Sónia
2018-05-22
Cork oak (Quercus suber) is native to southwest Europe and northwest Africa where it plays a crucial environmental and economical role. To tackle the cork oak production and industrial challenges, advanced research is imperative but dependent on the availability of a sequenced genome. To address this, we produced the first draft version of the cork oak genome. We followed a de novo assembly strategy based on high-throughput sequence data, which generated a draft genome comprising 23,347 scaffolds and 953.3 Mb in size. A total of 79,752 genes and 83,814 transcripts were predicted, including 33,658 high-confidence genes. An InterPro signature assignment was detected for 69,218 transcripts, which represented 82.6% of the total. Validation studies demonstrated the genome assembly and annotation completeness and highlighted the usefulness of the draft genome for read mapping of high-throughput sequence data generated using different protocols. All data generated is available through the public databases where it was deposited, being therefore ready to use by the academic and industry communities working on cork oak and/or related species.
Using high-throughput barcode sequencing to efficiently map connectomes.
Peikon, Ian D; Kebschull, Justus M; Vagin, Vasily V; Ravens, Diana I; Sun, Yu-Chi; Brouzes, Eric; Corrêa, Ivan R; Bressan, Dario; Zador, Anthony M
2017-07-07
The function of a neural circuit is determined by the details of its synaptic connections. At present, the only available method for determining a neural wiring diagram with single synapse precision-a 'connectome'-is based on imaging methods that are slow, labor-intensive and expensive. Here, we present SYNseq, a method for converting the connectome into a form that can exploit the speed and low cost of modern high-throughput DNA sequencing. In SYNseq, each neuron is labeled with a unique random nucleotide sequence-an RNA 'barcode'-which is targeted to the synapse using engineered proteins. Barcodes in pre- and postsynaptic neurons are then associated through protein-protein crosslinking across the synapse, extracted from the tissue, and joined into a form suitable for sequencing. Although our failure to develop an efficient barcode joining scheme precludes the widespread application of this approach, we expect that with further development SYNseq will enable tracing of complex circuits at high speed and low cost. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Chen, Dana; Orenstein, Yaron; Golodnitsky, Rada; Pellach, Michal; Avrahami, Dorit; Wachtel, Chaim; Ovadia-Shochat, Avital; Shir-Shapira, Hila; Kedmi, Adi; Juven-Gershon, Tamar; Shamir, Ron; Gerber, Doron
2016-01-01
Transcription factors (TFs) alter gene expression in response to changes in the environment through sequence-specific interactions with the DNA. These interactions are best portrayed as a landscape of TF binding affinities. Current methods to study sequence-specific binding preferences suffer from limited dynamic range, sequence bias, lack of specificity and limited throughput. We have developed a microfluidic-based device for SELEX Affinity Landscape MAPping (SELMAP) of TF binding, which allows high-throughput measurement of 16 proteins in parallel. We used it to measure the relative affinities of Pho4, AtERF2 and Btd full-length proteins to millions of different DNA binding sites, and detected both high and low-affinity interactions in equilibrium conditions, generating a comprehensive landscape of the relative TF affinities to all possible DNA 6-mers, and even DNA10-mers with increased sequencing depth. Low quantities of both the TFs and DNA oligomers were sufficient for obtaining high-quality results, significantly reducing experimental costs. SELMAP allows in-depth screening of hundreds of TFs, and provides a means for better understanding of the regulatory processes that govern gene expression. PMID:27628341
PANGEA: pipeline for analysis of next generation amplicons
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-01-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525
PANGEA: pipeline for analysis of next generation amplicons.
Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W
2010-07-01
High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi(2) step, are joined into one program called the 'backbone'.
High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing.
Lagarde, Julien; Uszczynska-Ratajczak, Barbara; Carbonell, Silvia; Pérez-Lluch, Sílvia; Abad, Amaya; Davis, Carrie; Gingeras, Thomas R; Frankish, Adam; Harrow, Jennifer; Guigo, Roderic; Johnson, Rory
2017-12-01
Accurate annotation of genes and their transcripts is a foundation of genomics, but currently no annotation technique combines throughput and accuracy. As a result, reference gene collections remain incomplete-many gene models are fragmentary, and thousands more remain uncataloged, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. Here we present an experimental reannotation of the GENCODE intergenic lncRNA populations in matched human and mouse tissues that resulted in novel transcript models for 3,574 and 561 gene loci, respectively. CLS approximately doubled the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enabled us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus, CLS removes a long-standing bottleneck in transcriptome annotation and generates manual-quality full-length transcript models at high-throughput scales.
'PACLIMS': a component LIM system for high-throughput functional genomic analysis.
Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A
2005-04-12
Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the approximately 11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors.
'PACLIMS': A component LIM system for high-throughput functional genomic analysis
Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A
2005-01-01
Background Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the ~11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. Results The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Conclusion Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors. PMID:15826298
Whole Wiskott‑Aldrich syndrome protein gene deletion identified by high throughput sequencing.
He, Xiangling; Zou, Runying; Zhang, Bing; You, Yalan; Yang, Yang; Tian, Xin
2017-11-01
Wiskott‑Aldrich syndrome (WAS) is a rare X‑linked recessive immunodeficiency disorder, characterized by thrombocytopenia, small platelets, eczema and recurrent infections associated with increased risk of autoimmunity and malignancy disorders. Mutations in the WAS protein (WASP) gene are responsible for WAS. To date, WASP mutations, including missense/nonsense, splicing, small deletions, small insertions, gross deletions, and gross insertions have been identified in patients with WAS. In addition, WASP‑interacting proteins are suspected in patients with clinical features of WAS, in whom the WASP gene sequence and mRNA levels are normal. The present study aimed to investigate the application of next generation sequencing in definitive diagnosis and clinical therapy for WAS. A 5 month‑old child with WAS who displayed symptoms of thrombocytopenia was examined. Whole exome sequence analysis of genomic DNA showed that the coverage and depth of WASP were extremely low. Quantitative polymerase chain reaction indicated total WASP gene deletion in the proband. In conclusion, high throughput sequencing is useful for the verification of WAS on the genetic profile, and has implications for family planning guidance and establishment of clinical programs.
The use of museum specimens with high-throughput DNA sequencers
Burrell, Andrew S.; Disotell, Todd R.; Bergey, Christina M.
2015-01-01
Natural history collections have long been used by morphologists, anatomists, and taxonomists to probe the evolutionary process and describe biological diversity. These biological archives also offer great opportunities for genetic research in taxonomy, conservation, systematics, and population biology. They allow assays of past populations, including those of extinct species, giving context to present patterns of genetic variation and direct measures of evolutionary processes. Despite this potential, museum specimens are difficult to work with because natural postmortem processes and preservation methods fragment and damage DNA. These problems have restricted geneticists’ ability to use natural history collections primarily by limiting how much of the genome can be surveyed. Recent advances in DNA sequencing technology, however, have radically changed this, making truly genomic studies from museum specimens possible. We review the opportunities and drawbacks of the use of museum specimens, and suggest how to best execute projects when incorporating such samples. Several high-throughput (HT) sequencing methodologies, including whole genome shotgun sequencing, sequence capture, and restriction digests (demonstrated here), can be used with archived biomaterials. PMID:25532801
Dentinger, Bryn T M; Margaritescu, Simona; Moncalvo, Jean-Marc
2010-07-01
We present two methods for DNA extraction from fresh and dried mushrooms that are adaptable to high-throughput sequencing initiatives, such as DNA barcoding. Our results show that these protocols yield ∼85% sequencing success from recently collected materials. Tests with both recent (<2 year) and older (>100 years) specimens reveal that older collections have low success rates and may be an inefficient resource for populating a barcode database. However, our method of extracting DNA from herbarium samples using small amount of tissue is reliable and could be used for important historical specimens. The application of these protocols greatly reduces time, and therefore cost, of generating DNA sequences from mushrooms and other fungi vs. traditional extraction methods. The efficiency of these methods illustrates that standardization and streamlining of sample processing should be shifted from the laboratory to the field. © 2009 Blackwell Publishing Ltd.
High-Throughput Single-Cell RNA Sequencing and Data Analysis.
Sagar; Herman, Josip Stefan; Pospisilik, John Andrew; Grün, Dominic
2018-01-01
Understanding biological systems at a single cell resolution may reveal several novel insights which remain masked by the conventional population-based techniques providing an average readout of the behavior of cells. Single-cell transcriptome sequencing holds the potential to identify novel cell types and characterize the cellular composition of any organ or tissue in health and disease. Here, we describe a customized high-throughput protocol for single-cell RNA-sequencing (scRNA-seq) combining flow cytometry and a nanoliter-scale robotic system. Since scRNA-seq requires amplification of a low amount of endogenous cellular RNA, leading to substantial technical noise in the dataset, downstream data filtering and analysis require special care. Therefore, we also briefly describe in-house state-of-the-art data analysis algorithms developed to identify cellular subpopulations including rare cell types as well as to derive lineage trees by ordering the identified subpopulations of cells along the inferred differentiation trajectories.
Epigenetics and Epigenomics of Plants.
Yadav, Chandra Bhan; Pandey, Garima; Muthamilarasan, Mehanathan; Prasad, Manoj
2018-01-23
The genetic material DNA in association with histone proteins forms the complex structure called chromatin, which is prone to undergo modification through certain epigenetic mechanisms including cytosine DNA methylation, histone modifications, and small RNA-mediated methylation. Alterations in chromatin structure lead to inaccessibility of genomic DNA to various regulatory proteins such as transcription factors, which eventually modulates gene expression. Advancements in high-throughput sequencing technologies have provided the opportunity to study the epigenetic mechanisms at genome-wide levels. Epigenomic studies using high-throughput technologies will widen the understanding of mechanisms as well as functions of regulatory pathways in plant genomes, which will further help in manipulating these pathways using genetic and biochemical approaches. This technology could be a potential research tool for displaying the systematic associations of genetic and epigenetic variations, especially in terms of cytosine methylation onto the genomic region in a specific cell or tissue. A comprehensive study of plant populations to correlate genotype to epigenotype and to phenotype, and also the study of methyl quantitative trait loci (QTL) or epiGWAS, is possible by using high-throughput sequencing methods, which will further accelerate molecular breeding programs for crop improvement. Graphical Abstract.
Development and use of molecular markers: past and present.
Grover, Atul; Sharma, P C
2016-01-01
Molecular markers, due to their stability, cost-effectiveness and ease of use provide an immensely popular tool for a variety of applications including genome mapping, gene tagging, genetic diversity diversity, phylogenetic analysis and forensic investigations. In the last three decades, a number of molecular marker techniques have been developed and exploited worldwide in different systems. However, only a handful of these techniques, namely RFLPs, RAPDs, AFLPs, ISSRs, SSRs and SNPs have received global acceptance. A recent revolution in DNA sequencing techniques has taken the discovery and application of molecular markers to high-throughput and ultrahigh-throughput levels. Although, the choice of marker will obviously depend on the targeted use, microsatellites, SNPs and genotyping by sequencing (GBS) largely fulfill most of the user requirements. Further, modern transcriptomic and functional markers will lead the ventures onto high-density genetic map construction, identification of QTLs, breeding and conservation strategies in times to come in combination with other high throughput techniques. This review presents an overview of different marker technologies and their variants with a comparative account of their characteristic features and applications.
Liu, Shanlin; Yang, Chentao; Zhou, Chengran; Zhou, Xin
2017-12-01
Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)-based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that were only a few nucleotides different from each other. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn't show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single molecular sequencing platform Pacbio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. © The Authors 2017. Published by Oxford University Press.
A critical analysis of computational protein design with sparse residue interaction graphs
Georgiev, Ivelin S.
2017-01-01
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
Transcription profile of boar spermatozoa as revealed by RNA-sequencing
USDA-ARS?s Scientific Manuscript database
High-throughput RNA sequencing (RNA-Seq) overcomes the limitations of the current hybridization-based techniques to detect the actual pool of RNA transcripts in spermatozoa. The application of this technology in livestock can speed the discovery of potential predictors of male fertility. As a first ...
Advances in high-throughput next-generation sequencing (NGS) technology for direct sequencing of environmental DNA (i.e. shotgun metagenomics) is transforming the field of microbiology. NGS technologies are now regularly being applied in comparative metagenomic studies, which pr...
2013-01-01
Following recent trends in environmental microbiology, food microbiology has benefited from the advances in molecular biology and adopted novel strategies to detect, identify, and monitor microbes in food. An in-depth study of the microbial diversity in food can now be achieved by using high-throughput sequencing (HTS) approaches after direct nucleic acid extraction from the sample to be studied. In this review, the workflow of applying culture-independent HTS to food matrices is described. The current scenario and future perspectives of HTS uses to study food microbiota are presented, and the decision-making process leading to the best choice of working conditions to fulfill the specific needs of food research is described. PMID:23475615
Using high-throughput barcode sequencing to efficiently map connectomes
Peikon, Ian D.; Kebschull, Justus M.; Vagin, Vasily V.; Ravens, Diana I.; Sun, Yu-Chi; Brouzes, Eric; Corrêa, Ivan R.; Bressan, Dario
2017-01-01
Abstract The function of a neural circuit is determined by the details of its synaptic connections. At present, the only available method for determining a neural wiring diagram with single synapse precision—a ‘connectome’—is based on imaging methods that are slow, labor-intensive and expensive. Here, we present SYNseq, a method for converting the connectome into a form that can exploit the speed and low cost of modern high-throughput DNA sequencing. In SYNseq, each neuron is labeled with a unique random nucleotide sequence—an RNA ‘barcode’—which is targeted to the synapse using engineered proteins. Barcodes in pre- and postsynaptic neurons are then associated through protein-protein crosslinking across the synapse, extracted from the tissue, and joined into a form suitable for sequencing. Although our failure to develop an efficient barcode joining scheme precludes the widespread application of this approach, we expect that with further development SYNseq will enable tracing of complex circuits at high speed and low cost. PMID:28449067
RIPiT-Seq: A high-throughput approach for footprinting RNA:protein complexes
Singh, Guramrit; Ricci, Emiliano P.; Moore, Melissa J.
2013-01-01
Development of high-throughput approaches to map the RNA interaction sites of individual RNA binding proteins (RBPs) transcriptome-wide is rapidly transforming our understanding of post-transcriptional gene regulatory mechanisms. Here we describe a ribonucleoprotein (RNP) footprinting approach we recently developed for identifying occupancy sites of both individual RBPs and multi-subunit RNP complexes. RNA:protein immunoprecipitation in tandem (RIPiT) yields highly specific RNA footprints of cellular RNPs isolated via two sequential purifications; the resulting RNA footprints can then be identified by high-throughput sequencing (Seq). RIPiT-Seq is broadly applicable to all RBPs regardless of their RNA binding mode and thus provides a means to map the RNA binding sites of RBPs with poor inherent ultraviolet (UV) crosslinkability. Further, among current high-throughput approaches, RIPiT has the unique capacity to differentiate binding sites of RNPs with overlapping protein composition. It is therefore particularly suited for studying dynamic RNP assemblages whose composition evolves as gene expression proceeds. PMID:24096052
Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction
Laehnemann, David; Borkhardt, Arndt
2016-01-01
Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
Ancient pathogen DNA in archaeological samples detected with a Microbial Detection Array.
Devault, Alison M; McLoughlin, Kevin; Jaing, Crystal; Gardner, Shea; Porter, Teresita M; Enk, Jacob M; Thissen, James; Allen, Jonathan; Borucki, Monica; DeWitte, Sharon N; Dhody, Anna N; Poinar, Hendrik N
2014-03-06
Ancient human remains of paleopathological interest typically contain highly degraded DNA in which pathogenic taxa are often minority components, making sequence-based metagenomic characterization costly. Microarrays may hold a potential solution to these challenges, offering a rapid, affordable, and highly informative snapshot of microbial diversity in complex samples without the lengthy analysis and/or high cost associated with high-throughput sequencing. Their versatility is well established for modern clinical specimens, but they have yet to be applied to ancient remains. Here we report bacterial profiles of archaeological and historical human remains using the Lawrence Livermore Microbial Detection Array (LLMDA). The array successfully identified previously-verified bacterial human pathogens, including Vibrio cholerae (cholera) in a 19th century intestinal specimen and Yersinia pestis ("Black Death" plague) in a medieval tooth, which represented only minute fractions (0.03% and 0.08% alignable high-throughput shotgun sequencing reads) of their respective DNA content. This demonstrates that the LLMDA can identify primary and/or co-infecting bacterial pathogens in ancient samples, thereby serving as a rapid and inexpensive paleopathological screening tool to study health across both space and time.
BarraCUDA - a fast short read sequence aligner using graphics processing units
2012-01-01
Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU), extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computational-intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers a magnitude of performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497
USDA-ARS?s Scientific Manuscript database
The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...
USDA-ARS?s Scientific Manuscript database
Ongoing developments and cost decreases in next-generation sequencing (NGS) technologies have led to an increase in their application, which has greatly enhanced the fields of genetics and genomics. Mapping sequence reads onto a reference genome is a fundamental step in the analysis of NGS data. Eff...
Anslan, Sten; Bahram, Mohammad; Hiiesalu, Indrek; Tedersoo, Leho
2017-11-01
High-throughput sequencing methods have become a routine analysis tool in environmental sciences as well as in public and private sector. These methods provide vast amount of data, which need to be analysed in several steps. Although the bioinformatics may be applied using several public tools, many analytical pipelines allow too few options for the optimal analysis for more complicated or customized designs. Here, we introduce PipeCraft, a flexible and handy bioinformatics pipeline with a user-friendly graphical interface that links several public tools for analysing amplicon sequencing data. Users are able to customize the pipeline by selecting the most suitable tools and options to process raw sequences from Illumina, Pacific Biosciences, Ion Torrent and Roche 454 sequencing platforms. We described the design and options of PipeCraft and evaluated its performance by analysing the data sets from three different sequencing platforms. We demonstrated that PipeCraft is able to process large data sets within 24 hr. The graphical user interface and the automated links between various bioinformatics tools enable easy customization of the workflow. All analytical steps and options are recorded in log files and are easily traceable. © 2017 John Wiley & Sons Ltd.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davenport, Karen
Karen Davenport of Los Alamos National Laboratory discusses a high-throughput next generation genome finishing pipeline on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
Single Nucleobase Identification Using Biophysical Signatures from Nanoelectronic Quantum Tunneling.
Korshoj, Lee E; Afsari, Sepideh; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant
2017-03-01
Nanoelectronic DNA sequencing can provide an important alternative to sequencing-by-synthesis by reducing sample preparation time, cost, and complexity as a high-throughput next-generation technique with accurate single-molecule identification. However, sample noise and signature overlap continue to prevent high-resolution and accurate sequencing results. Probing the molecular orbitals of chemically distinct DNA nucleobases offers a path for facile sequence identification, but molecular entropy (from nucleotide conformations) makes such identification difficult when relying only on the energies of lowest-unoccupied and highest-occupied molecular orbitals (LUMO and HOMO). Here, nine biophysical parameters are developed to better characterize molecular orbitals of individual nucleobases, intended for single-molecule DNA sequencing using quantum tunneling of charges. For this analysis, theoretical models for quantum tunneling are combined with transition voltage spectroscopy to obtain measurable parameters unique to the molecule within an electronic junction. Scanning tunneling spectroscopy is then used to measure these nine biophysical parameters for DNA nucleotides, and a modified machine learning algorithm identified nucleobases. The new parameters significantly improve base calling over merely using LUMO and HOMO frontier orbital energies. Furthermore, high accuracies for identifying DNA nucleobases were observed at different pH conditions. These results have significant implications for developing a robust and accurate high-throughput nanoelectronic DNA sequencing technique. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Lights, camera, action: high-throughput plant phenotyping is ready for a close-up
USDA-ARS?s Scientific Manuscript database
Modern techniques for crop improvement rely on both DNA sequencing and accurate quantification of plant traits to identify genes and germplasm of interest. With rapid advances in DNA sequencing technologies, plant phenotyping is now a bottleneck in advancing crop yields [1,2]. Furthermore, the envir...
Alfalfa virus S, a new species in the family Alphaflexiviridae
USDA-ARS?s Scientific Manuscript database
A new species of the family Alphaflexiviridae provisionally named alfalfa virus S (AVS) was discovered in alfalfa samples originating from Sudan. A complete nucleotide sequence of the viral genome consisting of 8,349 nucleotides excluding the 3’ poly(A) tail was determined by high throughput sequenc...
High-throughput illumina strand-specific RNA sequencing library preparation
USDA-ARS?s Scientific Manuscript database
Conventional Illumina RNA-Seq does not have the resolution to decode the complex eukaryote transcriptome due to the lack of RNA polarity information. Strand-specific RNA sequencing (ssRNA-Seq) can overcome these limitations and as such is better suited for genome annotation, de novo transcriptome as...
USDA-ARS?s Scientific Manuscript database
Genotyping-by-Sequencing (GBS) is a low-cost, high-throughput, method for genome-wide polymorphism discovery and genotyping adjacent to restriction sites. Since 2010, GBS has been applied for the genotyping of over 12,000 grape breeding lines, with a primary focus on identifying markers predictive ...
USDA-ARS?s Scientific Manuscript database
Next generation fungal amplicon sequencing is being used with increasing frequency to study fungal diversity in various ecosystems; however, the influence of sample preparation on the characterization of fungal community is poorly understood. We investigated the effects of four procedural modificati...
USDA-ARS?s Scientific Manuscript database
We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (“Assessing Changes to Exons”) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detect...
SNP-based genotyping in lentil: linking sequence information with phenotypes
USDA-ARS?s Scientific Manuscript database
Lentil (Lens culinaris) has been late to enter the world of high throughput molecular analysis due to a general lack of genomic resources. Using a 454 sequencing-based approach, SNPs have been identified in genes across the lentil genome. Several hundred have been turned into single SNP KASP assay...
USDA-ARS?s Scientific Manuscript database
Next-generation sequencing (NGS) technologies are revolutionizing both medical and biological research through generation of massive SNP data sets for identifying heritable genome variation underlying key traits, from rare human diseases to important agronomic phenotypes in crop species. We evaluate...
Ultra high-throughput nucleic acid sequencing as a tool for virus discovery in the turkey gut.
USDA-ARS?s Scientific Manuscript database
Recently, the use of the next generation of nucleic acid sequencing technology (i.e., 454 pyrosequencing, as developed by Roche/454 Life Sciences) has allowed an in-depth look at the uncultivated microorganisms present in complex environmental samples, including samples with agricultural importance....
Maize HapMap2 identifies extant variation from a genome in flux
USDA-ARS?s Scientific Manuscript database
The maize genome is the largest, most diverse and complex plant genome sequenced to date. Using high-throughput sequencing to access genetic variation and a population genetics model to score the polymorphisms, we characterize and unite the diversity of the world’s key breeding germplasm, wild rela...
Ning, Yi; Li, Yan-Ling; Zhou, Guo-Ying; Yang, Lu-Cun; Xu, Wen-Hua
2016-04-01
High throughput sequencing technology is also called Next Generation Sequencing (NGS), which can sequence hundreds and thousands sequences in different samples at the same time. In the present study, the culture-independent high throughput sequencing technology was applied to sequence the fungi metagenomic DNA of the fungal internal transcribed spacer 1(ITS 1) in the root of Sinopodophyllum hexandrum. Sequencing data suggested that after the quality control, 22 565 reads were remained. Cluster similarity analysis was done based on 97% sequence similarity, which obtained 517 OTUs for the three samples (LD1, LD2 and LD3). All the fungi which identified from all the reads of OTUs based on 0.8 classification thresholds using the software of RDP classifier were classified as 13 classes, 35 orders, 44 family, 55 genera. Among these genera, the genus of Tetracladium was the dominant genera in all samples(35.49%, 68.55% and 12.96%).The Shannon's diversity indices and the Simpson indices of the endophytic fungi in the samples ranged from 1.75-2.92, 0.11-0.32, respectively.This is the first time for applying high through put sequencing technol-ogyto analyze the community composition and diversity of endophytic fungi in the medicinal plant, and the results showed that there were hyper diver sity and high community composition complexity of endophytic fungi in the root of S. hexandrum. It is also proved that the high through put sequencing technology has great advantage for analyzing ecommunity composition and diversity of endophtye in the plant. Copyright© by the Chinese Pharmaceutical Association.
The vast datasets generated by next generation gene sequencing and expression profiling have transformed biological and translational research. However, technologies to produce large-scale functional genomics datasets, such as high-throughput detection of protein-protein interactions (PPIs), are still in early development. While a number of powerful technologies have been employed to detect PPIs, a singular PPI biosensor platform featured with both high sensitivity and robustness in a mammalian cell environment remains to be established.
Coding Complete Genome for the Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil
2017-05-04
sequences for all four genome segments. We downloaded the raw Illumina sequence reads from the NCBI Short Read Archive (GenBank...MGTV genome segments through sequence similarity (BLASTN) to the published genome of Jingmen tick virus (JMTV) isolate SY84 (GenBank: KJ001579-KJ001582...2014. Standards for sequencing viral genomes in the era of high-throughput sequencing . MBio 5:e01360–14. 8. Bankevich A, Nurk S, Antipov
Transcriptome-based differentiation of closely-related Miscanthus lines.
Chouvarine, Philippe; Cooksey, Amanda M; McCarthy, Fiona M; Ray, David A; Baldwin, Brian S; Burgess, Shane C; Peterson, Daniel G
2012-01-01
Distinguishing between individuals is critical to those conducting animal/plant breeding, food safety/quality research, diagnostic and clinical testing, and evolutionary biology studies. Classical genetic identification studies are based on marker polymorphisms, but polymorphism-based techniques are time and labor intensive and often cannot distinguish between closely related individuals. Illumina sequencing technologies provide the detailed sequence data required for rapid and efficient differentiation of related species, lines/cultivars, and individuals in a cost-effective manner. Here we describe the use of Illumina high-throughput exome sequencing, coupled with SNP mapping, as a rapid means of distinguishing between related cultivars of the lignocellulosic bioenergy crop giant miscanthus (Miscanthus × giganteus). We provide the first exome sequence database for Miscanthus species complete with Gene Ontology (GO) functional annotations. A SNP comparative analysis of rhizome-derived cDNA sequences was successfully utilized to distinguish three Miscanthus × giganteus cultivars from each other and from other Miscanthus species. Moreover, the resulting phylogenetic tree generated from SNP frequency data parallels the known breeding history of the plants examined. Some of the giant miscanthus plants exhibit considerable sequence divergence. Here we describe an analysis of Miscanthus in which high-throughput exome sequencing was utilized to differentiate between closely related genotypes despite the current lack of a reference genome sequence. We functionally annotated the exome sequences and provide resources to support Miscanthus systems biology. In addition, we demonstrate the use of the commercial high-performance cloud computing to do computational GO annotation.
Bilton, Timothy P.; Schofield, Matthew R.; Black, Michael A.; Chagné, David; Wilcox, Phillip L.; Dodds, Ken G.
2018-01-01
Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species’ genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander–Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. PMID:29487138
Bilton, Timothy P; Schofield, Matthew R; Black, Michael A; Chagné, David; Wilcox, Phillip L; Dodds, Ken G
2018-05-01
Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species' genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology ( e.g. , genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander-Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. Copyright © 2018 Bilton et al.
Ohta, Tazro; Nakazato, Takeru; Bono, Hidemasa
2017-06-01
It is important for public data repositories to promote the reuse of archived data. In the growing field of omics science, however, the increasing number of submissions of high-throughput sequencing (HTSeq) data to public repositories prevents users from choosing a suitable data set from among the large number of search results. Repository users need to be able to set a threshold to reduce the number of results to obtain a suitable subset of high-quality data for reanalysis. We calculated the quality of sequencing data archived in a public data repository, the Sequence Read Archive (SRA), by using the quality control software FastQC. We obtained quality values for 1 171 313 experiments, which can be used to evaluate the suitability of data for reuse. We also visualized the data distribution in SRA by integrating the quality information and metadata of experiments and samples. We provide quality information of all of the archived sequencing data, which enable users to obtain sufficient quality sequencing data for reanalyses. The calculated quality data are available to the public in various formats. Our data also provide an example of enhancing the reuse of public data by adding metadata to published research data by a third party. © The Authors 2017. Published by Oxford University Press.
Yusuf, Noor Hydayaty Md; Ong, Wen Dee; Redwan, Raimi Mohamed; Latip, Mariam Abd; Kumar, S Vijay
2015-10-15
MicroRNAs (miRNAs) are a class of small, endogenous non-coding RNAs that negatively regulate gene expression, resulting in the silencing of target mRNA transcripts through mRNA cleavage or translational inhibition. MiRNAs play significant roles in various biological and physiological processes in plants. However, the miRNA-mediated gene regulatory network in pineapple, the model tropical non-climacteric fruit, remains largely unexplored. Here, we report a complete list of pineapple mature miRNAs obtained from high-throughput small RNA sequencing and precursor miRNAs (pre-miRNAs) obtained from ESTs. Two small RNA libraries were constructed from pineapple fruits and leaves, respectively, using Illumina's Solexa technology. Sequence similarity analysis using miRBase revealed 579,179 reads homologous to 153 miRNAs from 41 miRNA families. In addition, a pineapple fruit transcriptome library consisting of approximately 30,000 EST contigs constructed using Solexa sequencing was used for the discovery of pre-miRNAs. In all, four pre-miRNAs were identified (MIR156, MIR399, MIR444 and MIR2673). Furthermore, the same pineapple transcriptome was used to dissect the function of the miRNAs in pineapple by predicting their putative targets in conjunction with their regulatory networks. In total, 23 metabolic pathways were found to be regulated by miRNAs in pineapple. The use of high-throughput sequencing in pineapples to unveil the presence of miRNAs and their regulatory pathways provides insight into the repertoire of miRNA regulation used exclusively in this non-climacteric model plant. Copyright © 2015 Elsevier B.V. All rights reserved.
Wang, Yao; Cui, Yazhou; Zhou, Xiaoyan; Han, Jinxiang
2015-01-01
Objective Osteogenesis imperfecta (OI) is a rare inherited skeletal disease, characterized by bone fragility and low bone density. The mutations in this disorder have been widely reported to be on various exonal hotspots of the candidate genes, including COL1A1, COL1A2, CRTAP, LEPRE1, and FKBP10, thus creating a great demand for precise genetic tests. However, large genome sizes make the process daunting and the analyses, inefficient and expensive. Therefore, we aimed at developing a fast, accurate, efficient, and cheaper sequencing platform for OI diagnosis; and to this end, use of an advanced array-based technique was proposed. Method A CustomSeq Affymetrix Resequencing Array was established for high-throughput sequencing of five genes simultaneously. Genomic DNA extraction from 13 OI patients and 85 normal controls and amplification using long-range PCR (LR-PCR) were followed by DNA fragmentation and chip hybridization, according to standard Affymetrix protocols. Hybridization signals were determined using GeneChip Sequence Analysis Software (GSEQ). To examine the feasibility, the outcome from new resequencing approach was validated by conventional capillary sequencing method. Result Overall call rates using resequencing array was 96–98% and the agreement between microarray and capillary sequencing was 99.99%. 11 out of 13 OI patients with pathogenic mutations were successfully detected by the chip analysis without adjustment, and one mutation could also be identified using manual visual inspection. Conclusion A high-throughput resequencing array was developed that detects the disease-associated mutations in OI, providing a potential tool to facilitate large-scale genetic screening for OI patients. Through this method, a novel mutation was also found. PMID:25742658
Persson, Nils E; Rafshoon, Joshua; Naghshpour, Kaylie; Fast, Tony; Chu, Ping-Hsun; McBride, Michael; Risteen, Bailey; Grover, Martha; Reichmanis, Elsa
2017-10-18
High-throughput discovery of process-structure-property relationships in materials through an informatics-enabled empirical approach is an increasingly utilized technique in materials research due to the rapidly expanding availability of data. Here, process-structure-property relationships are extracted for the nucleation, growth, and deposition of semiconducting poly(3-hexylthiophene) (P3HT) nanofibers used in organic field effect transistors, via high-throughput image analysis. This study is performed using an automated image analysis pipeline combining existing open-source software and new algorithms, enabling the rapid evaluation of structural metrics for images of fibrillar materials, including local orientational order, fiber length density, and fiber length distributions. We observe that microfluidic processing leads to fibers that pack with unusually high density, while sonication yields fibers that pack sparsely with low alignment. This is attributed to differences in their crystallization mechanisms. P3HT nanofiber packing during thin film deposition exhibits behavior suggesting that fibers are confined to packing in two-dimensional layers. We find that fiber alignment, a feature correlated with charge carrier mobility, is driven by increasing fiber length, and that shorter fibers tend to segregate to the buried dielectric interface during deposition, creating potentially performance-limiting defects in alignment. Another barrier to perfect alignment is the curvature of P3HT fibers; we propose a mechanistic simulation of fiber growth that reconciles both this curvature and the log-normal distribution of fiber lengths inherent to the fiber populations under consideration.
Kuhn, Alexandre; Ong, Yao Min; Quake, Stephen R; Burkholder, William F
2015-07-08
Like other structural variants, transposable element insertions can be highly polymorphic across individuals. Their functional impact, however, remains poorly understood. Current genome-wide approaches for genotyping insertion-site polymorphisms based on targeted or whole-genome sequencing remain very expensive and can lack accuracy, hence new large-scale genotyping methods are needed. We describe a high-throughput method for genotyping transposable element insertions and other types of structural variants that can be assayed by breakpoint PCR. The method relies on next-generation sequencing of multiplex, site-specific PCR amplification products and read count-based genotype calls. We show that this method is flexible, efficient (it does not require rounds of optimization), cost-effective and highly accurate. This method can benefit a wide range of applications from the routine genotyping of animal and plant populations to the functional study of structural variants in humans.
NCBI GEO: archive for high-throughput functional genomic data.
Barrett, Tanya; Troup, Dennis B; Wilhite, Stephen E; Ledoux, Pierre; Rudnev, Dmitry; Evangelista, Carlos; Kim, Irene F; Soboleva, Alexandra; Tomashevsky, Maxim; Marshall, Kimberly A; Phillippy, Katherine H; Sherman, Patti M; Muertter, Rolf N; Edgar, Ron
2009-01-01
The Gene Expression Omnibus (GEO) at the National Center for Biotechnology Information (NCBI) is the largest public repository for high-throughput gene expression data. Additionally, GEO hosts other categories of high-throughput functional genomic data, including those that examine genome copy number variations, chromatin structure, methylation status and transcription factor binding. These data are generated by the research community using high-throughput technologies like microarrays and, more recently, next-generation sequencing. The database has a flexible infrastructure that can capture fully annotated raw and processed data, enabling compliance with major community-derived scientific reporting standards such as 'Minimum Information About a Microarray Experiment' (MIAME). In addition to serving as a centralized data storage hub, GEO offers many tools and features that allow users to effectively explore, analyze and download expression data from both gene-centric and experiment-centric perspectives. This article summarizes the GEO repository structure, content and operating procedures, as well as recently introduced data mining features. GEO is freely accessible at http://www.ncbi.nlm.nih.gov/geo/.
A high-throughput Sanger strategy for human mitochondrial genome sequencing
2013-01-01
Background A population reference database of complete human mitochondrial genome (mtGenome) sequences is needed to enable the use of mitochondrial DNA (mtDNA) coding region data in forensic casework applications. However, the development of entire mtGenome haplotypes to forensic data quality standards is difficult and laborious. A Sanger-based amplification and sequencing strategy that is designed for automated processing, yet routinely produces high quality sequences, is needed to facilitate high-volume production of these mtGenome data sets. Results We developed a robust 8-amplicon Sanger sequencing strategy that regularly produces complete, forensic-quality mtGenome haplotypes in the first pass of data generation. The protocol works equally well on samples representing diverse mtDNA haplogroups and DNA input quantities ranging from 50 pg to 1 ng, and can be applied to specimens of varying DNA quality. The complete workflow was specifically designed for implementation on robotic instrumentation, which increases throughput and reduces both the opportunities for error inherent to manual processing and the cost of generating full mtGenome sequences. Conclusions The described strategy will assist efforts to generate complete mtGenome haplotypes which meet the highest data quality expectations for forensic genetic and other applications. Additionally, high-quality data produced using this protocol can be used to assess mtDNA data developed using newer technologies and chemistries. Further, the amplification strategy can be used to enrich for mtDNA as a first step in sample preparation for targeted next-generation sequencing. PMID:24341507
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
Bit error rate tester using fast parallel generation of linear recurring sequences
Pierson, Lyndon G.; Witzke, Edward L.; Maestas, Joseph H.
2003-05-06
A fast method for generating linear recurring sequences by parallel linear recurring sequence generators (LRSGs) with a feedback circuit optimized to balance minimum propagation delay against maximal sequence period. Parallel generation of linear recurring sequences requires decimating the sequence (creating small contiguous sections of the sequence in each LRSG). A companion matrix form is selected depending on whether the LFSR is right-shifting or left-shifting. The companion matrix is completed by selecting a primitive irreducible polynomial with 1's most closely grouped in a corner of the companion matrix. A decimation matrix is created by raising the companion matrix to the (n*k).sup.th power, where k is the number of parallel LRSGs and n is the number of bits to be generated at a time by each LRSG. Companion matrices with 1's closely grouped in a corner will yield sparse decimation matrices. A feedback circuit comprised of XOR logic gates implements the decimation matrix in hardware. Sparse decimation matrices can be implemented with minimum number of XOR gates, and therefore a minimum propagation delay through the feedback circuit. The LRSG of the invention is particularly well suited to use as a bit error rate tester on high speed communication lines because it permits the receiver to synchronize to the transmitted pattern within 2n bits.
Del Campo, Javier; Mallo, Diego; Massana, Ramon; de Vargas, Colomban; Richards, Thomas A; Ruiz-Trillo, Iñaki
2015-09-01
The opisthokonts are one of the major super groups of eukaryotes. It comprises two major clades: (i) the Metazoa and their unicellular relatives and (ii) the Fungi and their unicellular relatives. There is, however, little knowledge of the role of opisthokont microbes in many natural environments, especially among non-metazoan and non-fungal opisthokonts. Here, we begin to address this gap by analysing high-throughput 18S rDNA and 18S rRNA sequencing data from different European coastal sites, sampled at different size fractions and depths. In particular, we analyse the diversity and abundance of choanoflagellates, filastereans, ichthyosporeans, nucleariids, corallochytreans and their related lineages. Our results show the great diversity of choanoflagellates in coastal waters as well as a relevant representation of the ichthyosporeans and the uncultured marine opisthokonts (MAOP). Furthermore, we describe a new lineage of marine fonticulids (MAFO) that appears to be abundant in sediments. Taken together, our work points to a greater potential ecological role for unicellular opisthokonts than previously appreciated in marine environments, both in water column and sediments, and also provides evidence of novel opisthokont phylogenetic lineages. This study highlights the importance of high-throughput sequencing approaches to unravel the diversity and distribution of both known and novel eukaryotic lineages. © 2014 Society for Applied Microbiology and John Wiley & Sons Ltd.
Ozer, Abdullah; Tome, Jacob M; Friedman, Robin C; Gheba, Dan; Schroth, Gary P; Lis, John T
2015-08-01
Because RNA-protein interactions have a central role in a wide array of biological processes, methods that enable a quantitative assessment of these interactions in a high-throughput manner are in great demand. Recently, we developed the high-throughput sequencing-RNA affinity profiling (HiTS-RAP) assay that couples sequencing on an Illumina GAIIx genome analyzer with the quantitative assessment of protein-RNA interactions. This assay is able to analyze interactions between one or possibly several proteins with millions of different RNAs in a single experiment. We have successfully used HiTS-RAP to analyze interactions of the EGFP and negative elongation factor subunit E (NELF-E) proteins with their corresponding canonical and mutant RNA aptamers. Here we provide a detailed protocol for HiTS-RAP that can be completed in about a month (8 d hands-on time). This includes the preparation and testing of recombinant proteins and DNA templates, clustering DNA templates on a flowcell, HiTS and protein binding with a GAIIx instrument, and finally data analysis. We also highlight aspects of HiTS-RAP that can be further improved and points of comparison between HiTS-RAP and two other recently developed methods, quantitative analysis of RNA on a massively parallel array (RNA-MaP) and RNA Bind-n-Seq (RBNS), for quantitative analysis of RNA-protein interactions.
Methods for processing high-throughput RNA sequencing data.
Ares, Manuel
2014-11-03
High-throughput sequencing (HTS) methods for analyzing RNA populations (RNA-Seq) are gaining rapid application to many experimental situations. The steps in an RNA-Seq experiment require thought and planning, especially because the expense in time and materials is currently higher and the protocols are far less routine than those used for other high-throughput methods, such as microarrays. As always, good experimental design will make analysis and interpretation easier. Having a clear biological question, an idea about the best way to do the experiment, and an understanding of the number of replicates needed will make the entire process more satisfying. Whether the goal is capturing transcriptome complexity from a tissue or identifying small fragments of RNA cross-linked to a protein of interest, conversion of the RNA to cDNA followed by direct sequencing using the latest methods is a developing practice, with new technical modifications and applications appearing every day. Even more rapid are the development and improvement of methods for analysis of the very large amounts of data that arrive at the end of an RNA-Seq experiment, making considerations regarding reproducibility, validation, visualization, and interpretation increasingly important. This introduction is designed to review and emphasize a pathway of analysis from experimental design through data presentation that is likely to be successful, with the recognition that better methods are right around the corner. © 2014 Cold Spring Harbor Laboratory Press.
Miyai, Manami; Eikawa, Shingo; Hosoi, Akihiro; Iino, Tamaki; Matsushita, Hirokazu; Isobe, Midori; Uenaka, Akiko; Udono, Heiichiro; Nakajima, Jun; Nakayama, Eiichi; Kakimi, Kazuhiro
2015-01-01
Comprehensive immunological evaluation is crucial for monitoring patients undergoing antigen-specific cancer immunotherapy. The identification and quantification of T cell responses is most important for the further development of such therapies. Using well-characterized clinical samples from a high responder patient (TK-f01) in an NY-ESO-1f peptide vaccine study, we performed high-throughput T cell receptor β-chain (TCRB) gene next generation sequencing (NGS) to monitor the frequency of NY-ESO-1-specific CD8+ T cells. We compared these results with those of conventional immunological assays, such as IFN-γ capture, tetramer binding and limiting dilution clonality assays. We sequenced human TCRB complementarity-determining region 3 (CDR3) rearrangements of two NY-ESO-1f-specific CD8+ T cell clones, 6-8L and 2F6, as well as PBMCs over the course of peptide vaccination. Clone 6-8L possessed the TCRB CDR3 gene TCRBV11-03*01 and BJ02-01*01 with amino acid sequence CASSLRGNEQFF, whereas 2F6 possessed TCRBV05-08*01 and BJ02-04*01 (CASSLVGTNIQYF). Using these two sequences as models, we evaluated the frequency of NY-ESO-1-specific CD8+ T cells in PBMCs ex vivo. The 6-8L CDR3 sequence was the second most frequent in PBMC and was present at high frequency (0.7133%) even prior to vaccination, and sustained over the course of vaccination. Despite a marked expansion of NY-ESO-1-specific CD8+ T cells detected from the first through 6th vaccination by tetramer staining and IFN-γ capture assays, as evaluated by CDR3 sequencing the frequency did not increase with increasing rounds of peptide vaccination. By clonal analysis using 12 day in vitro stimulation, the frequency of B*52:01-restricted NY-ESO-1f peptide-specific CD8+ T cells in PBMCs was estimated as only 0.0023%, far below the 0.7133% by NGS sequencing. Thus, assays requiring in vitro stimulation might be underestimating the frequency of clones with lower proliferation potential. High-throughput TCRB sequencing using NGS can potentially better estimate the actual frequency of antigen-specific T cells and thus provide more accurate patient monitoring.
Miyai, Manami; Eikawa, Shingo; Hosoi, Akihiro; Iino, Tamaki; Matsushita, Hirokazu; Isobe, Midori; Uenaka, Akiko; Udono, Heiichiro; Nakajima, Jun; Nakayama, Eiichi; Kakimi, Kazuhiro
2015-01-01
Comprehensive immunological evaluation is crucial for monitoring patients undergoing antigen-specific cancer immunotherapy. The identification and quantification of T cell responses is most important for the further development of such therapies. Using well-characterized clinical samples from a high responder patient (TK-f01) in an NY-ESO-1f peptide vaccine study, we performed high-throughput T cell receptor β-chain (TCRB) gene next generation sequencing (NGS) to monitor the frequency of NY-ESO-1-specific CD8+ T cells. We compared these results with those of conventional immunological assays, such as IFN-γ capture, tetramer binding and limiting dilution clonality assays. We sequenced human TCRB complementarity-determining region 3 (CDR3) rearrangements of two NY-ESO-1f-specific CD8+ T cell clones, 6-8L and 2F6, as well as PBMCs over the course of peptide vaccination. Clone 6-8L possessed the TCRB CDR3 gene TCRBV11-03*01 and BJ02-01*01 with amino acid sequence CASSLRGNEQFF, whereas 2F6 possessed TCRBV05-08*01 and BJ02-04*01 (CASSLVGTNIQYF). Using these two sequences as models, we evaluated the frequency of NY-ESO-1-specific CD8+ T cells in PBMCs ex vivo. The 6-8L CDR3 sequence was the second most frequent in PBMC and was present at high frequency (0.7133%) even prior to vaccination, and sustained over the course of vaccination. Despite a marked expansion of NY-ESO-1-specific CD8+ T cells detected from the first through 6th vaccination by tetramer staining and IFN-γ capture assays, as evaluated by CDR3 sequencing the frequency did not increase with increasing rounds of peptide vaccination. By clonal analysis using 12 day in vitro stimulation, the frequency of B*52:01-restricted NY-ESO-1f peptide-specific CD8+ T cells in PBMCs was estimated as only 0.0023%, far below the 0.7133% by NGS sequencing. Thus, assays requiring in vitro stimulation might be underestimating the frequency of clones with lower proliferation potential. High-throughput TCRB sequencing using NGS can potentially better estimate the actual frequency of antigen-specific T cells and thus provide more accurate patient monitoring. PMID:26291626
The ChIP-exo Method: Identifying Protein-DNA Interactions with Near Base Pair Precision.
Perreault, Andrea A; Venters, Bryan J
2016-12-23
Chromatin immunoprecipitation (ChIP) is an indispensable tool in the fields of epigenetics and gene regulation that isolates specific protein-DNA interactions. ChIP coupled to high throughput sequencing (ChIP-seq) is commonly used to determine the genomic location of proteins that interact with chromatin. However, ChIP-seq is hampered by relatively low mapping resolution of several hundred base pairs and high background signal. The ChIP-exo method is a refined version of ChIP-seq that substantially improves upon both resolution and noise. The key distinction of the ChIP-exo methodology is the incorporation of lambda exonuclease digestion in the library preparation workflow to effectively footprint the left and right 5' DNA borders of the protein-DNA crosslink site. The ChIP-exo libraries are then subjected to high throughput sequencing. The resulting data can be leveraged to provide unique and ultra-high resolution insights into the functional organization of the genome. Here, we describe the ChIP-exo method that we have optimized and streamlined for mammalian systems and next-generation sequencing-by-synthesis platform.
2012-01-01
Background The central role of the somatotrophic axis in animal post-natal growth, development and fertility is well established. Therefore, the identification of genetic variants affecting quantitative traits within this axis is an attractive goal. However, large sample numbers are a pre-requisite for the identification of genetic variants underlying complex traits and although technologies are improving rapidly, high-throughput sequencing of large numbers of complete individual genomes remains prohibitively expensive. Therefore using a pooled DNA approach coupled with target enrichment and high-throughput sequencing, the aim of this study was to identify polymorphisms and estimate allele frequency differences across 83 candidate genes of the somatotrophic axis, in 150 Holstein-Friesian dairy bulls divided into two groups divergent for genetic merit for fertility. Results In total, 4,135 SNPs and 893 indels were identified during the resequencing of the 83 candidate genes. Nineteen percent (n = 952) of variants were located within 5' and 3' UTRs. Seventy-two percent (n = 3,612) were intronic and 9% (n = 464) were exonic, including 65 indels and 236 SNPs resulting in non-synonymous substitutions (NSS). Significant (P < 0.01) mean allele frequency differentials between the low and high fertility groups were observed for 720 SNPs (58 NSS). Allele frequencies for 43 of the SNPs were also determined by genotyping the 150 individual animals (Sequenom® MassARRAY). No significant differences (P > 0.1) were observed between the two methods for any of the 43 SNPs across both pools (i.e., 86 tests in total). Conclusions The results of the current study support previous findings of the use of DNA sample pooling and high-throughput sequencing as a viable strategy for polymorphism discovery and allele frequency estimation. Using this approach we have characterised the genetic variation within genes of the somatotrophic axis and related pathways, central to mammalian post-natal growth and development and subsequent lactogenesis and fertility. We have identified a large number of variants segregating at significantly different frequencies between cattle groups divergent for calving interval plausibly harbouring causative variants contributing to heritable variation. To our knowledge, this is the first report describing sequencing of targeted genomic regions in any livestock species using groups with divergent phenotypes for an economically important trait. PMID:22235840
Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian
2011-08-30
Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing.
Coprolites as a source of information on the genome and diet of the cave hyena
Bon, Céline; Berthonaud, Véronique; Maksud, Frédéric; Labadie, Karine; Poulain, Julie; Artiguenave, François; Wincker, Patrick; Aury, Jean-Marc; Elalouf, Jean-Marc
2012-01-01
We performed high-throughput sequencing of DNA from fossilized faeces to evaluate this material as a source of information on the genome and diet of Pleistocene carnivores. We analysed coprolites derived from the extinct cave hyena (Crocuta crocuta spelaea), and sequenced 90 million DNA fragments from two specimens. The DNA reads enabled a reconstruction of the cave hyena mitochondrial genome with up to a 158-fold coverage. This genome, and those sequenced from extant spotted (Crocuta crocuta) and striped (Hyaena hyaena) hyena specimens, allows for the establishment of a robust phylogeny that supports a close relationship between the cave and the spotted hyena. We also demonstrate that high-throughput sequencing yields data for cave hyena multi-copy and single-copy nuclear genes, and that about 50 per cent of the coprolite DNA can be ascribed to this species. Analysing the data for additional species to indicate the cave hyena diet, we retrieved abundant sequences for the red deer (Cervus elaphus), and characterized its mitochondrial genome with up to a 3.8-fold coverage. In conclusion, we have demonstrated the presence of abundant ancient DNA in the coprolites surveyed. Shotgun sequencing of this material yielded a wealth of DNA sequences for a Pleistocene carnivore and allowed unbiased identification of diet. PMID:22456883
Characterization of a new apple luteovirus identified by high-throughput sequencing.
Liu, Huawei; Wu, Liping; Nikolaeva, Ekaterina; Peter, Kari; Liu, Zongrang; Mollov, Dimitre; Cao, Mengji; Li, Ruhui
2018-05-15
'Rapid Apple Decline' (RAD) is a newly emerging problem of young, dwarf apple trees in the Northeastern USA. The affected trees show trunk necrosis, cracking and canker before collapse in summer. In this study, we discovered and characterized a new luteovirus from apple trees in RAD-affected orchards using high-throughput sequencing (HTS) technology and subsequent Sanger sequencing. Illumina NextSeq sequencing was applied to total RNAs prepared from three diseased apple trees. Sequence reads were de novo assembled, and contigs were annotated by BLASTx. RT-PCR and 5'/3' RACE sequencing were used to obtain the complete genome of a new virus. RT-PCR was used to detect the virus. Three common apple viruses and a new luteovirus were identified from the diseased trees by HTS and RT-PCR. Sequence analyses of the complete genome of the new virus show that it is a new species of the genus Luteovirus in the family Luteoviridae. The virus is graft transmissible and detected by RT-PCR in apple trees in a couple of orchards. A new luteovirus and/or three known viruses were found to be associated with RAD. Molecular characterization of the new luteovirus provides important information for further investigation of its distribution and etiological role.
Brewster, Aaron S.; Sawaya, Michael R.; Rodriguez, Jose; ...
2015-01-23
Still diffraction patterns from peptide nanocrystals with small unit cells are challenging to index using conventional methods owing to the limited number of spots and the lack of crystal orientation information for individual images. New indexing algorithms have been developed as part of the Computational Crystallography Toolbox( cctbx) to overcome these challenges. Accurate unit-cell information derived from an aggregate data set from thousands of diffraction patterns can be used to determine a crystal orientation matrix for individual images with as few as five reflections. These algorithms are potentially applicable not only to amyloid peptides but also to any set ofmore » diffraction patterns with sparse properties, such as low-resolution virus structures or high-throughput screening of still images captured by raster-scanning at synchrotron sources. As a proof of concept for this technique, successful integration of X-ray free-electron laser (XFEL) data to 2.5 Å resolution for the amyloid segment GNNQQNY from the Sup35 yeast prion is presented.« less
Alignment of high-throughput sequencing data inside in-memory databases.
Firnkorn, Daniel; Knaup-Gregori, Petra; Lorenzo Bermejo, Justo; Ganzinger, Matthias
2014-01-01
In times of high-throughput DNA sequencing techniques, performance-capable analysis of DNA sequences is of high importance. Computer supported DNA analysis is still an intensive time-consuming task. In this paper we explore the potential of a new In-Memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both, HANA and the free database system MySQL, to compare execution time and memory management. To ensure that the results are comparable, MySQL has been running in memory as well, utilizing its integrated memory engine for database table creation. We implemented stored procedures, containing exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, the inexact matching problem could not be implemented on this platform. Hence, performance analysis between HANA and MySQL was made by comparing the execution time of the exact search procedures. Here, HANA was approximately 27 times faster than MySQL which means, that there is a high potential within the new In-Memory concepts, leading to further developments of DNA analysis procedures in the future.
Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko
2017-07-12
Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.
Fang, Chao; Zhong, Huanzi; Lin, Yuxiang; Chen, Bing; Han, Mo; Ren, Huahui; Lu, Haorong; Luber, Jacob M; Xia, Min; Li, Wangsheng; Stein, Shayna; Xu, Xun; Zhang, Wenwei; Drmanac, Radoje; Wang, Jian; Yang, Huanming; Hammarström, Lennart; Kostic, Aleksandar D; Kristiansen, Karsten; Li, Junhua
2018-03-01
More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of 2 Illumina platforms. Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform vs the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the 2 Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high-quality reads (96.06% of raw reads) per sample, with 90.56% of bases scoring Q30 and above, was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias toward genes with higher GC content being enriched on the HiSeq platforms. Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.
Kebschull, Moritz; Fittler, Melanie Julia; Demmer, Ryan T; Papapanou, Panos N
2017-01-01
Today, -omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ, or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier "candidate" gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized -omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease.A major issue when inferring biological information from high-throughput -omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences.In this chapter, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of -omics data generated using microarrays or next-generation sequencing technology using open-source tools. Starting with quality control measures and necessary preprocessing steps for data originating from different -omics technologies, we next outline a differential expression analysis pipeline that can be used for data from both microarray and sequencing experiments, and offers the possibility to account for random or fixed effects. Finally, we present an overview of the possibilities for a functional analysis of the obtained data.
Ito, Yuji
2017-01-01
As an alternative to hybridoma technology, the antibody phage library system can also be used for antibody selection. This method enables the isolation of antigen-specific binders through an in vitro selection process known as biopanning. While it has several advantages, such as an avoidance of animal immunization, the phage cloning and screening steps of biopanning are time-consuming and problematic. Here, we introduce a novel biopanning method combined with high-throughput sequencing (HTS) using a next-generation sequencer (NGS) to save time and effort in antibody selection, and to increase the diversity of acquired antibody sequences. Biopannings against a target antigen were performed using a human single chain Fv (scFv) antibody phage library. VH genes in pooled phages at each round of biopanning were analyzed by HTS on a NGS. The obtained data were trimmed, merged, and translated into amino acid sequences. The frequencies (%) of the respective VH sequences at each biopanning step were calculated, and the amplification factor (change of frequency through biopanning) was obtained to estimate the potential for antigen binding. A phylogenetic tree was drawn using the top 50 VH sequences with high amplification factors. Representative VH sequences forming the cluster were then picked up and used to reconstruct scFv genes harboring these VHs. Their derived scFv-Fc fusion proteins showed clear antigen binding activity. These results indicate that a combination of biopanning and HTS enables the rapid and comprehensive identification of specific binders from antibody phage libraries.
Zhou, Chengran
2017-01-01
Abstract Over the past decade, biodiversity researchers have dedicated tremendous efforts to constructing DNA reference barcodes for rapid species registration and identification. Although analytical cost for standard DNA barcoding has been significantly reduced since early 2000, further dramatic reduction in barcoding costs is unlikely because Sanger sequencing is approaching its limits in throughput and chemistry cost. Constraints in barcoding cost not only led to unbalanced barcoding efforts around the globe, but also prevented high-throughput sequencing (HTS)–based taxonomic identification from applying binomial species names, which provide crucial linkages to biological knowledge. We developed an Illumina-based pipeline, HIFI-Barcode, to produce full-length Cytochrome c oxidase subunit I (COI) barcodes from pooled polymerase chain reaction amplicons generated by individual specimens. The new pipeline generated accurate barcode sequences that were comparable to Sanger standards, even for different haplotypes of the same species that were only a few nucleotides different from each other. Additionally, the new pipeline was much more sensitive in recovering amplicons at low quantity. The HIFI-Barcode pipeline successfully recovered barcodes from more than 78% of the polymerase chain reactions that didn’t show clear bands on the electrophoresis gel. Moreover, sequencing results based on the single molecular sequencing platform Pacbio confirmed the accuracy of the HIFI-Barcode results. Altogether, the new pipeline can provide an improved solution to produce full-length reference barcodes at about one-tenth of the current cost, enabling construction of comprehensive barcode libraries for local fauna, leading to a feasible direction for DNA barcoding global biomes. PMID:29077841
Report for the NGFA-5 project.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jaing, C; Jackson, P; Thissen, J
The objective of this project is to provide DHS a comprehensive evaluation of the current genomic technologies including genotyping, TaqMan PCR, multiple locus variable tandem repeat analysis (MLVA), microarray and high-throughput DNA sequencing in the analysis of biothreat agents from complex environmental samples. To effectively compare the sensitivity and specificity of the different genomic technologies, we used SNP TaqMan PCR, MLVA, microarray and high-throughput illumine and 454 sequencing to test various strains from B. anthracis, B. thuringiensis, BioWatch aerosol filter extracts or soil samples that were spiked with B. anthracis, and samples that were previously collected during DHS and EPAmore » environmental release exercises that were known to contain B. thuringiensis spores. The results of all the samples against the various assays are discussed in this report.« less
He, Guo-qing; Liu, Tong-jie; Sadiq, Faizan A.; Gu, Jing-si; Zhang, Guo-hua
2017-01-01
Chinese traditional fermented foods have a very long history dating back thousands of years and have become an indispensable part of Chinese dietary culture. A plethora of research has been conducted to unravel the composition and dynamics of microbial consortia associated with Chinese traditional fermented foods using culture-dependent as well as culture-independent methods, like different high-throughput sequencing (HTS) techniques. These HTS techniques enable us to understand the relationship between a food product and its microbes to a greater extent than ever before. Considering the importance of Chinese traditional fermented products, the objective of this paper is to review the diversity and dynamics of microbiota in Chinese traditional fermented foods revealed by HTS approaches. PMID:28378567
Gold nanoparticles for high-throughput genotyping of long-range haplotypes
NASA Astrophysics Data System (ADS)
Chen, Peng; Pan, Dun; Fan, Chunhai; Chen, Jianhua; Huang, Ke; Wang, Dongfang; Zhang, Honglu; Li, You; Feng, Guoyin; Liang, Peiji; He, Lin; Shi, Yongyong
2011-10-01
Completion of the Human Genome Project and the HapMap Project has led to increasing demands for mapping complex traits in humans to understand the aetiology of diseases. Identifying variations in the DNA sequence, which affect how we develop disease and respond to pathogens and drugs, is important for this purpose, but it is difficult to identify these variations in large sample sets. Here we show that through a combination of capillary sequencing and polymerase chain reaction assisted by gold nanoparticles, it is possible to identify several DNA variations that are associated with age-related macular degeneration and psoriasis on significant regions of human genomic DNA. Our method is accurate and promising for large-scale and high-throughput genetic analysis of susceptibility towards disease and drug resistance.
Huang, Xiwei; Yu, Hao; Liu, Xu; Jiang, Yu; Yan, Mei; Wu, Dongping
2015-09-01
The existing ISFET-based DNA sequencing detects hydrogen ions released during the polymerization of DNA strands on microbeads, which are scattered into microwell array above the ISFET sensor with unknown distribution. However, false pH detection happens at empty microwells due to crosstalk from neighboring microbeads. In this paper, a dual-mode CMOS ISFET sensor is proposed to have accurate pH detection toward DNA sequencing. Dual-mode sensing, optical and chemical modes, is realized by integrating a CMOS image sensor (CIS) with ISFET pH sensor, and is fabricated in a standard 0.18-μm CIS process. With accurate determination of microbead physical locations with CIS pixel by contact imaging, the dual-mode sensor can correlate local pH for one DNA slice at one location-determined microbead, which can result in improved pH detection accuracy. Moreover, toward a high-throughput DNA sequencing, a correlated-double-sampling readout that supports large array for both modes is deployed to reduce pixel-to-pixel nonuniformity such as threshold voltage mismatch. The proposed CMOS dual-mode sensor is experimentally examined to show a well correlated pH map and optical image for microbeads with a pH sensitivity of 26.2 mV/pH, a fixed pattern noise (FPN) reduction from 4% to 0.3%, and a readout speed of 1200 frames/s. A dual-mode CMOS ISFET sensor with suppressed FPN for accurate large-arrayed pH sensing is proposed and demonstrated with state-of-the-art measured results toward accurate and high-throughput DNA sequencing. The developed dual-mode CMOS ISFET sensor has great potential for future personal genome diagnostics with high accuracy and low cost.
Pathogenic bacteria in sewage treatment plants as revealed by 454 pyrosequencing.
Ye, Lin; Zhang, Tong
2011-09-01
This study applied 454 high-throughput pyrosequencing to analyze potentially pathogenic bacteria in activated sludge from 14 municipal wastewater treatment plants (WWTPs) across four countries (China, U.S., Canada, and Singapore), plus the influent and effluent of one of the 14 WWTPs. A total of 370,870 16S rRNA gene sequences with average length of 207 bps were obtained and all of them were assigned to corresponding taxonomic ranks by using RDP classifier and MEGAN. It was found that the most abundant potentially pathogenic bacteria in the WWTPs were affiliated with the genera of Aeromonas and Clostridium. Aeromonas veronii, Aeromonas hydrophila, and Clostridium perfringens were species most similar to the potentially pathogenic bacteria found in this study. Some sequences highly similar (>99%) to Corynebacterium diphtheriae were found in the influent and activated sludge samples from a saline WWTP. Overall, the percentage of the sequences closely related (>99%) to known pathogenic bacteria sequences was about 0.16% of the total sequences. Additionally, a platform-independent Java application (BAND) was developed for graphical visualization of the data of microbial abundance generated by high-throughput pyrosequencing. The approach demonstrated in this study could examine most of the potentially pathogenic bacteria simultaneously instead of one-by-one detection by other methods.
Bartram, Jack; Mountjoy, Edward; Brooks, Tony; Hancock, Jeremy; Williamson, Helen; Wright, Gary; Moppett, John; Goulden, Nick; Hubank, Mike
2016-07-01
High-throughput sequencing (HTS) (next-generation sequencing) of the rearranged Ig and T-cell receptor genes promises to be less expensive and more sensitive than current methods of monitoring minimal residual disease (MRD) in patients with acute lymphoblastic leukemia. However, the adoption of new approaches by clinical laboratories requires careful evaluation of all potential sources of error and the development of strategies to ensure the highest accuracy. Timely and efficient clinical use of HTS platforms will depend on combining multiple samples (multiplexing) in each sequencing run. Here we examine the Ig heavy-chain gene HTS on the Illumina MiSeq platform for MRD. We identify errors associated with multiplexing that could potentially impact the accuracy of MRD analysis. We optimize a strategy that combines high-purity, sequence-optimized oligonucleotides, dual indexing, and an error-aware demultiplexing approach to minimize errors and maximize sensitivity. We present a probability-based, demultiplexing pipeline Error-Aware Demultiplexer that is suitable for all MiSeq strategies and accurately assigns samples to the correct identifier without excessive loss of data. Finally, using controls quantified by digital PCR, we show that HTS-MRD can accurately detect as few as 1 in 10(6) copies of specific leukemic MRD. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
Fluorescence-based high-throughput screening of dicer cleavage activity.
Podolska, Katerina; Sedlak, David; Bartunek, Petr; Svoboda, Petr
2014-03-01
Production of small RNAs by ribonuclease III Dicer is a key step in microRNA and RNA interference pathways, which employ Dicer-produced small RNAs as sequence-specific silencing guides. Further studies and manipulations of microRNA and RNA interference pathways would benefit from identification of small-molecule modulators. Here, we report a study of a fluorescence-based in vitro Dicer cleavage assay, which was adapted for high-throughput screening. The kinetic assay can be performed under single-turnover conditions (35 nM substrate and 70 nM Dicer) in a small volume (5 µL), which makes it suitable for high-throughput screening in a 1536-well format. As a proof of principle, a small library of bioactive compounds was analyzed, demonstrating potential of the assay.
Hykin, Sarah M.; Bi, Ke; McGuire, Jimmy A.
2015-01-01
For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens—particularly for use in phylogenetic analyses—has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis. PMID:26505622
Hykin, Sarah M; Bi, Ke; McGuire, Jimmy A
2015-01-01
For 150 years or more, specimens were routinely collected and deposited in natural history collections without preserving fresh tissue samples for genetic analysis. In the case of most herpetological specimens (i.e. amphibians and reptiles), attempts to extract and sequence DNA from formalin-fixed, ethanol-preserved specimens-particularly for use in phylogenetic analyses-has been laborious and largely ineffective due to the highly fragmented nature of the DNA. As a result, tens of thousands of specimens in herpetological collections have not been available for sequence-based phylogenetic studies. Massively parallel High-Throughput Sequencing methods and the associated bioinformatics, however, are particularly suited to recovering meaningful genetic markers from severely degraded/fragmented DNA sequences such as DNA damaged by formalin-fixation. In this study, we compared previously published DNA extraction methods on three tissue types subsampled from formalin-fixed specimens of Anolis carolinensis, followed by sequencing. Sufficient quality DNA was recovered from liver tissue, making this technique minimally destructive to museum specimens. Sequencing was only successful for the more recently collected specimen (collected ~30 ybp). We suspect this could be due either to the conditions of preservation and/or the amount of tissue used for extraction purposes. For the successfully sequenced sample, we found a high rate of base misincorporation. After rigorous trimming, we successfully mapped 27.93% of the cleaned reads to the reference genome, were able to reconstruct the complete mitochondrial genome, and recovered an accurate phylogenetic placement for our specimen. We conclude that the amount of DNA available, which can vary depending on specimen age and preservation conditions, will determine if sequencing will be successful. The technique described here will greatly improve the value of museum collections by making many formalin-fixed specimens available for genetic analysis.
High-throughput physical mapping of chromosomes using automated in situ hybridization.
George, Phillip; Sharakhova, Maria V; Sharakhov, Igor V
2012-06-28
Projects to obtain whole-genome sequences for 10,000 vertebrate species and for 5,000 insect and related arthropod species are expected to take place over the next 5 years. For example, the sequencing of the genomes for 15 malaria mosquitospecies is currently being done using an Illumina platform. This Anopheles species cluster includes both vectors and non-vectors of malaria. When the genome assemblies become available, researchers will have the unique opportunity to perform comparative analysis for inferring evolutionary changes relevant to vector ability. However, it has proven difficult to use next-generation sequencing reads to generate high-quality de novo genome assemblies. Moreover, the existing genome assemblies for Anopheles gambiae, although obtained using the Sanger method, are gapped or fragmented. Success of comparative genomic analyses will be limited if researchers deal with numerous sequencing contigs, rather than with chromosome-based genome assemblies. Fragmented, unmapped sequences create problems for genomic analyses because: (i) unidentified gaps cause incorrect or incomplete annotation of genomic sequences; (ii) unmapped sequences lead to confusion between paralogous genes and genes from different haplotypes; and (iii) the lack of chromosome assignment and orientation of the sequencing contigs does not allow for reconstructing rearrangement phylogeny and studying chromosome evolution. Developing high-resolution physical maps for species with newly sequenced genomes is a timely and cost-effective investment that will facilitate genome annotation, evolutionary analysis, and re-sequencing of individual genomes from natural populations. Here, we present innovative approaches to chromosome preparation, fluorescent in situ hybridization (FISH), and imaging that facilitate rapid development of physical maps. Using An. gambiae as an example, we demonstrate that the development of physical chromosome maps can potentially improve genome assemblies and, thus, the quality of genomic analyses. First, we use a high-pressure method to prepare polytene chromosome spreads. This method, originally developed for Drosophila, allows the user to visualize more details on chromosomes than the regular squashing technique. Second, a fully automated, front-end system for FISH is used for high-throughput physical genome mapping. The automated slide staining system runs multiple assays simultaneously and dramatically reduces hands-on time. Third, an automatic fluorescent imaging system, which includes a motorized slide stage, automatically scans and photographs labeled chromosomes after FISH. This system is especially useful for identifying and visualizing multiple chromosomal plates on the same slide. In addition, the scanning process captures a more uniform FISH result. Overall, the automated high-throughput physical mapping protocol is more efficient than a standard manual protocol.
Veeranagouda, Yaligara; Debono-Lagneaux, Delphine; Fournet, Hamida; Thill, Gilbert; Didier, Michel
2018-01-16
The emergence of clustered regularly interspaced short palindromic repeats-Cas9 (CRISPR-Cas9) gene editing systems has enabled the creation of specific mutants at low cost, in a short time and with high efficiency, in eukaryotic cells. Since a CRISPR-Cas9 system typically creates an array of mutations in targeted sites, a successful gene editing project requires careful selection of edited clones. This process can be very challenging, especially when working with multiallelic genes and/or polyploid cells (such as cancer and plants cells). Here we described a next-generation sequencing method called CRISPR-Cas9 Edited Site Sequencing (CRES-Seq) for the efficient and high-throughput screening of CRISPR-Cas9-edited clones. CRES-Seq facilitates the precise genotyping up to 96 CRISPR-Cas9-edited sites (CRES) in a single MiniSeq (Illumina) run with an approximate sequencing cost of $6/clone. CRES-Seq is particularly useful when multiple genes are simultaneously targeted by CRISPR-Cas9, and also for screening of clones generated from multiallelic genes/polyploid cells. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.
Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data.
Waszak, Sebastian M; Kilpinen, Helena; Gschwind, Andreas R; Orioli, Andrea; Raghav, Sunil K; Witwicki, Robert M; Migliavacca, Eugenia; Yurovsky, Alisa; Lappalainen, Tuuli; Hernandez, Nouria; Reymond, Alexandre; Dermitzakis, Emmanouil T; Deplancke, Bart
2014-01-15
High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high-sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. The R package abs filter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter
Moussavou-Boundzanga, Pamela; Koumakpayi, Ismaël Hervé; Labouba, Ingrid; Leroy, Eric M; Belembaogo, Ernest; Berthet, Nicolas
2017-12-21
Cervical cancer is the fourth most common malignancy in women worldwide. However, screening with human papillomavirus (HPV) molecular tests holds promise for reducing cervical cancer incidence and mortality in low- and middle-income countries. The performance of the Abbott RealTime High-Risk HPV test (AbRT) was evaluated in 83 cervical smear specimens and compared with a conventional nested PCR coupled to high-throughput sequencing (HTS) to identify the amplicons. The AbRT assay detected at least one HPV genotype in 44.57% of women regardless of the grade of cervical abnormalities. Except for one case, good concordance was observed for the genotypes detected with the AbRT assay in the high-risk HPV category determined with HTS of the amplicon generated by conventional nested PCR. The AbRT test is an easy and reliable molecular tool and was as sensitive as conventional nested PCR in cervical smear specimens for detection HPVs associated with high-grade lesions. Moreover, sequencing amplicons using an HTS approach effectively identified the genotype of the hrHPV identified with the AbRT test.
USDA-ARS?s Scientific Manuscript database
We conducted genomic sequencing to identify viruses associated with mosaic disease of an apple tree using the high-throughput sequencing (HTS) Illumina RNA-seq platform. The objective was to examine if rapid identification and characterization of viruses could be effectively achieved by RNA-seq anal...
Analysis of petunia hybrida in response to salt stress using high throughput RNA sequencing
USDA-ARS?s Scientific Manuscript database
Salt and drought are among the greatest challenges to crop and native plants in meeting their yield and reproductive potentials. DNA sequencing-enabled transcriptome profiling provides a means of assessing what genes are responding to salt or drought stress so as to better understand the molecular ...
[Study on Microbial Diversity of Peri-implantitis Subgingival by High-throughput Sequencing].
Li, Zhi-jie; Wang, Shao-guo; Li, Yue-hong; Tu, Dong-xiang; Liu, Shi-yun; Nie, Hong-bing; Li, Zhi-qiang; Zhang, Ju-mei
2015-07-01
To study microbial diversity of peri-implantitis subgingival with high-throughput sequencing, and investigate microbiological etiology of peri-implantitis. Subgingival plaques were sampled from the patients with peri-implantitis (D group) and non-peri-implantitis subjects (N group). The microbiological diversity of the subgingival plaques was detected by sequencing V4 region of 16S rRNA with Illumina Miseq platform. The diversity of the community structure was analyzed using Mothur software. A total of 156 507 gene sequences were detected in nine samples and 4 402 operational taxonomic units (OTUs) were found. Selenomonas, Pseudomonas, and Fusobacterium were dominant bacteria in D group, while Fusobacterium, Veillonella and Streptococcus were dominant bacteria in N group. Differences between peri-implantitis and non-peri-implantitis bacterial communities were observed at all phylogenetic levels by LEfSe, which was also found in PcoA test. The occurrence of peri-implantitis is not only related to periodontitis pathogenic microbe, but also related with the changes of oral microbial community structure. Treponema, Herbaspirillum, Butyricimonas and Phaeobacte may be closely related to the occurrence and development of peri-implantitis.
Tan, Yann-Chong; Blum, Lisa K; Kongpachith, Sarah; Ju, Chia-Hsin; Cai, Xiaoyong; Lindstrom, Tamsin M; Sokolove, Jeremy; Robinson, William H
2014-03-01
We developed a DNA barcoding method to enable high-throughput sequencing of the cognate heavy- and light-chain pairs of the antibodies expressed by individual B cells. We used this approach to elucidate the plasmablast antibody response to influenza vaccination. We show that >75% of the rationally selected plasmablast antibodies bind and neutralize influenza, and that antibodies from clonal families, defined by sharing both heavy-chain VJ and light-chain VJ sequence usage, do so most effectively. Vaccine-induced heavy-chain VJ regions contained on average >20 nucleotide mutations as compared to their predicted germline gene sequences, and some vaccine-induced antibodies exhibited higher binding affinities for hemagglutinins derived from prior years' seasonal influenza as compared to their affinities for the immunization strains. Our results show that influenza vaccination induces the recall of memory B cells that express antibodies that previously underwent affinity maturation against prior years' seasonal influenza, suggesting that 'original antigenic sin' shapes the antibody response to influenza vaccination. Published by Elsevier Inc.
Brain Connectivity as a DNA Sequencing Problem
NASA Astrophysics Data System (ADS)
Zador, Anthony
The mammalian cortex consists of millions or billions of neurons, each connected to thousands of other neurons. Traditional methods for determining the brain connectivity rely on microscopy to visualize neuronal connections, but such methods are slow, labor-intensive and often lack single neuron resolution. We have recently developed a new method, MAPseq, to recast the determination of brain wiring into a form that can exploit the tremendous recent advances in high-throughput DNA sequencing. DNA sequencing technology has outpaced even Moore's law, so that the cost of sequencing the human genome has dropped from a billion dollars in 2001 to below a thousand dollars today. MAPseq works by introducing random sequences of DNA-``barcodes''-to tag neurons uniquely. With MAPseq, we can determine the connectivity of over 50K single neurons in a single mouse cortex in about a week, an unprecedented throughput, ushering in the era of ``big data'' for brain wiring. We are now developing analytical tools and algorithms to make sense of these novel data sets.
High-throughput automated microfluidic sample preparation for accurate microbial genomics
Kim, Soohong; De Jonghe, Joachim; Kulesa, Anthony B.; Feldman, David; Vatanen, Tommi; Bhattacharyya, Roby P.; Berdy, Brittany; Gomez, James; Nolan, Jill; Epstein, Slava; Blainey, Paul C.
2017-01-01
Low-cost shotgun DNA sequencing is transforming the microbial sciences. Sequencing instruments are so effective that sample preparation is now the key limiting factor. Here, we introduce a microfluidic sample preparation platform that integrates the key steps in cells to sequence library sample preparation for up to 96 samples and reduces DNA input requirements 100-fold while maintaining or improving data quality. The general-purpose microarchitecture we demonstrate supports workflows with arbitrary numbers of reaction and clean-up or capture steps. By reducing the sample quantity requirements, we enabled low-input (∼10,000 cells) whole-genome shotgun (WGS) sequencing of Mycobacterium tuberculosis and soil micro-colonies with superior results. We also leveraged the enhanced throughput to sequence ∼400 clinical Pseudomonas aeruginosa libraries and demonstrate excellent single-nucleotide polymorphism detection performance that explained phenotypically observed antibiotic resistance. Fully-integrated lab-on-chip sample preparation overcomes technical barriers to enable broader deployment of genomics across many basic research and translational applications. PMID:28128213
Genome-wide mapping of autonomous promoter activity in human cells
van Arensbergen, Joris; FitzPatrick, Vincent D.; de Haas, Marcel; Pagie, Ludo; Sluimer, Jasper; Bussemaker, Harmen J.; van Steensel, Bas
2017-01-01
Previous methods to systematically characterize sequence-intrinsic activity of promoters have been limited by relatively low throughput and the length of sequences that could be tested. Here we present Survey of Regulatory Elements (SuRE), a method to assay more than 108 DNA fragments, each 0.2–2kb in size, for their ability to drive transcription autonomously. In SuRE, a plasmid library is constructed of random genomic fragments upstream of a 20bp barcode and decoded by paired-end sequencing. This library is then transfected into cells and transcribed barcodes are quantified in the RNA by high throughput sequencing. When applied to the human genome, we achieved a 55-fold genome coverage, allowing us to map autonomous promoter activity genome-wide. By computational modeling we delineated subregions within promoters that are relevant for their activity. For instance, we show that antisense promoter transcription is generally dependent on the sense core promoter sequences, and that most enhancers and several families of repetitive elements act as autonomous transcription initiation sites. PMID:28024146
Tempo and mode of genomic mutations unveil human evolutionary history.
Hara, Yuichiro
2015-01-01
Mutations that have occurred in human genomes provide insight into various aspects of evolutionary history such as speciation events and degrees of natural selection. Comparing genome sequences between human and great apes or among humans is a feasible approach for inferring human evolutionary history. Recent advances in high-throughput or so-called 'next-generation' DNA sequencing technologies have enabled the sequencing of thousands of individual human genomes, as well as a variety of reference genomes of hominids, many of which are publicly available. These sequence data can help to unveil the detailed demographic history of the lineage leading to humans as well as the explosion of modern human population size in the last several thousand years. In addition, high-throughput sequencing illustrates the tempo and mode of de novo mutations, which are producing human genetic variation at this moment. Pedigree-based human genome sequencing has shown that mutation rates vary significantly across the human genome. These studies have also provided an improved timescale of human evolution, because the mutation rate estimated from pedigree analysis is half that estimated from traditional analyses based on molecular phylogeny. Because of the dramatic reduction in sequencing cost, sequencing on-demand samples designed for specific studies is now also becoming popular. To produce data of sufficient quality to meet the requirements of the study, it is necessary to set an explicit sequencing plan that includes the choice of sample collection methods, sequencing platforms, and number of sequence reads.
Onuț-Brännström, Ioana; Benjamin, Mitchell; Scofield, Douglas G; Heiðmarsson, Starri; Andersson, Martin G I; Lindström, Eva S; Johannesson, Hanna
2018-03-13
In this study, we explored the diversity of green algal symbionts (photobionts) in sympatric populations of the cosmopolitan lichen-forming fungi Thamnolia and Cetraria. We sequenced with both Sanger and Ion Torrent High-Throughput Sequencing technologies the photobiont ITS-region of 30 lichen thalli from two islands: Iceland and Öland. While Sanger recovered just one photobiont genotype from each thallus, the Ion Torrent data recovered 10-18 OTUs for each pool of 5 lichen thalli, suggesting that individual lichens can contain heterogeneous photobiont populations. Both methods showed evidence for photobiont sharing between Thamnolia and Cetraria on Iceland. In contrast, our data suggest that on Öland the two mycobionts associate with distinct photobiont communities, with few shared OTUs revealed by Ion Torrent sequencing. Furthermore, by comparing our sequences with public data, we identified closely related photobionts from geographically distant localities. Taken together, we suggest that the photobiont composition in Thamnolia and Cetraria results from both photobiont-mycobiont codispersal and local acquisition during mycobiont establishment and/or lichen growth. We hypothesize that this is a successful strategy for lichens to be flexible in the use of the most adapted photobiont for the environment.
Lou, Tzu-Fang; Weidmann, Chase A; Killingsworth, Jordan; Tanaka Hall, Traci M; Goldstrohm, Aaron C; Campbell, Zachary T
2017-04-15
RNA-binding proteins (RBPs) collaborate to control virtually every aspect of RNA function. Tremendous progress has been made in the area of global assessment of RBP specificity using next-generation sequencing approaches both in vivo and in vitro. Understanding how protein-protein interactions enable precise combinatorial regulation of RNA remains a significant problem. Addressing this challenge requires tools that can quantitatively determine the specificities of both individual proteins and multimeric complexes in an unbiased and comprehensive way. One approach utilizes in vitro selection, high-throughput sequencing, and sequence-specificity landscapes (SEQRS). We outline a SEQRS experiment focused on obtaining the specificity of a multi-protein complex between Drosophila RBPs Pumilio (Pum) and Nanos (Nos). We discuss the necessary controls in this type of experiment and examine how the resulting data can be complemented with structural and cell-based reporter assays. Additionally, SEQRS data can be integrated with functional genomics data to uncover biological function. Finally, we propose extensions of the technique that will enhance our understanding of multi-protein regulatory complexes assembled onto RNA. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Lin, Chuang; Wang, Binghui; Jiang, Ning; Farina, Dario
2018-04-01
Objective. This paper proposes a novel simultaneous and proportional multiple degree of freedom (DOF) myoelectric control method for active prostheses. Approach. The approach is based on non-negative matrix factorization (NMF) of surface EMG signals with the inclusion of sparseness constraints. By applying a sparseness constraint to the control signal matrix, it is possible to extract the basis information from arbitrary movements (quasi-unsupervised approach) for multiple DOFs concurrently. Main Results. In online testing based on target hitting, able-bodied subjects reached a greater throughput (TP) when using sparse NMF (SNMF) than with classic NMF or with linear regression (LR). Accordingly, the completion time (CT) was shorter for SNMF than NMF or LR. The same observations were made in two patients with unilateral limb deficiencies. Significance. The addition of sparseness constraints to NMF allows for a quasi-unsupervised approach to myoelectric control with superior results with respect to previous methods for the simultaneous and proportional control of multi-DOF. The proposed factorization algorithm allows robust simultaneous and proportional control, is superior to previous supervised algorithms, and, because of minimal supervision, paves the way to online adaptation in myoelectric control.
Wickersheim, Michelle L; Blumenstiel, Justin P
2013-11-01
A large number of methods are available to deplete ribosomal RNA reads from high-throughput RNA sequencing experiments. Such methods are critical for sequencing Drosophila small RNAs between 20 and 30 nucleotides because size selection is not typically sufficient to exclude the highly abundant class of 30 nucleotide 2S rRNA. Here we demonstrate that pre-annealing terminator oligos complimentary to Drosophila 2S rRNA prior to 5' adapter ligation and reverse transcription efficiently depletes 2S rRNA sequences from the sequencing reaction in a simple and inexpensive way. This depletion is highly specific and is achieved with minimal perturbation of miRNA and piRNA profiles.
High-resolution phylogenetic microbial community profiling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin
Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structuresmore » at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.« less
McGuire, Jimmy A; Cotoras, Darko D; O'Connell, Brendan; Lawalata, Shobi Z S; Wang-Claypool, Cynthia Y; Stubbs, Alexander; Huang, Xiaoting; Wogan, Guinevere O U; Hykin, Sarah M; Reilly, Sean B; Bi, Ke; Riyanto, Awal; Arida, Evy; Smith, Lydia L; Milne, Heather; Streicher, Jeffrey W; Iskandar, Djoko T
2018-01-01
We used Massively Parallel High-Throughput Sequencing to obtain genetic data from a 145-year old holotype specimen of the flying lizard, Draco cristatellus . Obtaining genetic data from this holotype was necessary to resolve an otherwise intractable taxonomic problem involving the status of this species relative to closely related sympatric Draco species that cannot otherwise be distinguished from one another on the basis of museum specimens. Initial analyses suggested that the DNA present in the holotype sample was so degraded as to be unusable for sequencing. However, we used a specialized extraction procedure developed for highly degraded ancient DNA samples and MiSeq shotgun sequencing to obtain just enough low-coverage mitochondrial DNA (721 base pairs) to conclusively resolve the species status of the holotype as well as a second known specimen of this species. The holotype was prepared before the advent of formalin-fixation and therefore was most likely originally fixed with ethanol and never exposed to formalin. Whereas conventional wisdom suggests that formalin-fixed samples should be the most challenging for DNA sequencing, we propose that evaporation during long-term alcohol storage and consequent water-exposure may subject older ethanol-fixed museum specimens to hydrolytic damage. If so, this may pose an even greater challenge for sequencing efforts involving historical samples.
High-resolution phylogenetic microbial community profiling
Singer, Esther; Bushnell, Brian; Coleman-Derr, Devin; ...
2016-02-09
Over the past decade, high-throughput short-read 16S rRNA gene amplicon sequencing has eclipsed clone-dependent long-read Sanger sequencing for microbial community profiling. The transition to new technologies has provided more quantitative information at the expense of taxonomic resolution with implications for inferring metabolic traits in various ecosystems. We applied single-molecule real-time sequencing for microbial community profiling, generating full-length 16S rRNA gene sequences at high throughput, which we propose to name PhyloTags. We benchmarked and validated this approach using a defined microbial community. When further applied to samples from the water column of meromictic Sakinaw Lake, we show that while community structuresmore » at the phylum level are comparable between PhyloTags and Illumina V4 16S rRNA gene sequences (iTags), variance increases with community complexity at greater water depths. PhyloTags moreover allowed less ambiguous classification. Last, a platform-independent comparison of PhyloTags and in silico generated partial 16S rRNA gene sequences demonstrated significant differences in community structure and phylogenetic resolution across multiple taxonomic levels, including a severe underestimation in the abundance of specific microbial genera involved in nitrogen and methane cycling across the Lake's water column. Thus, PhyloTags provide a reliable adjunct or alternative to cost-effective iTags, enabling more accurate phylogenetic resolution of microbial communities and predictions on their metabolic potential.« less
Unmanned aerial vehicles for high-throughput phenotyping and agronomic research
USDA-ARS?s Scientific Manuscript database
Advances in automation and data science have led agriculturists to seek real-time, high-quality, high-volume crop data to accelerate crop improvement through breeding and to optimize agronomic practices. Breeders have recently gained massive data-collection capability in genome sequencing of plants....
Tan, Wui Siew; Lewis, Christina L; Horelik, Nicholas E; Pregibon, Daniel C; Doyle, Patrick S; Yi, Hyunmin
2008-11-04
We demonstrate hierarchical assembly of tobacco mosaic virus (TMV)-based nanotemplates with hydrogel-based encoded microparticles via nucleic acid hybridization. TMV nanotemplates possess a highly defined structure and a genetically engineered high density thiol functionality. The encoded microparticles are produced in a high throughput microfluidic device via stop-flow lithography (SFL) and consist of spatially discrete regions containing encoded identity information, an internal control, and capture DNAs. For the hybridization-based assembly, partially disassembled TMVs were programmed with linker DNAs that contain sequences complementary to both the virus 5' end and a selected capture DNA. Fluorescence microscopy, atomic force microscopy (AFM), and confocal microscopy results clearly indicate facile assembly of TMV nanotemplates onto microparticles with high spatial and sequence selectivity. We anticipate that our hybridization-based assembly strategy could be employed to create multifunctional viral-synthetic hybrid materials in a rapid and high-throughput manner. Additionally, we believe that these viral-synthetic hybrid microparticles may find broad applications in high capacity, multiplexed target sensing.
High-Throughput Sequencing Reveals Principles of Adeno-Associated Virus Serotype 2 Integration
Janovitz, Tyler; Klein, Isaac A.; Oliveira, Thiago; Mukherjee, Piali; Nussenzweig, Michel C.; Sadelain, Michel
2013-01-01
Viral integrations are important in human biology, yet genome-wide integration profiles have not been determined for many viruses. Adeno-associated virus (AAV) infects most of the human population and is a prevalent gene therapy vector. AAV integrates into the human genome with preference for a single locus, termed AAVS1. However, the genome-wide integration of AAV has not been defined, and the principles underlying this recombination remain unclear. Using a novel high-throughput approach, integrant capture sequencing, nearly 12 million AAV junctions were recovered from a human cell line, providing five orders of magnitude more data than were previously available. Forty-five percent of integrations occurred near AAVS1, and several thousand novel integration hotspots were identified computationally. Most of these occurred in genes, with dozens of hotspots targeting known oncogenes. Viral replication protein binding sites (RBS) and transcriptional activity were major factors favoring integration. In a first for eukaryotic viruses, the data reveal a unique asymmetric integration profile with distinctive directional orientation of viral genomes. These studies provide a new understanding of AAV integration biology through the use of unbiased high-throughput data acquisition and bioinformatics. PMID:23720718
Aryee, Martin J.; Jaffe, Andrew E.; Corrada-Bravo, Hector; Ladd-Acosta, Christine; Feinberg, Andrew P.; Hansen, Kasper D.; Irizarry, Rafael A.
2014-01-01
Motivation: The recently released Infinium HumanMethylation450 array (the ‘450k’ array) provides a high-throughput assay to quantify DNA methylation (DNAm) at ∼450 000 loci across a range of genomic features. Although less comprehensive than high-throughput sequencing-based techniques, this product is more cost-effective and promises to be the most widely used DNAm high-throughput measurement technology over the next several years. Results: Here we describe a suite of computational tools that incorporate state-of-the-art statistical techniques for the analysis of DNAm data. The software is structured to easily adapt to future versions of the technology. We include methods for preprocessing, quality assessment and detection of differentially methylated regions from the kilobase to the megabase scale. We show how our software provides a powerful and flexible development platform for future methods. We also illustrate how our methods empower the technology to make discoveries previously thought to be possible only with sequencing-based methods. Availability and implementation: http://bioconductor.org/packages/release/bioc/html/minfi.html. Contact: khansen@jhsph.edu; rafa@jimmy.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24478339
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brewster, Aaron S.; Sawaya, Michael R.; University of California, Los Angeles, CA 90095-1570
2015-02-01
Special methods are required to interpret sparse diffraction patterns collected from peptide crystals at X-ray free-electron lasers. Bragg spots can be indexed from composite-image powder rings, with crystal orientations then deduced from a very limited number of spot positions. Still diffraction patterns from peptide nanocrystals with small unit cells are challenging to index using conventional methods owing to the limited number of spots and the lack of crystal orientation information for individual images. New indexing algorithms have been developed as part of the Computational Crystallography Toolbox (cctbx) to overcome these challenges. Accurate unit-cell information derived from an aggregate data setmore » from thousands of diffraction patterns can be used to determine a crystal orientation matrix for individual images with as few as five reflections. These algorithms are potentially applicable not only to amyloid peptides but also to any set of diffraction patterns with sparse properties, such as low-resolution virus structures or high-throughput screening of still images captured by raster-scanning at synchrotron sources. As a proof of concept for this technique, successful integration of X-ray free-electron laser (XFEL) data to 2.5 Å resolution for the amyloid segment GNNQQNY from the Sup35 yeast prion is presented.« less
Cao, Buwen; Deng, Shuguang; Qin, Hua; Ding, Pingjian; Chen, Shaopeng; Li, Guanghui
2018-06-15
High-throughput technology has generated large-scale protein interaction data, which is crucial in our understanding of biological organisms. Many complex identification algorithms have been developed to determine protein complexes. However, these methods are only suitable for dense protein interaction networks, because their capabilities decrease rapidly when applied to sparse protein⁻protein interaction (PPI) networks. In this study, based on penalized matrix decomposition ( PMD ), a novel method of penalized matrix decomposition for the identification of protein complexes (i.e., PMD pc ) was developed to detect protein complexes in the human protein interaction network. This method mainly consists of three steps. First, the adjacent matrix of the protein interaction network is normalized. Second, the normalized matrix is decomposed into three factor matrices. The PMD pc method can detect protein complexes in sparse PPI networks by imposing appropriate constraints on factor matrices. Finally, the results of our method are compared with those of other methods in human PPI network. Experimental results show that our method can not only outperform classical algorithms, such as CFinder, ClusterONE, RRW, HC-PIN, and PCE-FR, but can also achieve an ideal overall performance in terms of a composite score consisting of F-measure, accuracy (ACC), and the maximum matching ratio (MMR).
Microbial forensics: fiber optic microarray subtyping of Bacillus anthracis
NASA Astrophysics Data System (ADS)
Shepard, Jason R. E.
2009-05-01
The past decade has seen increased development and subsequent adoption of rapid molecular techniques involving DNA analysis for detection of pathogenic microorganisms, also termed microbial forensics. The continued accumulation of microbial sequence information in genomic databases now better positions the field of high-throughput DNA analysis to proceed in a more manageable fashion. The potential to build off of these databases exists as technology continues to develop, which will enable more rapid, cost effective analyses. This wealth of genetic information, along with new technologies, has the potential to better address some of the current problems and solve the key issues involved in DNA analysis of pathogenic microorganisms. To this end, a high density fiber optic microarray has been employed, housing numerous DNA sequences simultaneously for detection of various pathogenic microorganisms, including Bacillus anthracis, among others. Each organism is analyzed with multiple sequences and can be sub-typed against other closely related organisms. For public health labs, real-time PCR methods have been developed as an initial preliminary screen, but culture and growth are still considered the gold standard. Technologies employing higher throughput than these standard methods are better suited to capitalize on the limitless potential garnered from the sequence information. Microarray analyses are one such format positioned to exploit this potential, and our array platform is reusable, allowing repetitive tests on a single array, providing an increase in throughput and decrease in cost, along with a certainty of detection, down to the individual strain level.
Aschard, Hugues; Cattoir, Vincent; Yoder-Himes, Deborah; Lory, Stephen; Pier, Gerald B.
2013-01-01
High-throughput sequencing of transposon (Tn) libraries created within entire genomes identifies and quantifies the contribution of individual genes and operons to the fitness of organisms in different environments. We used insertion-sequencing (INSeq) to analyze the contribution to fitness of all non-essential genes in the chromosome of Pseudomonas aeruginosa strain PA14 based on a library of ∼300,000 individual Tn insertions. In vitro growth in LB provided a baseline for comparison with the survival of the Tn insertion strains following 6 days of colonization of the murine gastrointestinal tract as well as a comparison with Tn-inserts subsequently able to systemically disseminate to the spleen following induction of neutropenia. Sequencing was performed following DNA extraction from the recovered bacteria, digestion with the MmeI restriction enzyme that hydrolyzes DNA 16 bp away from the end of the Tn insert, and fractionation into oligonucleotides of 1,200–1,500 bp that were prepared for high-throughput sequencing. Changes in frequency of Tn inserts into the P. aeruginosa genome were used to quantify in vivo fitness resulting from loss of a gene. 636 genes had <10 sequencing reads in LB, thus defined as unable to grow in this medium. During in vivo infection there were major losses of strains with Tn inserts in almost all known virulence factors, as well as respiration, energy utilization, ion pumps, nutritional genes and prophages. Many new candidates for virulence factors were also identified. There were consistent changes in the recovery of Tn inserts in genes within most operons and Tn insertions into some genes enhanced in vivo fitness. Strikingly, 90% of the non-essential genes were required for in vivo survival following systemic dissemination during neutropenia. These experiments resulted in the identification of the P. aeruginosa strain PA14 genes necessary for optimal survival in the mucosal and systemic environments of a mammalian host. PMID:24039572
Sánchez, Cecilia Castaño; Smith, Timothy P L; Wiedmann, Ralph T; Vallejo, Roger L; Salem, Mohamed; Yao, Jianbo; Rexroad, Caird E
2009-11-25
To enhance capabilities for genomic analyses in rainbow trout, such as genomic selection, a large suite of polymorphic markers that are amenable to high-throughput genotyping protocols must be identified. Expressed Sequence Tags (ESTs) have been used for single nucleotide polymorphism (SNP) discovery in salmonids. In those strategies, the salmonid semi-tetraploid genomes often led to assemblies of paralogous sequences and therefore resulted in a high rate of false positive SNP identification. Sequencing genomic DNA using primers identified from ESTs proved to be an effective but time consuming methodology of SNP identification in rainbow trout, therefore not suitable for high throughput SNP discovery. In this study, we employed a high-throughput strategy that used pyrosequencing technology to generate data from a reduced representation library constructed with genomic DNA pooled from 96 unrelated rainbow trout that represent the National Center for Cool and Cold Water Aquaculture (NCCCWA) broodstock population. The reduced representation library consisted of 440 bp fragments resulting from complete digestion with the restriction enzyme HaeIII; sequencing produced 2,000,000 reads providing an average 6 fold coverage of the estimated 150,000 unique genomic restriction fragments (300,000 fragment ends). Three independent data analyses identified 22,022 to 47,128 putative SNPs on 13,140 to 24,627 independent contigs. A set of 384 putative SNPs, randomly selected from the sets produced by the three analyses were genotyped on individual fish to determine the validation rate of putative SNPs among analyses, distinguish apparent SNPs that actually represent paralogous loci in the tetraploid genome, examine Mendelian segregation, and place the validated SNPs on the rainbow trout linkage map. Approximately 48% (183) of the putative SNPs were validated; 167 markers were successfully incorporated into the rainbow trout linkage map. In addition, 2% of the sequences from the validated markers were associated with rainbow trout transcripts. The use of reduced representation libraries and pyrosequencing technology proved to be an effective strategy for the discovery of a high number of putative SNPs in rainbow trout; however, modifications to the technique to decrease the false discovery rate resulting from the evolutionary recent genome duplication would be desirable.
Multiplex amplification of large sets of human exons.
Porreca, Gregory J; Zhang, Kun; Li, Jin Billy; Xie, Bin; Austin, Derek; Vassallo, Sara L; LeProust, Emily M; Peck, Bill J; Emig, Christopher J; Dahl, Fredrik; Gao, Yuan; Church, George M; Shendure, Jay
2007-11-01
A new generation of technologies is poised to reduce DNA sequencing costs by several orders of magnitude. But our ability to fully leverage the power of these technologies is crippled by the absence of suitable 'front-end' methods for isolating complex subsets of a mammalian genome at a scale that matches the throughput at which these platforms will routinely operate. We show that targeting oligonucleotides released from programmable microarrays can be used to capture and amplify approximately 10,000 human exons in a single multiplex reaction. Additionally, we show integration of this protocol with ultra-high-throughput sequencing for targeted variation discovery. Although the multiplex capture reaction is highly specific, we found that nonuniform capture is a key issue that will need to be resolved by additional optimization. We anticipate that highly multiplexed methods for targeted amplification will enable the comprehensive resequencing of human exons at a fraction of the cost of whole-genome resequencing.
Xie, Zeyi; Zhou, Zhilin; Li, Hongmin; Yu, Jingjing; Jiang, Jiaojiao; Tang, Zhonghou; Ma, Daifu; Zhang, Baohong; Han, Yonghua; Li, Zongyun
2018-05-21
Sweetpotato (Ipomoea batatas L.) is a globally important economic food crop. It belongs to Convolvulaceae family and origins in the tropics; however, sweetpotato is sensitive to cold stress during storage. In this study, we performed transcriptome sequencing to investigate the sweetpotato response to chilling stress during storage. A total of 110,110 unigenes were generated via high-throughput sequencing. Differentially expressed genes (DEGs) analysis showed that 18,681 genes were up-regulated and 21,983 genes were down-regulated in low temperature condition. Many DEGs were related to the cell membrane system, antioxidant enzymes, carbohydrate metabolism, and hormone metabolism, which are potentially associated with sweetpotato resistance to low temperature. The existence of DEGs suggests a molecular basis for the biochemical and physiological consequences of sweetpotato in low temperature storage conditions. Our analysis will provide a new target for enhancement of sweetpotato cold stress tolerance in postharvest storage through genetic manipulation. Copyright © 2018. Published by Elsevier Inc.
Rosenfeld, Aaron M; Meng, Wenzhao; Luning Prak, Eline T; Hershberg, Uri
2017-01-15
As high-throughput sequencing of B cells becomes more common, the need for tools to analyze the large quantity of data also increases. This article introduces ImmuneDB, a system for analyzing vast amounts of heavy chain variable region sequences and exploring the resulting data. It can take as input raw FASTA/FASTQ data, identify genes, determine clones, construct lineages, as well as provide information such as selection pressure and mutation analysis. It uses an industry leading database, MySQL, to provide fast analysis and avoid the complexities of using error prone flat-files. ImmuneDB is freely available at http://immunedb.comA demo of the ImmuneDB web interface is available at: http://immunedb.com/demo CONTACT: Uh25@drexel.eduSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Surveying the repair of ancient DNA from bones via high-throughput sequencing.
Mouttham, Nathalie; Klunk, Jennifer; Kuch, Melanie; Fourney, Ron; Poinar, Hendrik
2015-07-01
DNA damage in the form of abasic sites, chemically altered nucleotides, and strand fragmentation is the foremost limitation in obtaining genetic information from many ancient samples. Upon cell death, DNA continues to endure various chemical attacks such as hydrolysis and oxidation, but repair pathways found in vivo no longer operate. By incubating degraded DNA with specific enzyme combinations adopted from these pathways, it is possible to reverse some of the post-mortem nucleic acid damage prior to downstream analyses such as library preparation, targeted enrichment, and high-throughput sequencing. Here, we evaluate the performance of two available repair protocols on previously characterized DNA extracts from four mammoths. Both methods use endonucleases and glycosylases along with a DNA polymerase-ligase combination. PreCR Repair Mix increases the number of molecules converted to sequencing libraries, leading to an increase in endogenous content and a decrease in cytosine-to-thymine transitions due to cytosine deamination. However, the effects of Nelson Repair Mix on repair of DNA damage remain inconclusive.
Genome Sequence of the Novel Marine Member of the Gammaproteobacteria Strain HTCC5015▿
Thrash, J. Cameron; Stingl, Ulrich; Cho, Jang-Cheon; Ferriera, Steve; Johnson, Justin; Vergin, Kevin L.; Giovannoni, Stephen J.
2010-01-01
HTCC5015 is a novel, highly divergent marine member of the Gammaproteobacteria, currently without a cultured representative with greater than 89% 16S rRNA gene identity to itself. The organism was isolated from water collected from Hydrostation S south of Bermuda using high-throughput dilution-to-extinction culturing techniques. Here we present the genome sequence of the unique Gammaproteobacterium strain HTCC5015. PMID:20472792
Stockinger, Herbert; Peyret-Guzzon, Marine; Koegel, Sally; Bouffaud, Marie-Lara; Redecker, Dirk
2014-01-01
Due to the potential of arbuscular mycorrhizal fungi (AMF, Glomeromycota) to improve plant growth and soil quality, the influence of agricultural practice on their diversity continues to be an important research question. Up to now studies of community diversity in AMF have exclusively been based on nuclear ribosomal gene regions, which in AMF show high intra-organism polymorphism, seriously complicating interpretation of these data. We designed specific PCR primers for 454 sequencing of a region of the largest subunit of RNA polymerase II gene, and established a new reference dataset comprising all major AMF lineages. This gene is known to be monomorphic within fungal isolates but shows an excellent barcode gap between species. We designed a primer set to amplify all known lineages of AMF and demonstrated its applicability in combination with high-throughput sequencing in a long-term tillage experiment. The PCR primers showed a specificity of 99.94% for glomeromycotan sequences. We found evidence of significant shifts of the AMF communities caused by soil management and showed that tillage effects on different AMF taxa are clearly more complex than previously thought. The high resolving power of high-throughput sequencing highlights the need for quantitative measurements to efficiently detect these effects.
Xia, Fei; Chen, Xin; Guo, Meng-Yuan; Bai, Xiao-Hui; Liu, Yan; Shen, Guang-Rong; Li, Yu-Ling; Lin, Juan; Zhou, Xuan-Wei
2016-01-01
Chinese Cordyceps, known in Chinese as “DongChong XiaCao”, is a parasitic complex of a fungus (Ophiocordyceps sinensis) and a caterpillar. The current study explored the endogenetic fungal communities inhabiting Chinese Cordyceps. Samples were collected from five different geographical regions of Qinghai and Tibet, and the nuclear ribosomal internal transcribed spacer-1 sequences from each sample were obtained using Illumina high-throughput sequencing. The results showed that Ascomycota was the dominant fungal phylum in Chinese Cordyceps and its soil microhabitat from different sampling regions. Among the Ascomycota, 65 genera were identified, and the abundant operational taxonomic units showed the strongest sequence similarity to Ophiocordyceps, Verticillium, Pseudallescheria, Candida and Ilyonectria Not surprisingly, the genus Ophiocordyceps was the largest among the fungal communities identified in the fruiting bodies and external mycelial cortices of Chinese Cordyceps. In addition, fungal communities in the soil microhabitats were clustered separately from the external mycelial cortices and fruiting bodies of Chinese Cordyceps from different sampling regions. There was no significant structural difference in the fungal communities between the fruiting bodies and external mycelial cortices of Chinese Cordyceps. This study revealed an unexpectedly high diversity of fungal communities inhabiting the Chinese Cordyceps and its microhabitats. PMID:27625176
Pritchard, Leighton; Holden, Nicola J; Bielaszewska, Martina; Karch, Helge; Toth, Ian K
2012-01-01
An Escherichia coli O104:H4 outbreak in Germany in summer 2011 caused 53 deaths, over 4000 individual infections across Europe, and considerable economic, social and political impact. This outbreak was the first in a position to exploit rapid, benchtop high-throughput sequencing (HTS) technologies and crowdsourced data analysis early in its investigation, establishing a new paradigm for rapid response to disease threats. We describe a novel strategy for design of diagnostic PCR primers that exploited this rapid draft bacterial genome sequencing to distinguish between E. coli O104:H4 outbreak isolates and other pathogenic E. coli isolates, including the historical hæmolytic uræmic syndrome (HUSEC) E. coli HUSEC041 O104:H4 strain, which possesses the same serotype as the outbreak isolates. Primers were designed using a novel alignment-free strategy against eleven draft whole genome assemblies of E. coli O104:H4 German outbreak isolates from the E. coli O104:H4 Genome Analysis Crowd-Sourcing Consortium website, and a negative sequence set containing 69 E. coli chromosome and plasmid sequences from public databases. Validation in vitro against 21 'positive' E. coli O104:H4 outbreak and 32 'negative' non-outbreak EHEC isolates indicated that individual primer sets exhibited 100% sensitivity for outbreak isolates, with false positive rates of between 9% and 22%. A minimal combination of two primers discriminated between outbreak and non-outbreak E. coli isolates with 100% sensitivity and 100% specificity. Draft genomes of isolates of disease outbreak bacteria enable high throughput primer design and enhanced diagnostic performance in comparison to traditional molecular assays. Future outbreak investigations will be able to harness HTS rapidly to generate draft genome sequences and diagnostic primer sets, greatly facilitating epidemiology and clinical diagnostics. We expect that high throughput primer design strategies will enable faster, more precise responses to future disease outbreaks of bacterial origin, and help to mitigate their societal impact.
Wang, Weizhi; Li, Menglin; Wei, Zewen; Wang, Zihua; Bu, Xiangli; Lai, Wenjia; Yang, Shu; Gong, He; Zheng, Hui; Wang, Yuqiao; Liu, Ying; Li, Qin; Fang, Qiaojun; Hu, Zhiyuan
2014-04-15
Peptide probes and drugs have widespread applications in disease diagnostics and therapy. The demand for peptides ligands with high affinity and high specificity toward various targets has surged in the biomedical field in recent years. The traditional peptide screening procedure involves selection, sequencing, and characterization steps, and each step is manual and tedious. Herein, we developed a bimodal imprint microarray system to embrace the whole peptide screening process. Silver-sputtered silicon chip fabricated with microwell array can trap and pattern the candidate peptide beads in a one-well-one-bead manner. Peptides on beads were photocleaved in situ. A portion of the peptide in each well was transferred to a gold-coated chip to print the peptide array for high-throughput affinity analyses by surface plasmon resonance imaging (SPRi), and the peptide left in the silver-sputtered chip was ready for in situ single bead sequencing by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS). Using the bimodal imprint chip system, affinity peptides toward AHA were efficiently screened out from the 7 × 10(4) peptide library. The method provides a solution for high efficiency peptide screening.
Investigating the viral ecology of global bee communities with high-throughput metagenomics.
Galbraith, David A; Fuller, Zachary L; Ray, Allyson M; Brockmann, Axel; Frazier, Maryann; Gikungu, Mary W; Martinez, J Francisco Iturralde; Kapheim, Karen M; Kerby, Jeffrey T; Kocher, Sarah D; Losyev, Oleksiy; Muli, Elliud; Patch, Harland M; Rosa, Cristina; Sakamoto, Joyce M; Stanley, Scott; Vaudo, Anthony D; Grozinger, Christina M
2018-06-11
Bee viral ecology is a fascinating emerging area of research: viruses exert a range of effects on their hosts, exacerbate impacts of other environmental stressors, and, importantly, are readily shared across multiple bee species in a community. However, our understanding of bee viral communities is limited, as it is primarily derived from studies of North American and European Apis mellifera populations. Here, we examined viruses in populations of A. mellifera and 11 other bee species from 9 countries, across 4 continents and Oceania. We developed a novel pipeline to rapidly and inexpensively screen for bee viruses. This pipeline includes purification of encapsulated RNA/DNA viruses, sequence-independent amplification, high throughput sequencing, integrated assembly of contigs, and filtering to identify contigs specifically corresponding to viral sequences. We identified sequences for (+)ssRNA, (-)ssRNA, dsRNA, and ssDNA viruses. Overall, we found 127 contigs corresponding to novel viruses (i.e. previously not observed in bees), with 27 represented by >0.1% of the reads in a given sample, and 7 contained an RdRp or replicase sequence which could be used for robust phylogenetic analysis. This study provides a sequence-independent pipeline for viral metagenomics analysis, and greatly expands our understanding of the diversity of viruses found in bee communities.
Lee, Seungeun; Yamamoto, Naomichi
2015-12-01
This study characterized the accuracy of high-throughput amplicon sequencing to identify species within the genus Aspergillus. To this end, we sequenced the internal transcribed spacer 1 (ITS1), β-tubulin (BenA), and calmodulin (CaM) gene encoding sequences as DNA markers from eight reference Aspergillus strains with known identities using 300-bp sequencing on the Illumina MiSeq platform, and compared them with the BLASTn outputs. The identifications with the sequences longer than 250 bp were accurate at the section rank, with some ambiguities observed at the species rank due to mostly cross detection of sibling species. Additionally, in silico analysis was performed to predict the identification accuracy for all species in the genus Aspergillus, where 107, 210, and 187 species were predicted to be identifiable down to the species rank based on ITS1, BenA, and CaM, respectively. Finally, air filter samples were analysed to quantify the relative abundances of Aspergillus species in outdoor air. The results were reproducible across biological duplicates both at the species and section ranks, but not strongly correlated between ITS1 and BenA, suggesting the Aspergillus detection can be taxonomically biased depending on the selection of the DNA markers and/or primers. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
BlackOPs: increasing confidence in variant detection through mappability filtering.
Cabanski, Christopher R; Wilkerson, Matthew D; Soloway, Matthew; Parker, Joel S; Liu, Jinze; Prins, Jan F; Marron, J S; Perou, Charles M; Hayes, D Neil
2013-10-01
Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin ('mismapping') and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
Takahashi, Mayumi; Wu, Xiwei; Ho, Michelle; Chomchan, Pritsana; Rossi, John J; Burnett, John C; Zhou, Jiehua
2016-09-22
The systemic evolution of ligands by exponential enrichment (SELEX) technique is a powerful and effective aptamer-selection procedure. However, modifications to the process can dramatically improve selection efficiency and aptamer performance. For example, droplet digital PCR (ddPCR) has been recently incorporated into SELEX selection protocols to putatively reduce the propagation of byproducts and avoid selection bias that result from differences in PCR efficiency of sequences within the random library. However, a detailed, parallel comparison of the efficacy of conventional solution PCR versus the ddPCR modification in the RNA aptamer-selection process is needed to understand effects on overall SELEX performance. In the present study, we took advantage of powerful high throughput sequencing technology and bioinformatics analysis coupled with SELEX (HT-SELEX) to thoroughly investigate the effects of initial library and PCR methods in the RNA aptamer identification. Our analysis revealed that distinct "biased sequences" and nucleotide composition existed in the initial, unselected libraries purchased from two different manufacturers and that the fate of the "biased sequences" was target-dependent during selection. Our comparison of solution PCR- and ddPCR-driven HT-SELEX demonstrated that PCR method affected not only the nucleotide composition of the enriched sequences, but also the overall SELEX efficiency and aptamer efficacy.
Alterations of microbiota in urine from women with interstitial cystitis
2012-01-01
Background Interstitial Cystitis (IC) is a chronic inflammatory condition of the bladder with unknown etiology. The aim of this study was to characterize the microbial community present in the urine from IC female patients by 454 high throughput sequencing of the 16S variable regions V1V2 and V6. The taxonomical composition, richness and diversity of the IC microbiota were determined and compared to the microbial profile of asymptomatic healthy female (HF) urine. Results The composition and distribution of bacterial sequences differed between the urine microbiota of IC patients and HFs. Reduced sequence richness and diversity were found in IC patient urine, and a significant difference in the community structure of IC urine in relation to HF urine was observed. More than 90% of the IC sequence reads were identified as belonging to the bacterial genus Lactobacillus, a marked increase compared to 60% in HF urine. Conclusion The 16S rDNA sequence data demonstrates a shift in the composition of the bacterial community in IC urine. The reduced microbial diversity and richness is accompanied by a higher abundance of the bacterial genus Lactobacillus, compared to HF urine. This study demonstrates that high throughput sequencing analysis of urine microbiota in IC patients is a powerful tool towards a better understanding of this enigmatic disease. PMID:22974186
Identification and correction of systematic error in high-throughput sequence data
2011-01-01
Background A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments. PMID:22099972
High-throughput full-length single-cell mRNA-seq of rare cells.
Ooi, Chin Chun; Mantalas, Gary L; Koh, Winston; Neff, Norma F; Fuchigami, Teruaki; Wong, Dawson J; Wilson, Robert J; Park, Seung-Min; Gambhir, Sanjiv S; Quake, Stephen R; Wang, Shan X
2017-01-01
Single-cell characterization techniques, such as mRNA-seq, have been applied to a diverse range of applications in cancer biology, yielding great insight into mechanisms leading to therapy resistance and tumor clonality. While single-cell techniques can yield a wealth of information, a common bottleneck is the lack of throughput, with many current processing methods being limited to the analysis of small volumes of single cell suspensions with cell densities on the order of 107 per mL. In this work, we present a high-throughput full-length mRNA-seq protocol incorporating a magnetic sifter and magnetic nanoparticle-antibody conjugates for rare cell enrichment, and Smart-seq2 chemistry for sequencing. We evaluate the efficiency and quality of this protocol with a simulated circulating tumor cell system, whereby non-small-cell lung cancer cell lines (NCI-H1650 and NCI-H1975) are spiked into whole blood, before being enriched for single-cell mRNA-seq by EpCAM-functionalized magnetic nanoparticles and the magnetic sifter. We obtain high efficiency (> 90%) capture and release of these simulated rare cells via the magnetic sifter, with reproducible transcriptome data. In addition, while mRNA-seq data is typically only used for gene expression analysis of transcriptomic data, we demonstrate the use of full-length mRNA-seq chemistries like Smart-seq2 to facilitate variant analysis of expressed genes. This enables the use of mRNA-seq data for differentiating cells in a heterogeneous population by both their phenotypic and variant profile. In a simulated heterogeneous mixture of circulating tumor cells in whole blood, we utilize this high-throughput protocol to differentiate these heterogeneous cells by both their phenotype (lung cancer versus white blood cells), and mutational profile (H1650 versus H1975 cells), in a single sequencing run. This high-throughput method can help facilitate single-cell analysis of rare cell populations, such as circulating tumor or endothelial cells, with demonstrably high-quality transcriptomic data.
Wei, Yu-Jie; Wu, Yun; Yan, Yin-Zhuo; Zou, Wan; Xue, Jie; Ma, Wen-Rui; Wang, Wei; Tian, Ge; Wang, Li-Ye
2018-01-01
In this study Illumina MiSeq was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealed five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grapes leaves, Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grapes leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to understand microbiome population in soil, grape, grapes leaves, grape juice and wine in Xinjiang through High-throughput Sequencing and identify microorganisms like Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine.
Yan, Yin-zhuo; Zou, Wan; Ma, Wen-rui; Wang, Wei; Tian, Ge; Wang, Li-ye
2018-01-01
In this study Illumina MiSeq was performed to investigate microbial diversity in soil, leaves, grape, grape juice and wine. A total of 1,043,102 fungal Internal Transcribed Spacer (ITS) reads and 2,422,188 high quality bacterial 16S rDNA sequences were used for taxonomic classification, revealed five fungal and eight bacterial phyla. At the genus level, the dominant fungi were Ascomycota, Sordariales, Tetracladium and Geomyces in soil, Aureobasidium and Pleosporaceae in grapes leaves, Aureobasidium in grape and grape juice. The dominant bacteria were Kaistobacter, Arthrobacter, Skermanella and Sphingomonas in soil, Pseudomonas, Acinetobacter and Kaistobacter in grape and grapes leaves, and Oenococcus in grape juice and wine. Principal coordinate analysis showed structural separation between the composition of fungi and bacteria in all samples. This is the first study to understand microbiome population in soil, grape, grapes leaves, grape juice and wine in Xinjiang through High-throughput Sequencing and identify microorganisms like Saccharomyces cerevisiae and Oenococcus spp. that may contribute to the quality and flavor of wine. PMID:29565999
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery
Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo
2012-01-01
Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. in all the cases our tool was invariably able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data. PMID:22570408
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery.
Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo
2012-09-01
Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. in all the cases our tool was invariably able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2-ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.
Library construction for next-generation sequencing: Overviews and challenges
Head, Steven R.; Komori, H. Kiyomi; LaMere, Sarah A.; Whisenant, Thomas; Van Nieuwerburgh, Filip; Salomon, Daniel R.; Ordoukhanian, Phillip
2014-01-01
High-throughput sequencing, also known as next-generation sequencing (NGS), has revolutionized genomic research. In recent years, NGS technology has steadily improved, with costs dropping and the number and range of sequencing applications increasing exponentially. Here, we examine the critical role of sequencing library quality and consider important challenges when preparing NGS libraries from DNA and RNA sources. Factors such as the quantity and physical characteristics of the RNA or DNA source material as well as the desired application (i.e., genome sequencing, targeted sequencing, RNA-seq, ChIP-seq, RIP-seq, and methylation) are addressed in the context of preparing high quality sequencing libraries. In addition, the current methods for preparing NGS libraries from single cells are also discussed. PMID:24502796
Adaptive low-rank subspace learning with online optimization for robust visual tracking.
Liu, Risheng; Wang, Di; Han, Yuzhuo; Fan, Xin; Luo, Zhongxuan
2017-04-01
In recent years, sparse and low-rank models have been widely used to formulate appearance subspace for visual tracking. However, most existing methods only consider the sparsity or low-rankness of the coefficients, which is not sufficient enough for appearance subspace learning on complex video sequences. Moreover, as both the low-rank and the column sparse measures are tightly related to all the samples in the sequences, it is challenging to incrementally solve optimization problems with both nuclear norm and column sparse norm on sequentially obtained video data. To address above limitations, this paper develops a novel low-rank subspace learning with adaptive penalization (LSAP) framework for subspace based robust visual tracking. Different from previous work, which often simply decomposes observations as low-rank features and sparse errors, LSAP simultaneously learns the subspace basis, low-rank coefficients and column sparse errors to formulate appearance subspace. Within LSAP framework, we introduce a Hadamard production based regularization to incorporate rich generative/discriminative structure constraints to adaptively penalize the coefficients for subspace learning. It is shown that such adaptive penalization can significantly improve the robustness of LSAP on severely corrupted dataset. To utilize LSAP for online visual tracking, we also develop an efficient incremental optimization scheme for nuclear norm and column sparse norm minimizations. Experiments on 50 challenging video sequences demonstrate that our tracker outperforms other state-of-the-art methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
USDA-ARS?s Scientific Manuscript database
Many species of mites and ticks are of agricultural and medical importance. Much can be learned from the study of transcriptomes of acarines which can generate DNA-sequence information of potential target genes for the control of acarine pests. High throughput transcriptome sequencing can also yie...
Sha, Shankar Prasad; Jani, Kunal; Sharma, Avinash; Anupma, Anu; Pradhan, Pooja; Shouche, Yogesh; Tamang, Jyoti Prakash
2017-09-08
Marcha and thiat are traditionally prepared amylolytic starters use for production of various ethnic alcoholic beverages in Sikkim and Meghalaya states in India. In the present study we have tried to investigate the bacterial and fungal community composition of marcha and thiat by using high throughput sequencing. Characterization of bacterial community depicts phylum Proteobacteria is the most dominant in both marcha (91.4%) and thiat (53.8%), followed by Firmicutes, and Actinobacteria. Estimates of fungal community composition showed Ascomycota as the dominant phylum. Presence of Zygomycota in marcha distinguishes it from the thiat. The results of NGS analysis revealed dominance of yeasts in marcha whereas molds out numbers in case of thiat. This is the first report on microbial communities of traditionally prepared amylolytic starters of India using high throughput sequencing.
The main challenges that remain in applying high-throughput sequencing to clinical diagnostics.
Loeffelholz, Michael; Fofanov, Yuriy
2015-01-01
Over the last 10 years, the quality, price and availability of high-throughput sequencing instruments have improved to the point that this technology may be close to becoming a routine tool in the diagnostic microbiology laboratory. Two groups of challenges, however, have to be resolved in order to move this powerful research technology into routine use in the clinical microbiology laboratory. The computational/bioinformatics challenges include data storage cost and privacy concerns, requiring analysis to be performed without access to cloud storage or expensive computational infrastructure. The logistical challenges include interpretation of complex results and acceptance and understanding of the advantages and limitations of this technology by the medical community. This article focuses on the approaches to address these challenges, such as file formats, algorithms, data collection, reporting and good laboratory practices.
Mapping specificity landscapes of RNA-protein interactions by high throughput sequencing.
Jankowsky, Eckhard; Harris, Michael E
2017-04-15
To function in a biological setting, RNA binding proteins (RBPs) have to discriminate between alternative binding sites in RNAs. This discrimination can occur in the ground state of an RNA-protein binding reaction, in its transition state, or in both. The extent by which RBPs discriminate at these reaction states defines RBP specificity landscapes. Here, we describe the HiTS-Kin and HiTS-EQ techniques, which combine kinetic and equilibrium binding experiments with high throughput sequencing to quantitatively assess substrate discrimination for large numbers of substrate variants at ground and transition states of RNA-protein binding reactions. We discuss experimental design, practical considerations and data analysis and outline how a combination of HiTS-Kin and HiTS-EQ allows the mapping of RBP specificity landscapes. Copyright © 2017 Elsevier Inc. All rights reserved.
The Grapevine and Wine Microbiome: Insights from High-Throughput Amplicon Sequencing
Morgan, Horatio H.; du Toit, Maret; Setati, Mathabatha E.
2017-01-01
From the time when microbial activity in wine fermentation was first demonstrated, the microbial ecology of the vineyard, grape, and wine has been extensively investigated using culture-based methods. However, the last 2 decades have been characterized by an important change in the approaches used for microbial examination, due to the introduction of DNA-based community fingerprinting methods such as DGGE, SSCP, T-RFLP, and ARISA. These approaches allowed for the exploration of microbial community structures without the need to cultivate, and have been extensively applied to decipher the microbial populations associated with the grapevine as well as the microbial dynamics throughout grape berry ripening and wine fermentation. These techniques are well-established for the rapid more sensitive profiling of microbial communities; however, they often do not provide direct taxonomic information and possess limited ability to detect the presence of rare taxa and taxa with low abundance. Consequently, the past 5 years have seen an upsurge in the application of high-throughput sequencing methods for the in-depth assessment of the grapevine and wine microbiome. Although a relatively new approach in wine sciences, these methods reveal a considerably greater diversity than previously reported, and identified several species that had not yet been reported. The aim of the current review is to highlight the contribution of high-throughput next generation sequencing and metagenomics approaches to vineyard microbial ecology especially unraveling the influence of vineyard management practices on microbial diversity. PMID:28553266
Goodswen, Stephen J.; Kennedy, Paul J.; Ellis, John T.
2012-01-01
Next generation sequencing technology is advancing genome sequencing at an unprecedented level. By unravelling the code within a pathogen’s genome, every possible protein (prior to post-translational modifications) can theoretically be discovered, irrespective of life cycle stages and environmental stimuli. Now more than ever there is a great need for high-throughput ab initio gene finding. Ab initio gene finders use statistical models to predict genes and their exon-intron structures from the genome sequence alone. This paper evaluates whether existing ab initio gene finders can effectively predict genes to deduce proteins that have presently missed capture by laboratory techniques. An aim here is to identify possible patterns of prediction inaccuracies for gene finders as a whole irrespective of the target pathogen. All currently available ab initio gene finders are considered in the evaluation but only four fulfil high-throughput capability: AUGUSTUS, GeneMark_hmm, GlimmerHMM, and SNAP. These gene finders require training data specific to a target pathogen and consequently the evaluation results are inextricably linked to the availability and quality of the data. The pathogen, Toxoplasma gondii, is used to illustrate the evaluation methods. The results support current opinion that predicted exons by ab initio gene finders are inaccurate in the absence of experimental evidence. However, the results reveal some patterns of inaccuracy that are common to all gene finders and these inaccuracies may provide a focus area for future gene finder developers. PMID:23226328
Evaluation of sequencing approaches for high-throughput ...
Whole-genome in vitro transcriptomics has shown the capability to identify mechanisms of action and estimates of potency for chemical-mediated effects in a toxicological framework, but with limited throughput and high cost. We present the evaluation of three toxicogenomics platforms for potential application to high-throughput screening: 1. TempO-Seq utilizing custom designed paired probes per gene; 2. Targeted sequencing (TSQ) utilizing Illumina’s TruSeq RNA Access Library Prep Kit containing tiled exon-specific probe sets; 3. Low coverage whole transcriptome sequencing (LSQ) using Illumina’s TruSeq Stranded mRNA Kit. Each platform was required to cover the ~20,000 genes of the full transcriptome, operate directly with cell lysates, and be automatable with 384-well plates. Technical reproducibility was assessed using MAQC control RNA samples A and B, while functional utility for chemical screening was evaluated using six treatments at a single concentration after 6 hr in MCF7 breast cancer cells: 10 µM chlorpromazine, 10 µM ciclopriox, 10 µM genistein, 100 nM sirolimus, 1 µM tanespimycin, and 1 µM trichostatin A. All RNA samples and chemical treatments were run with 5 technical replicates. The three platforms achieved different read depths, with the TempO-Seq having ~34M mapped reads per sample, while TSQ and LSQ averaged 20M and 11M aligned reads per sample, respectively. Inter-replicate correlation averaged ≥0.95 for raw log2 expression values i
The Grapevine and Wine Microbiome: Insights from High-Throughput Amplicon Sequencing.
Morgan, Horatio H; du Toit, Maret; Setati, Mathabatha E
2017-01-01
From the time when microbial activity in wine fermentation was first demonstrated, the microbial ecology of the vineyard, grape, and wine has been extensively investigated using culture-based methods. However, the last 2 decades have been characterized by an important change in the approaches used for microbial examination, due to the introduction of DNA-based community fingerprinting methods such as DGGE, SSCP, T-RFLP, and ARISA. These approaches allowed for the exploration of microbial community structures without the need to cultivate, and have been extensively applied to decipher the microbial populations associated with the grapevine as well as the microbial dynamics throughout grape berry ripening and wine fermentation. These techniques are well-established for the rapid more sensitive profiling of microbial communities; however, they often do not provide direct taxonomic information and possess limited ability to detect the presence of rare taxa and taxa with low abundance. Consequently, the past 5 years have seen an upsurge in the application of high-throughput sequencing methods for the in-depth assessment of the grapevine and wine microbiome. Although a relatively new approach in wine sciences, these methods reveal a considerably greater diversity than previously reported, and identified several species that had not yet been reported. The aim of the current review is to highlight the contribution of high-throughput next generation sequencing and metagenomics approaches to vineyard microbial ecology especially unraveling the influence of vineyard management practices on microbial diversity.
Advancements in zebrafish applications for 21st century toxicology.
Garcia, Gloria R; Noyes, Pamela D; Tanguay, Robert L
2016-05-01
The zebrafish model is the only available high-throughput vertebrate assessment system, and it is uniquely suited for studies of in vivo cell biology. A sequenced and annotated genome has revealed a large degree of evolutionary conservation in comparison to the human genome. Due to our shared evolutionary history, the anatomical and physiological features of fish are highly homologous to humans, which facilitates studies relevant to human health. In addition, zebrafish provide a very unique vertebrate data stream that allows researchers to anchor hypotheses at the biochemical, genetic, and cellular levels to observations at the structural, functional, and behavioral level in a high-throughput format. In this review, we will draw heavily from toxicological studies to highlight advances in zebrafish high-throughput systems. Breakthroughs in transgenic/reporter lines and methods for genetic manipulation, such as the CRISPR-Cas9 system, will be comprised of reports across diverse disciplines. Copyright © 2016 Elsevier Inc. All rights reserved.
Advancements in zebrafish applications for 21st century toxicology
Garcia, Gloria R.; Noyes, Pamela D.; Tanguay, Robert L.
2016-01-01
The zebrafish model is the only available high-throughput vertebrate assessment system, and it is uniquely suited for studies of in vivo cell biology. A sequenced and annotated genome has revealed a large degree of evolutionary conservation in comparison to the human genome. Due to our shared evolutionary history, the anatomical and physiological features of fish are highly homologous to humans, which facilitates studies relevant to human health. In addition, zebrafish provide a very unique vertebrate data stream that allows researchers to anchor hypotheses at the biochemical, genetic, and cellular levels to observations at the structural, functional, and behavioral level in a high-throughput format. In this review, we will draw heavily from toxicological studies to highlight advances in zebrafish high-throughput systems. Breakthroughs in transgenic/reporter lines and methods for genetic manipulation, such as the CRISPR-Cas9 system, will be comprised of reports across diverse disciplines. PMID:27016469
2011-01-01
Background Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom built virtual machines to distribute pre-packaged with pre-configured software. Results We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion The CloVR VM and associated architecture lowers the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high throughput data processing. PMID:21878105
A field ornithologist’s guide to genomics: Practical considerations for ecology and conservation
Oyler-McCance, Sara J.; Oh, Kevin; Langin, Kathryn; Aldridge, Cameron L.
2016-01-01
Vast improvements in sequencing technology have made it practical to simultaneously sequence millions of nucleotides distributed across the genome, opening the door for genomic studies in virtually any species. Ornithological research stands to benefit in three substantial ways. First, genomic methods enhance our ability to parse and simultaneously analyze both neutral and non-neutral genomic regions, thus providing insight into adaptive evolution and divergence. Second, the sheer quantity of sequence data generated by current sequencing platforms allows increased precision and resolution in analyses. Third, high-throughput sequencing can benefit applications that focus on a small number of loci that are otherwise prohibitively expensive, time-consuming, and technically difficult using traditional sequencing methods. These advances have improved our ability to understand evolutionary processes like speciation and local adaptation, but they also offer many practical applications in the fields of population ecology, migration tracking, conservation planning, diet analyses, and disease ecology. This review provides a guide for field ornithologists interested in incorporating genomic approaches into their research program, with an emphasis on techniques related to ecology and conservation. We present a general overview of contemporary genomic approaches and methods, as well as important considerations when selecting a genomic technique. We also discuss research questions that are likely to benefit from utilizing high-throughput sequencing instruments, highlighting select examples from recent avian studies.
Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.
2013-01-01
SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
Hu, Xihao; Wu, Yang; Lu, Zhi John; Yip, Kevin Y
2016-11-01
High-throughput sequencing has been used to study posttranscriptional regulations, where the identification of protein-RNA binding is a major and fast-developing sub-area, which is in turn benefited by the sequencing methods for whole-transcriptome probing of RNA secondary structures. In the study of RNA secondary structures using high-throughput sequencing, bases are modified or cleaved according to their structural features, which alter the resulting composition of sequencing reads. In the study of protein-RNA binding, methods have been proposed to immuno-precipitate (IP) protein-bound RNA transcripts in vitro or in vivo By sequencing these transcripts, the protein-RNA interactions and the binding locations can be identified. For both types of data, read counts are affected by a combination of confounding factors, including expression levels of transcripts, sequence biases, mapping errors and the probing or IP efficiency of the experimental protocols. Careful processing of the sequencing data and proper extraction of important features are fundamentally important to a successful analysis. Here we review and compare different experimental methods for probing RNA secondary structures and binding sites of RNA-binding proteins (RBPs), and the computational methods proposed for analyzing the corresponding sequencing data. We suggest how these two types of data should be integrated to study the structural properties of RBP binding sites as a systematic way to better understand posttranscriptional regulations. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M
2013-01-01
SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
A computational genomics pipeline for prokaryotic sequencing projects.
Kislyuk, Andrey O; Katz, Lee S; Agrawal, Sonia; Hagen, Matthew S; Conley, Andrew B; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C; Sammons, Scott A; Govil, Dhwani; Mair, Raydel D; Tatti, Kathleen M; Tondella, Maria L; Harcourt, Brian H; Mayer, Leonard W; Jordan, I King
2010-08-01
New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems.
de Masson, Adele; O'Malley, John T; Elco, Christopher P; Garcia, Sarah S; Divito, Sherrie J; Lowry, Elizabeth L; Tawa, Marianne; Fisher, David C; Devlin, Phillip M; Teague, Jessica E; Leboeuf, Nicole R; Kirsch, Ilan R; Robins, Harlan; Clark, Rachael A; Kupper, Thomas S
2018-05-09
Mycosis fungoides (MF), the most common cutaneous T cell lymphoma (CTCL) is a malignancy of skin-tropic memory T cells. Most MF cases present as early stage (stage I A/B, limited to the skin), and these patients typically have a chronic, indolent clinical course. However, a small subset of early-stage cases develop progressive and fatal disease. Because outcomes can be so different, early identification of this high-risk population is an urgent unmet clinical need. We evaluated the use of next-generation high-throughput DNA sequencing of the T cell receptor β gene ( TCRB ) in lesional skin biopsies to predict progression and survival in a discovery cohort of 208 patients with CTCL (177 with MF) from a 15-year longitudinal observational clinical study. We compared these data to the results in an independent validation cohort of 101 CTCL patients (87 with MF). The tumor clone frequency (TCF) in lesional skin, measured by high-throughput sequencing of the TCRB gene, was an independent prognostic factor of both progression-free and overall survival in patients with CTCL and MF in particular. In early-stage patients, a TCF of >25% in the skin was a stronger predictor of progression than any other established prognostic factor (stage IB versus IA, presence of plaques, high blood lactate dehydrogenase concentration, large-cell transformation, or age). The TCF therefore may accurately predict disease progression in early-stage MF. Early identification of patients at high risk for progression could help identify candidates who may benefit from allogeneic hematopoietic stem cell transplantation before their disease becomes treatment-refractory. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Next-Generation Technologies for Multiomics Approaches Including Interactome Sequencing
Ohashi, Hiroyuki; Miyamoto-Sato, Etsuko
2015-01-01
The development of high-speed analytical techniques such as next-generation sequencing and microarrays allows high-throughput analysis of biological information at a low cost. These techniques contribute to medical and bioscience advancements and provide new avenues for scientific research. Here, we outline a variety of new innovative techniques and discuss their use in omics research (e.g., genomics, transcriptomics, metabolomics, proteomics, and interactomics). We also discuss the possible applications of these methods, including an interactome sequencing technology that we developed, in future medical and life science research. PMID:25649523
Visual programming for next-generation sequencing data analytics.
Milicchio, Franco; Rose, Rebecca; Bian, Jiang; Min, Jae; Prosperi, Mattia
2016-01-01
High-throughput or next-generation sequencing (NGS) technologies have become an established and affordable experimental framework in biological and medical sciences for all basic and translational research. Processing and analyzing NGS data is challenging. NGS data are big, heterogeneous, sparse, and error prone. Although a plethora of tools for NGS data analysis has emerged in the past decade, (i) software development is still lagging behind data generation capabilities, and (ii) there is a 'cultural' gap between the end user and the developer. Generic software template libraries specifically developed for NGS can help in dealing with the former problem, whilst coupling template libraries with visual programming may help with the latter. Here we scrutinize the state-of-the-art low-level software libraries implemented specifically for NGS and graphical tools for NGS analytics. An ideal developing environment for NGS should be modular (with a native library interface), scalable in computational methods (i.e. serial, multithread, distributed), transparent (platform-independent), interoperable (with external software interface), and usable (via an intuitive graphical user interface). These characteristics should facilitate both the run of standardized NGS pipelines and the development of new workflows based on technological advancements or users' needs. We discuss in detail the potential of a computational framework blending generic template programming and visual programming that addresses all of the current limitations. In the long term, a proper, well-developed (although not necessarily unique) software framework will bridge the current gap between data generation and hypothesis testing. This will eventually facilitate the development of novel diagnostic tools embedded in routine healthcare.
Wu, Wei; Lu, Chao-Xia; Wang, Yi-Ning; Liu, Fang; Chen, Wei; Liu, Yong-Tai; Han, Ye-Chen; Cao, Jian; Zhang, Shu-Yang; Zhang, Xue
2015-07-10
MYBPC3 dysfunctions have been proven to induce dilated cardiomyopathy, hypertrophic cardiomyopathy, and/or left ventricular noncompaction; however, the genotype-phenotype correlation between MYBPC3 and restrictive cardiomyopathy (RCM) has not been established. The newly developed next-generation sequencing method is capable of broad genomic DNA sequencing with high throughput and can help explore novel correlations between genetic variants and cardiomyopathies. A proband from a multigenerational family with 3 live patients and 1 unrelated patient with clinical diagnoses of RCM underwent a next-generation sequencing workflow based on a custom AmpliSeq panel, including 64 candidate pathogenic genes for cardiomyopathies, on the Ion Personal Genome Machine high-throughput sequencing benchtop instrument. The selected panel contained a total of 64 genes that were reportedly associated with inherited cardiomyopathies. All patients fulfilled strict criteria for RCM with clinical characteristics, echocardiography, and/or cardiac magnetic resonance findings. The multigenerational family with 3 adult RCM patients carried an identical nonsense MYBPC3 mutation, and the unrelated patient carried a missense mutation in the MYBPC3 gene. All of these results were confirmed by the Sanger sequencing method. This study demonstrated that MYBPC3 gene mutations, revealed by next-generation sequencing, were associated with familial and sporadic RCM patients. It is suggested that the next-generation sequencing platform with a selected panel provides a highly efficient approach for molecular diagnosis of hereditary and idiopathic RCM and helps build new genotype-phenotype correlations. © 2015 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.
Lewandowska, Dagmara W; Zagordi, Osvaldo; Geissberger, Fabienne-Desirée; Kufner, Verena; Schmutz, Stefan; Böni, Jürg; Metzner, Karin J; Trkola, Alexandra; Huber, Michael
2017-08-08
Sequence-specific PCR is the most common approach for virus identification in diagnostic laboratories. However, as specific PCR only detects pre-defined targets, novel virus strains or viruses not included in routine test panels will be missed. Recently, advances in high-throughput sequencing allow for virus-sequence-independent identification of entire virus populations in clinical samples, yet standardized protocols are needed to allow broad application in clinical diagnostics. Here, we describe a comprehensive sample preparation protocol for high-throughput metagenomic virus sequencing using random amplification of total nucleic acids from clinical samples. In order to optimize metagenomic sequencing for application in virus diagnostics, we tested different enrichment and amplification procedures on plasma samples spiked with RNA and DNA viruses. A protocol including filtration, nuclease digestion, and random amplification of RNA and DNA in separate reactions provided the best results, allowing reliable recovery of viral genomes and a good correlation of the relative number of sequencing reads with the virus input. We further validated our method by sequencing a multiplexed viral pathogen reagent containing a range of human viruses from different virus families. Our method proved successful in detecting the majority of the included viruses with high read numbers and compared well to other protocols in the field validated against the same reference reagent. Our sequencing protocol does work not only with plasma but also with other clinical samples such as urine and throat swabs. The workflow for virus metagenomic sequencing that we established proved successful in detecting a variety of viruses in different clinical samples. Our protocol supplements existing virus-specific detection strategies providing opportunities to identify atypical and novel viruses commonly not accounted for in routine diagnostic panels.
Nandakumar, Subhiksha; Bae, Eunhae H; Khan, Arifa S
2017-08-17
The full-length genome sequence of a simian foamy virus (SFVmmu_K3T), isolated from a rhesus macaque ( Macaca mulatta ), was obtained using high-throughput sequencing. SFVmmu_K3T consisted of 12,983 bp and had a genomic organization similar to that of other SFVs, with long terminal repeats (LTRs) and open reading frames for Gag, Pol, Env, Tas, and Bet.
Mavromatis, Konstantinos; Land, Miriam L; Brettin, Thomas S; Quest, Daniel J; Copeland, Alex; Clum, Alicia; Goodwin, Lynne; Woyke, Tanja; Lapidus, Alla; Klenk, Hans Peter; Cottingham, Robert W; Kyrpides, Nikos C
2012-01-01
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation. In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases assembly results although on average are of high quality, need to be viewed critically and consider sources of errors in them prior to analysis. These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
Multiuser TOA Estimation Algorithm in DS-CDMA Sparse Channel for Radiolocation
NASA Astrophysics Data System (ADS)
Kim, Sunwoo
This letter considers multiuser time delay estimation in a sparse channel environment for radiolocation. The generalized successive interference cancellation (GSIC) algorithm is used to eliminate the multiple access interference (MAI). To adapt GSIC to sparse channels the alternating maximization (AM) algorithm is considered, and the continuous time delay of each path is estimated without requiring a priori known data sequences.
SEED 2: a user-friendly platform for amplicon high-throughput sequencing data analyses.
Vetrovský, Tomáš; Baldrian, Petr; Morais, Daniel; Berger, Bonnie
2018-02-14
Modern molecular methods have increased our ability to describe microbial communities. Along with the advances brought by new sequencing technologies, we now require intensive computational resources to make sense of the large numbers of sequences continuously produced. The software developed by the scientific community to address this demand, although very useful, require experience of the command-line environment, extensive training and have steep learning curves, limiting their use. We created SEED 2, a graphical user interface for handling high-throughput amplicon-sequencing data under Windows operating systems. SEED 2 is the only sequence visualizer that empowers users with tools to handle amplicon-sequencing data of microbial community markers. It is suitable for any marker genes sequences obtained through Illumina, IonTorrent or Sanger sequencing. SEED 2 allows the user to process raw sequencing data, identify specific taxa, produce of OTU-tables, create sequence alignments and construct phylogenetic trees. Standard dual core laptops with 8 GB of RAM can handle ca. 8 million of Illumina PE 300 bp sequences, ca. 4GB of data. SEED 2 was implemented in Object Pascal and uses internal functions and external software for amplicon data processing. SEED 2 is a freeware software, available at http://www.biomed.cas.cz/mbu/lbwrf/seed/ as a self-contained file, including all the dependencies, and does not require installation. Supplementary data contain a comprehensive list of supported functions. daniel.morais@biomed.cas.cz. Supplementary data are available at Bioinformatics online. © The Author(s) 2018. Published by Oxford University Press.
A high-throughput approach to profile RNA structure.
Delli Ponti, Riccardo; Marti, Stefanie; Armaos, Alexandros; Tartaglia, Gian Gaetano
2017-03-17
Here we introduce the Computational Recognition of Secondary Structure (CROSS) method to calculate the structural profile of an RNA sequence (single- or double-stranded state) at single-nucleotide resolution and without sequence length restrictions. We trained CROSS using data from high-throughput experiments such as Selective 2΄-Hydroxyl Acylation analyzed by Primer Extension (SHAPE; Mouse and HIV transcriptomes) and Parallel Analysis of RNA Structure (PARS; Human and Yeast transcriptomes) as well as high-quality NMR/X-ray structures (PDB database). The algorithm uses primary structure information alone to predict experimental structural profiles with >80% accuracy, showing high performances on large RNAs such as Xist (17 900 nucleotides; Area Under the ROC Curve AUC of 0.75 on dimethyl sulfate (DMS) experiments). We integrated CROSS in thermodynamics-based methods to predict secondary structure and observed an increase in their predictive power by up to 30%. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Denesvre, Caroline; Dumarest, Marine; Rémy, Sylvie; Gourichon, David; Eloit, Marc
2015-10-01
Recent studies show that human skin at homeostasis is a complex ecosystem whose virome include circular DNA viruses, especially papillomaviruses and polyomaviruses. To determine the chicken skin virome in comparison with human skin virome, a chicken swabs pool sample from fifteen indoor healthy chickens of five genetic backgrounds was examined for the presence of DNA viruses by high-throughput sequencing (HTS). The results indicate a predominance of herpesviruses from the Mardivirus genus, coming from either vaccinal origin or presumably asymptomatic infection. Despite the high sensitivity of the HTS method used herein to detect small circular DNA viruses, we did not detect any papillomaviruses, polyomaviruses, or circoviruses, indicating that these viruses may not be resident of the chicken skin. The results suggest that the turkey herpesvirus is a resident of chicken skin in vaccinated chickens. This study indicates major differences between the skin viromes of chickens and humans. The origin of this difference remains to be further studied in relation with skin physiology, environment, or virus population dynamics.
Droege, Marcus; Hill, Brendon
2008-08-31
The Genome Sequencer FLX System (GS FLX), powered by 454 Sequencing, is a next-generation DNA sequencing technology featuring a unique mix of long reads, exceptional accuracy, and ultra-high throughput. It has been proven to be the most versatile of all currently available next-generation sequencing technologies, supporting many high-profile studies in over seven applications categories. GS FLX users have pursued innovative research in de novo sequencing, re-sequencing of whole genomes and target DNA regions, metagenomics, and RNA analysis. 454 Sequencing is a powerful tool for human genetics research, having recently re-sequenced the genome of an individual human, currently re-sequencing the complete human exome and targeted genomic regions using the NimbleGen sequence capture process, and detected low-frequency somatic mutations linked to cancer.
Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard
2013-01-01
Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce. PMID:23409088
Matvienko, Marta; Kozik, Alexander; Froenicke, Lutz; Lavelle, Dean; Martineau, Belinda; Perroud, Bertrand; Michelmore, Richard
2013-01-01
Several applications of high throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted and analyzed the consequences of a method that utilizes a thermostable duplex-specific nuclease for reducing the high-copy components in transcriptomic and genomic libraries prior to sequencing. This reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. Consequences of this method on the reduction of high-copy and enrichment of low-copy sequences are reported for Arabidopsis and lettuce.
Gene expression profiling of flax (Linum usitatissimum L.) under edaphic stress.
Dmitriev, Alexey A; Kudryavtseva, Anna V; Krasnov, George S; Koroban, Nadezhda V; Speranskaya, Anna S; Krinitsina, Anastasia A; Belenikin, Maxim S; Snezhkina, Anastasiya V; Sadritdinova, Asiya F; Kishlyan, Natalya V; Rozhmina, Tatiana A; Yurkevich, Olga Yu; Muravenko, Olga V; Bolsheva, Nadezhda L; Melnikova, Nataliya V
2016-11-16
Cultivated flax (Linum usitatissimum L.) is widely used for production of textile, food, chemical and pharmaceutical products. However, various stresses decrease flax production. Search for genes, which are involved in stress response, is necessary for breeding of adaptive cultivars. Imbalanced concentration of nutrient elements in soil decrease flax yields and also results in heritable changes in some flax lines. The appearance of Linum Insertion Sequence 1 (LIS-1) is the most studied modification. However, LIS-1 function is still unclear. High-throughput sequencing of transcriptome of flax plants grown under normal (N), phosphate deficient (P), and nutrient excess (NPK) conditions was carried out using Illumina platform. The assembly of transcriptome was performed, and a total of 34924, 33797, and 33698 unique transcripts for N, P, and NPK sequencing libraries were identified, respectively. We have not revealed any LIS-1 derived mRNA in our sequencing data. The analysis of high-throughput sequencing data allowed us to identify genes with potentially differential expression under imbalanced nutrition. For further investigation with qPCR, 15 genes were chosen and their expression levels were evaluated in the extended sampling of 31 flax plants. Significant expression alterations were revealed for genes encoding WRKY and JAZ protein families under P and NPK conditions. Moreover, the alterations of WRKY family genes differed depending on LIS-1 presence in flax plant genome. Besides, we revealed slight and LIS-1 independent mRNA level changes of KRP2 and ING1 genes, which are adjacent to LIS-1, under nutrition stress. Differentially expressed genes were identified in flax plants, which were grown under phosphate deficiency and excess nutrition, on the basis of high-throughput sequencing and qPCR data. We showed that WRKY and JAS gene families participate in flax response to imbalanced nutrient content in soil. Besides, we have not identified any mRNA, which could be derived from LIS-1, in our transcriptome sequencing data. Expression of LIS-1 flanking genes, ING1 and KRP2, was suggested not to be nutrient stress-induced. Obtained results provide new insights into edaphic stress response in flax and the role of LIS-1 in these process.
NASA Astrophysics Data System (ADS)
Milshteyn, Eugene; von Morze, Cornelius; Reed, Galen D.; Shang, Hong; Shin, Peter J.; Larson, Peder E. Z.; Vigneron, Daniel B.
2018-05-01
Acceleration of dynamic 2D (T2 Mapping) and 3D hyperpolarized 13C MRI acquisitions using the balanced steady-state free precession sequence was achieved with a specialized reconstruction method, based on the combination of low rank plus sparse and local low rank reconstructions. Methods were validated using both retrospectively and prospectively undersampled in vivo data from normal rats and tumor-bearing mice. Four-fold acceleration of 1-2 mm isotropic 3D dynamic acquisitions with 2-5 s temporal resolution and two-fold acceleration of 0.25-1 mm2 2D dynamic acquisitions was achieved. This enabled visualization of the biodistribution of [2-13C]pyruvate, [1-13C]lactate, [13C, 15N2]urea, and HP001 within heart, kidneys, vasculature, and tumor, as well as calculation of high resolution T2 maps.
Wang, Xixian; Ren, Lihui; Su, Yetian; Ji, Yuetong; Liu, Yaoping; Li, Chunyu; Li, Xunrong; Zhang, Yi; Wang, Wei; Hu, Qiang; Han, Danxiang; Xu, Jian; Ma, Bo
2017-11-21
Raman-activated cell sorting (RACS) has attracted increasing interest, yet throughput remains one major factor limiting its broader application. Here we present an integrated Raman-activated droplet sorting (RADS) microfluidic system for functional screening of live cells in a label-free and high-throughput manner, by employing AXT-synthetic industrial microalga Haematococcus pluvialis (H. pluvialis) as a model. Raman microspectroscopy analysis of individual cells is carried out prior to their microdroplet encapsulation, which is then directly coupled to DEP-based droplet sorting. To validate the system, H. pluvialis cells containing different levels of AXT were mixed and underwent RADS. Those AXT-hyperproducing cells were sorted with an accuracy of 98.3%, an enrichment ratio of eight folds, and a throughput of ∼260 cells/min. Of the RADS-sorted cells, 92.7% remained alive and able to proliferate, which is equivalent to the unsorted cells. Thus, the RADS achieves a much higher throughput than existing RACS systems, preserves the vitality of cells, and facilitates seamless coupling with downstream manipulations such as single-cell sequencing and cultivation.
An innovative SNP genotyping method adapting to multiple platforms and throughputs.
Long, Y M; Chao, W S; Ma, G J; Xu, S S; Qi, L L
2017-03-01
An innovative genotyping method designated as semi-thermal asymmetric reverse PCR (STARP) was developed for genotyping individual SNPs with improved accuracy, flexible throughputs, low operational costs, and high platform compatibility. Multiplex chip-based technology for genome-scale genotyping of single nucleotide polymorphisms (SNPs) has made great progress in the past two decades. However, PCR-based genotyping of individual SNPs still remains problematic in accuracy, throughput, simplicity, and/or operational costs as well as the compatibility with multiple platforms. Here, we report a novel SNP genotyping method designated semi-thermal asymmetric reverse PCR (STARP). In this method, genotyping assay was performed under unique PCR conditions using two universal priming element-adjustable primers (PEA-primers) and one group of three locus-specific primers: two asymmetrically modified allele-specific primers (AMAS-primers) and their common reverse primer. The two AMAS-primers each were substituted one base in different positions at their 3' regions to significantly increase the amplification specificity of the two alleles and tailed at 5' ends to provide priming sites for PEA-primers. The two PEA-primers were developed for common use in all genotyping assays to stringently target the PCR fragments generated by the two AMAS-primers with similar PCR efficiencies and for flexible detection using either gel-free fluorescence signals or gel-based size separation. The state-of-the-art primer design and unique PCR conditions endowed STARP with all the major advantages of high accuracy, flexible throughputs, simple assay design, low operational costs, and platform compatibility. In addition to SNPs, STARP can also be employed in genotyping of indels (insertion-deletion polymorphisms). As vast variations in DNA sequences are being unearthed by many genome sequencing projects and genotyping by sequencing, STARP will have wide applications across all biological organisms in agriculture, medicine, and forensics.
Synthetic Molecular Evolution of Membrane-Active Peptides
NASA Astrophysics Data System (ADS)
Wimley, William
The physical chemistry of membrane partitioning largely determines the function of membrane active peptides. Membrane-active peptides have potential utility in many areas, including in the cellular delivery of polar compounds, cancer therapy, biosensor design, and in antibacterial, antiviral and antifungal therapies. Yet, despite decades of research on thousands of known examples, useful sequence-structure-function relationships are essentially unknown. Because peptide-membrane interactions within the highly fluid bilayer are dynamic and heterogeneous, accounts of mechanism are necessarily vague and descriptive, and have little predictive power. This creates a significant roadblock to advances in the field. We are bypassing that roadblock with synthetic molecular evolution: iterative peptide library design and orthogonal high-throughput screening. We start with template sequences that have at least some useful activity, and create small, focused libraries using structural and biophysical principles to design the sequence space around the template. Orthogonal high-throughput screening is used to identify gain-of-function peptides by simultaneously selecting for several different properties (e.g. solubility, activity and toxicity). Multiple generations of iterative library design and screening have enabled the identification of membrane-active sequences with heretofore unknown properties, including clinically relevant, broad-spectrum activity against drug-resistant bacteria and enveloped viruses as well as pH-triggered macromolecular poration.
Telele, Nigus Fikrie; Kalu, Amare Worku; Gebre-Selassie, Solomon; Fekade, Daniel; Abdurahman, Samir; Marrone, Gaetano; Neogi, Ujjwal; Tegbaru, Belete; Sönnerborg, Anders
2018-05-15
Baseline plasma samples of 490 randomly selected antiretroviral therapy (ART) naïve patients from seven hospitals participating in the first nationwide Ethiopian HIV-1 cohort were analysed for surveillance drug resistance mutations (sDRM) by population based Sanger sequencing (PBSS). Also next generation sequencing (NGS) was used in a subset of 109 baseline samples of patients. Treatment outcome after 6- and 12-months was assessed by on-treatment (OT) and intention-to-treat (ITT) analyses. Transmitted drug resistance (TDR) was detected in 3.9% (18/461) of successfully sequenced samples by PBSS. However, NGS detected sDRM more often (24%; 26/109) than PBSS (6%; 7/109) (p = 0.0001) and major integrase strand transfer inhibitors (INSTI) DRMs were also found in minor viral variants from five patients. Patients with sDRM had more frequent treatment failure in both OT and ITT analyses. The high rate of TDR by NGS and the identification of preexisting INSTI DRMs in minor wild-type HIV-1 subtype C viral variants infected Ethiopian patients underscores the importance of TDR surveillance in low- and middle-income countries and shows added value of high-throughput NGS in such studies.
Ion channel drug discovery and research: the automated Nano-Patch-Clamp technology.
Brueggemann, A; George, M; Klau, M; Beckler, M; Steindl, J; Behrends, J C; Fertig, N
2004-01-01
Unlike the genomics revolution, which was largely enabled by a single technological advance (high throughput sequencing), rapid advancement in proteomics will require a broader effort to increase the throughput of a number of key tools for functional analysis of different types of proteins. In the case of ion channels -a class of (membrane) proteins of great physiological importance and potential as drug targets- the lack of adequate assay technologies is felt particularly strongly. The available, indirect, high throughput screening methods for ion channels clearly generate insufficient information. The best technology to study ion channel function and screen for compound interaction is the patch clamp technique, but patch clamping suffers from low throughput, which is not acceptable for drug screening. A first step towards a solution is presented here. The nano patch clamp technology, which is based on a planar, microstructured glass chip, enables automatic whole cell patch clamp measurements. The Port-a-Patch is an automated electrophysiology workstation, which uses planar patch clamp chips. This approach enables high quality and high content ion channel and compound evaluation on a one-cell-at-a-time basis. The presented automation of the patch process and its scalability to an array format are the prerequisites for any higher throughput electrophysiology instruments.
2014-01-01
Background Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Results Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). Conclusions All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources. PMID:24460871
Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.
2013-01-01
SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (SSRs; for example, microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains three analysis modules along with a fourth control module that can be used to automate analyses of large volumes of data. The modules are used to (1) identify the subset of paired-end sequences that pass quality standards, (2) align paired-end reads into a single composite DNA sequence, and (3) identify sequences that possess microsatellites conforming to user specified parameters. Each of the three separate analysis modules also can be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc). All modules are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, Windows). The program suite relies on a compiled Python extension module to perform paired-end alignments. Instructions for compiling the extension from source code are provided in the documentation. Users who do not have Python installed on their computers or who do not have the ability to compile software also may choose to download packaged executable files. These files include all Python scripts, a copy of the compiled extension module, and a minimal installation of Python in a single binary executable. See program documentation for more information.
Assaying gene function by growth competition experiment.
Merritt, Joshua; Edwards, Jeremy S
2004-07-01
High-throughput screening and analysis is one of the emerging paradigms in biotechnology. In particular, high-throughput methods are essential in the field of functional genomics because of the vast amount of data generated in recent and ongoing genome sequencing efforts. In this report we discuss integrated functional analysis methodologies which incorporate both a growth competition component and a highly parallel assay used to quantify results of the growth competition. Several applications of the two most widely used technologies in the field, i.e., transposon mutagenesis and deletion strain library growth competition, and individual applications of several developing or less widely reported technologies are presented.
USDA-ARS?s Scientific Manuscript database
Verticillium wilt (VW) of alfalfa is a soilborne disease that causes severe yield loss in alfalfa. To identify molecular markers associated with VW resistance, an integrated framework of genome-wide association study (GWAS) with high-throughput genotyping by sequencing (GBS) was used for mapping lo...
Rolling circle amplification of metazoan mitochondrialgenomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Simison, W. Brian; Lindberg, D.R.; Boore, J.L.
2005-07-31
Here we report the successful use of rolling circle amplification (RCA) for the amplification of complete metazoan mt genomes to make a product that is amenable to high-throughput genome sequencing techniques. The benefits of RCA over PCR are many and with further development and refinement of RCA, the sequencing of organellar genomics will require far less time and effort than current long PCR approaches.
USDA-ARS?s Scientific Manuscript database
High-throughput sequencing is often used for studies of the transcriptome, particularly for comparisons between experimental conditions. Due to sequencing costs, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential ex...
Mora-Castilla, Sergio; To, Cuong; Vaezeslami, Soheila; Morey, Robert; Srinivasan, Srimeenakshi; Dumdie, Jennifer N; Cook-Andersen, Heidi; Jenkins, Joby; Laurent, Louise C
2016-08-01
As the cost of next-generation sequencing has decreased, library preparation costs have become a more significant proportion of the total cost, especially for high-throughput applications such as single-cell RNA profiling. Here, we have applied novel technologies to scale down reaction volumes for library preparation. Our system consisted of in vitro differentiated human embryonic stem cells representing two stages of pancreatic differentiation, for which we prepared multiple biological and technical replicates. We used the Fluidigm (San Francisco, CA) C1 single-cell Autoprep System for single-cell complementary DNA (cDNA) generation and an enzyme-based tagmentation system (Nextera XT; Illumina, San Diego, CA) with a nanoliter liquid handler (mosquito HTS; TTP Labtech, Royston, UK) for library preparation, reducing the reaction volume down to 2 µL and using as little as 20 pg of input cDNA. The resulting sequencing data were bioinformatically analyzed and correlated among the different library reaction volumes. Our results showed that decreasing the reaction volume did not interfere with the quality or the reproducibility of the sequencing data, and the transcriptional data from the scaled-down libraries allowed us to distinguish between single cells. Thus, we have developed a process to enable efficient and cost-effective high-throughput single-cell transcriptome sequencing. © 2016 Society for Laboratory Automation and Screening.
Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing
Tourlousse, Dieter M.; Yoshiike, Satowa; Ohashi, Akiko; Matsukura, Satoko; Noda, Naohiro
2017-01-01
Abstract High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we have developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins represent full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. Results also underscored that template-specific Illumina sequencing artifacts may lead to biases in the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification. PMID:27980100
Blom, Mozes P K
2015-08-05
Recently developed molecular methods enable geneticists to target and sequence thousands of orthologous loci and infer evolutionary relationships across the tree of life. Large numbers of genetic markers benefit species tree inference but visual inspection of alignment quality, as traditionally conducted, is challenging with thousands of loci. Furthermore, due to the impracticality of repeated visual inspection with alternative filtering criteria, the potential consequences of using datasets with different degrees of missing data remain nominally explored in most empirical phylogenomic studies. In this short communication, I describe a flexible high-throughput pipeline designed to assess alignment quality and filter exonic sequence data for subsequent inference. The stringency criteria for alignment quality and missing data can be adapted based on the expected level of sequence divergence. Each alignment is automatically evaluated based on the stringency criteria specified, significantly reducing the number of alignments that require visual inspection. By developing a rapid method for alignment filtering and quality assessment, the consistency of phylogenetic estimation based on exonic sequence alignments can be further explored across distinct inference methods, while accounting for different degrees of missing data.
Zhu, Qihui; Smith, Shavannor M; Ayele, Mulu; Yang, Lixing; Jogi, Ansuya; Chaluvadi, Srinivasa R; Bennetzen, Jeffrey L
2012-11-01
Tef (Eragrostis tef) is a major cereal crop in Ethiopia. Lodging is the primary constraint to increasing productivity in this allotetraploid species, accounting for losses of ∼15-45% in yield each year. As a first step toward identifying semi-dwarf varieties that might have improved lodging resistance, an ∼6× fosmid library was constructed and used to identify both homeologues of the dw3 semi-dwarfing gene of Sorghum bicolor. An EMS mutagenized population, consisting of ∼21,210 tef plants, was planted and leaf materials were collected into 23 superpools. Two dwarfing candidate genes, homeologues of dw3 of sorghum and rht1 of wheat, were sequenced directly from each superpool with 454 technology, and 120 candidate mutations were identified. Out of 10 candidates tested, six independent mutations were validated by Sanger sequencing, including two predicted detrimental mutations in both dw3 homeologues with a potential to improve lodging resistance in tef through further breeding. This study demonstrates that high-throughput sequencing can identify potentially valuable mutations in under-studied plant species like tef and has provided mutant lines that can now be combined and tested in breeding programs for improved lodging resistance.
Morelli, M; Chiumenti, M; De Stradis, A; La Notte, P; Minafra, A
2015-02-01
Through the application of next generation sequencing, in synergy with conventional cloning of DOP-PCR fragments, two double-stranded RNA (dsRNA) molecules of about 1.5 kbp in size were isolated from leaf tissue of a Japanese persimmon (accession SSPI) from Apulia (southern Italy) showing veinlets necrosis. High-throughput sequencing allowed whole genome sequence assembly, yielding a 1,577 and a 1,491 bp contigs identified as dsRNA-1 and dsRNA-2 of a previously undescribed virus, provisionally named as Persimmon cryptic virus (PeCV). In silico analysis showed that both dsRNA fragments were monocistronic and comprised the RNA-dependent RNA polymerase (RdRp) and the capsid protein (CP) genes, respectively. Phylogenetic reconstruction revealed a close relationship of these dsRNAs with those of cryptoviruses described in woody and herbaceous hosts, recently gathered in genus Deltapartitivirus. Virus-specific primers for RT-PCR, designed in the CP cistron, detected viral RNAs also in symptomless persimmon trees sampled from the same geographical area of SSPI, thus proving that PeCV infection may be fairly common and presumably latent.
Zhou, Rong; Wang, Qian; Jiang, Fangling; Cao, Xue; Sun, Mintao; Liu, Min; Wu, Zhen
2016-01-01
MicroRNAs (miRNAs) are 19–24 nucleotide (nt) noncoding RNAs that play important roles in abiotic stress responses in plants. High temperatures have been the subject of considerable attention due to their negative effects on plant growth and development. Heat-responsive miRNAs have been identified in some plants. However, there have been no reports on the global identification of miRNAs and their targets in tomato at high temperatures, especially at different elevated temperatures. Here, three small-RNA libraries and three degradome libraries were constructed from the leaves of the heat-tolerant tomato at normal, moderately and acutely elevated temperatures (26/18 °C, 33/33 °C and 40/40 °C, respectively). Following high-throughput sequencing, 662 conserved and 97 novel miRNAs were identified in total with 469 conserved and 91 novel miRNAs shared in the three small-RNA libraries. Of these miRNAs, 96 and 150 miRNAs were responsive to the moderately and acutely elevated temperature, respectively. Following degradome sequencing, 349 sequences were identified as targets of 138 conserved miRNAs, and 13 sequences were identified as targets of eight novel miRNAs. The expression levels of seven miRNAs and six target genes obtained by quantitative real-time PCR (qRT-PCR) were largely consistent with the sequencing results. This study enriches the number of heat-responsive miRNAs and lays a foundation for the elucidation of the miRNA-mediated regulatory mechanism in tomatoes at elevated temperatures. PMID:27653374
Wilson, Kitchener D; Shen, Peidong; Fung, Eula; Karakikes, Ioannis; Zhang, Angela; InanlooRahatloo, Kolsoum; Odegaard, Justin; Sallam, Karim; Davis, Ronald W; Lui, George K; Ashley, Euan A; Scharfe, Curt; Wu, Joseph C
2015-09-11
Thousands of mutations across >50 genes have been implicated in inherited cardiomyopathies. However, options for sequencing this rapidly evolving gene set are limited because many sequencing services and off-the-shelf kits suffer from slow turnaround, inefficient capture of genomic DNA, and high cost. Furthermore, customization of these assays to cover emerging targets that suit individual needs is often expensive and time consuming. We sought to develop a custom high throughput, clinical-grade next-generation sequencing assay for detecting cardiac disease gene mutations with improved accuracy, flexibility, turnaround, and cost. We used double-stranded probes (complementary long padlock probes), an inexpensive and customizable capture technology, to efficiently capture and amplify the entire coding region and flanking intronic and regulatory sequences of 88 genes and 40 microRNAs associated with inherited cardiomyopathies, congenital heart disease, and cardiac development. Multiplexing 11 samples per sequencing run resulted in a mean base pair coverage of 420, of which 97% had >20× coverage and >99% were concordant with known heterozygous single nucleotide polymorphisms. The assay correctly detected germline variants in 24 individuals and revealed several polymorphic regions in miR-499. Total run time was 3 days at an approximate cost of $100 per sample. Accurate, high-throughput detection of mutations across numerous cardiac genes is achievable with complementary long padlock probe technology. Moreover, this format allows facile insertion of additional probes as more cardiomyopathy and congenital heart disease genes are discovered, giving researchers a powerful new tool for DNA mutation detection and discovery. © 2015 American Heart Association, Inc.
CisSERS: Customizable in silico sequence evaluation for restriction sites
Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus; ...
2016-04-12
High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated tomore » enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERSenable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERSand results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.« less
CisSERS: Customizable in silico sequence evaluation for restriction sites
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sharpe, Richard M.; Koepke, Tyson; Harper, Artemus
High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Here, data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of presence and frequency of patterns or motifs such as restriction enzymes. Predicted agarose gel visualization of the custom analyses results was also integrated tomore » enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection and custom motif detection features, which are seamlessly integrated with real time agarose gel visualization. Using a simple fasta-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERSenable the user to make decisions for designing genotyping by sequencing experiments, reduced representation sequencing, 3’UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a java based graphical user interface built around a perl backbone. Several of the applications of CisSERS including CAPS molecular marker development were successfully validated using wet-lab experimentation. Here, we present the tool CisSERSand results from in-silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.« less
High Throughput Biological Analysis Using Multi-bit Magnetic Digital Planar Tags
NASA Astrophysics Data System (ADS)
Hong, B.; Jeong, J.-R.; Llandro, J.; Hayward, T. J.; Ionescu, A.; Trypiniotis, T.; Mitrelias, T.; Kopper, K. P.; Steinmuller, S. J.; Bland, J. A. C.
2008-06-01
We report a new magnetic labelling technology for high-throughput biomolecular identification and DNA sequencing. Planar multi-bit magnetic tags have been designed and fabricated, which comprise a magnetic barcode formed by an ensemble of micron-sized thin film Ni80Fe20 bars encapsulated in SU8. We show that by using a globally applied magnetic field and magneto-optical Kerr microscopy the magnetic elements in the multi-bit magnetic tags can be addressed individually and encoded/decoded remotely. The critical steps needed to show the feasibility of this technology are demonstrated, including fabrication, flow transport, remote writing and reading, and successful functionalization of the tags as verified by fluorescence detection. This approach is ideal for encoding information on tags in microfluidic flow or suspension, for such applications as labelling of chemical precursors during drug synthesis and combinatorial library-based high-throughput multiplexed bioassays.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Yonggang, E-mail: wangyg@ustc.edu.cn; Hui, Cong; Liu, Chong
The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving,more » so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.« less
Wang, Yonggang; Hui, Cong; Liu, Chong; Xu, Chao
2016-04-01
The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.
Future technologies for monitoring HIV drug resistance and cure.
Parikh, Urvi M; McCormick, Kevin; van Zyl, Gert; Mellors, John W
2017-03-01
Sensitive, scalable and affordable assays are critically needed for monitoring the success of interventions for preventing, treating and attempting to cure HIV infection. This review evaluates current and emerging technologies that are applicable for both surveillance of HIV drug resistance (HIVDR) and characterization of HIV reservoirs that persist despite antiretroviral therapy and are obstacles to curing HIV infection. Next-generation sequencing (NGS) has the potential to be adapted into high-throughput, cost-efficient approaches for HIVDR surveillance and monitoring during continued scale-up of antiretroviral therapy and rollout of preexposure prophylaxis. Similarly, improvements in PCR and NGS are resulting in higher throughput single genome sequencing to detect intact proviruses and to characterize HIV integration sites and clonal expansions of infected cells. Current population genotyping methods for resistance monitoring are high cost and low throughput. NGS, combined with simpler sample collection and storage matrices (e.g. dried blood spots), has considerable potential to broaden global surveillance and patient monitoring for HIVDR. Recent adaptions of NGS to identify integration sites of HIV in the human genome and to characterize the integrated HIV proviruses are likely to facilitate investigations of the impact of experimental 'curative' interventions on HIV reservoirs.
Murillo, Gabriel H; You, Na; Su, Xiaoquan; Cui, Wei; Reilly, Muredach P; Li, Mingyao; Ning, Kang; Cui, Xinping
2016-05-15
Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques. A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems xinping.cui@ucr.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
SubCellProt: predicting protein subcellular localization using machine learning approaches.
Garg, Prabha; Sharma, Virag; Chaudhari, Pradeep; Roy, Nilanjan
2009-01-01
High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation including determination of a protein's subcellular localizations are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN) were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by two algorithms and a probability is assigned to it. The results indicate that the primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
TARGETED CAPTURE IN EVOLUTIONARY AND ECOLOGICAL GENOMICS
Jones, Matthew R.; Good, Jeffrey M.
2016-01-01
The rapid expansion of next-generation sequencing has yielded a powerful array of tools to address fundamental biological questions at a scale that was inconceivable just a few years ago. Various genome partitioning strategies to sequence select subsets of the genome have emerged as powerful alternatives to whole genome sequencing in ecological and evolutionary genomic studies. High throughput targeted capture is one such strategy that involves the parallel enrichment of pre-selected genomic regions of interest. The growing use of targeted capture demonstrates its potential power to address a range of research questions, yet these approaches have yet to expand broadly across labs focused on evolutionary and ecological genomics. In part, the use of targeted capture has been hindered by the logistics of capture design and implementation in species without established reference genomes. Here we aim to 1) increase the accessibility of targeted capture to researchers working in non-model taxa by discussing capture methods that circumvent the need of a reference genome, 2) highlight the evolutionary and ecological applications where this approach is emerging as a powerful sequencing strategy, and 3) discuss the future of targeted capture and other genome partitioning approaches in light of the increasing accessibility of whole genome sequencing. Given the practical advantages and increasing feasibility of high-throughput targeted capture, we anticipate an ongoing expansion of capture-based approaches in evolutionary and ecological research, synergistic with an expansion of whole genome sequencing. PMID:26137993
fluff: exploratory analysis and visualization of high-throughput sequencing data
Georgiou, Georgios
2016-01-01
Summary. In this article we describe fluff, a software package that allows for simple exploration, clustering and visualization of high-throughput sequencing data mapped to a reference genome. The package contains three command-line tools to generate publication-quality figures in an uncomplicated manner using sensible defaults. Genome-wide data can be aggregated, clustered and visualized in a heatmap, according to different clustering methods. This includes a predefined setting to identify dynamic clusters between different conditions or developmental stages. Alternatively, clustered data can be visualized in a bandplot. Finally, fluff includes a tool to generate genomic profiles. As command-line tools, the fluff programs can easily be integrated into standard analysis pipelines. The installation is straightforward and documentation is available at http://fluff.readthedocs.org. Availability. fluff is implemented in Python and runs on Linux. The source code is freely available for download at https://github.com/simonvh/fluff. PMID:27547532
Egorov, Evgeny S; Merzlyak, Ekaterina M; Shelenkov, Andrew A; Britanova, Olga V; Sharonov, George V; Staroverov, Dmitriy B; Bolotin, Dmitriy A; Davydov, Alexey N; Barsova, Ekaterina; Lebedev, Yuriy B; Shugay, Mikhail; Chudakov, Dmitriy M
2015-06-15
Emerging high-throughput sequencing methods for the analyses of complex structure of TCR and BCR repertoires give a powerful impulse to adaptive immunity studies. However, there are still essential technical obstacles for performing a truly quantitative analysis. Specifically, it remains challenging to obtain comprehensive information on the clonal composition of small lymphocyte populations, such as Ag-specific, functional, or tissue-resident cell subsets isolated by sorting, microdissection, or fine needle aspirates. In this study, we report a robust approach based on unique molecular identifiers that allows profiling Ag receptors for several hundred to thousand lymphocytes while preserving qualitative and quantitative information on clonal composition of the sample. We also describe several general features regarding the data analysis with unique molecular identifiers that are critical for accurate counting of starting molecules in high-throughput sequencing applications. Copyright © 2015 by The American Association of Immunologists, Inc.
Kondrashova, Olga; Love, Clare J.; Lunke, Sebastian; Hsu, Arthur L.; Waring, Paul M.; Taylor, Graham R.
2015-01-01
Whilst next generation sequencing can report point mutations in fixed tissue tumour samples reliably, the accurate determination of copy number is more challenging. The conventional Multiplex Ligation-dependent Probe Amplification (MLPA) assay is an effective tool for measurement of gene dosage, but is restricted to around 50 targets due to size resolution of the MLPA probes. By switching from a size-resolved format, to a sequence-resolved format we developed a scalable, high-throughput, quantitative assay. MLPA-seq is capable of detecting deletions, duplications, and amplifications in as little as 5ng of genomic DNA, including from formalin-fixed paraffin-embedded (FFPE) tumour samples. We show that this method can detect BRCA1, BRCA2, ERBB2 and CCNE1 copy number changes in DNA extracted from snap-frozen and FFPE tumour tissue, with 100% sensitivity and >99.5% specificity. PMID:26569395
Low-rank and Adaptive Sparse Signal (LASSI) Models for Highly Accelerated Dynamic Imaging
Ravishankar, Saiprasad; Moore, Brian E.; Nadakuditi, Raj Rao; Fessler, Jeffrey A.
2017-01-01
Sparsity-based approaches have been popular in many applications in image processing and imaging. Compressed sensing exploits the sparsity of images in a transform domain or dictionary to improve image recovery from undersampled measurements. In the context of inverse problems in dynamic imaging, recent research has demonstrated the promise of sparsity and low-rank techniques. For example, the patches of the underlying data are modeled as sparse in an adaptive dictionary domain, and the resulting image and dictionary estimation from undersampled measurements is called dictionary-blind compressed sensing, or the dynamic image sequence is modeled as a sum of low-rank and sparse (in some transform domain) components (L+S model) that are estimated from limited measurements. In this work, we investigate a data-adaptive extension of the L+S model, dubbed LASSI, where the temporal image sequence is decomposed into a low-rank component and a component whose spatiotemporal (3D) patches are sparse in some adaptive dictionary domain. We investigate various formulations and efficient methods for jointly estimating the underlying dynamic signal components and the spatiotemporal dictionary from limited measurements. We also obtain efficient sparsity penalized dictionary-blind compressed sensing methods as special cases of our LASSI approaches. Our numerical experiments demonstrate the promising performance of LASSI schemes for dynamic magnetic resonance image reconstruction from limited k-t space data compared to recent methods such as k-t SLR and L+S, and compared to the proposed dictionary-blind compressed sensing method. PMID:28092528
Low-Rank and Adaptive Sparse Signal (LASSI) Models for Highly Accelerated Dynamic Imaging.
Ravishankar, Saiprasad; Moore, Brian E; Nadakuditi, Raj Rao; Fessler, Jeffrey A
2017-05-01
Sparsity-based approaches have been popular in many applications in image processing and imaging. Compressed sensing exploits the sparsity of images in a transform domain or dictionary to improve image recovery fromundersampledmeasurements. In the context of inverse problems in dynamic imaging, recent research has demonstrated the promise of sparsity and low-rank techniques. For example, the patches of the underlying data are modeled as sparse in an adaptive dictionary domain, and the resulting image and dictionary estimation from undersampled measurements is called dictionary-blind compressed sensing, or the dynamic image sequence is modeled as a sum of low-rank and sparse (in some transform domain) components (L+S model) that are estimated from limited measurements. In this work, we investigate a data-adaptive extension of the L+S model, dubbed LASSI, where the temporal image sequence is decomposed into a low-rank component and a component whose spatiotemporal (3D) patches are sparse in some adaptive dictionary domain. We investigate various formulations and efficient methods for jointly estimating the underlying dynamic signal components and the spatiotemporal dictionary from limited measurements. We also obtain efficient sparsity penalized dictionary-blind compressed sensing methods as special cases of our LASSI approaches. Our numerical experiments demonstrate the promising performance of LASSI schemes for dynamicmagnetic resonance image reconstruction from limited k-t space data compared to recent methods such as k-t SLR and L+S, and compared to the proposed dictionary-blind compressed sensing method.
Du, Yushen; Wu, Nicholas C; Jiang, Lin; Zhang, Tianhao; Gong, Danyang; Shu, Sara; Wu, Ting-Ting; Sun, Ren
2016-11-01
Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available. To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available. Copyright © 2016 Du et al.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallaher, Sean D.; Fitz-Gibbon, Sorel T.; Strenkert, Daniela
Chlamydomonas reinhardtii is a unicellular chlorophyte alga that is widely studied as a reference organism for understanding photosynthesis, sensory and motile cilia, and for development of an algal-based platform for producing biofuels and bio-products. Its highly repetitive, ~205-kbp circular chloroplast genome and ~15.8-kbp linear mitochondrial genome were sequenced prior to the advent of high-throughput sequencing technologies. Here, high coverage shotgun sequencing was used to assemble both organellar genomes de novo. These new genomes correct dozens of errors in the prior genome sequences and annotations. Gen-ome sequencing coverage indicates that each cell contains on average 83 copies of the chloroplast genomemore » and 130 copies of the mitochondrial genome. Using protocols and analyses optimized for organellar tran-scripts, RNA-Seq was used to quantify their relative abundances across 12 different growth conditions. Forty-six percent of total cellular mRNA is attributable to high expression from a few dozen chloroplast genes. RNA-Seq data were used to guide gene annotation, to demonstrate polycistronic gene expression, and to quantify splicing of psaA and psbA introns. In contrast to a conclusion from a recent study, we found that chloroplast transcripts are not edited. Unexpectedly, cytosine-rich polynucleotide tails were observed at the 3’-end of all mitochondrial transcripts. A comparative genomics analysis of eight laboratory strains and 11 wild isolates of C. reinhardtii identified 2658 variants in the organellargenomes, which is 1/10th as much genetic diversity as is found in the nucleus.« less
Identification of miRNAs Involved in Stolon Formation in Tulipa edulis by High-Throughput Sequencing
Zhu, Zaibiao; Miao, Yuanyuan; Guo, Qiaosheng; Zhu, Yunhao; Yang, Xiaohua; Sun, Yuan
2016-01-01
MicroRNAs (miRNAs) are a class of endogenous, non-coding small RNAs that play an important role in transcriptional and post-transcriptional gene regulation. However, the sequence information and functions of miRNAs are still unexplored in Tulipa edulis. In this study, high-throughput sequencing was used to identify small RNAs in stolon formation stages (stage 1, 2, and 3) in T. edulis. A total of 12,890,912, 12,182,122, and 12,061,434 clean reads were obtained from stage 1, 2, and 3, respectively. Among the reads, 88 conserved miRNAs and 70 novel miRNAs were identified. Target prediction of 122 miRNAs resulted in 531 potential target genes. Nr, Swiss-Prot, GO, COG, and KEGG annotations revealed that these target genes participate in many biologic and metabolic processes. Moreover, qRT-PCR was performed to analyze the expression levels of the miRNAs and target genes in stolon formation. The results revealed that miRNAs play a key role in T. edulis stolon formation. PMID:27446103
Zhu, Zaibiao; Miao, Yuanyuan; Guo, Qiaosheng; Zhu, Yunhao; Yang, Xiaohua; Sun, Yuan
2016-01-01
MicroRNAs (miRNAs) are a class of endogenous, non-coding small RNAs that play an important role in transcriptional and post-transcriptional gene regulation. However, the sequence information and functions of miRNAs are still unexplored in Tulipa edulis. In this study, high-throughput sequencing was used to identify small RNAs in stolon formation stages (stage 1, 2, and 3) in T. edulis. A total of 12,890,912, 12,182,122, and 12,061,434 clean reads were obtained from stage 1, 2, and 3, respectively. Among the reads, 88 conserved miRNAs and 70 novel miRNAs were identified. Target prediction of 122 miRNAs resulted in 531 potential target genes. Nr, Swiss-Prot, GO, COG, and KEGG annotations revealed that these target genes participate in many biologic and metabolic processes. Moreover, qRT-PCR was performed to analyze the expression levels of the miRNAs and target genes in stolon formation. The results revealed that miRNAs play a key role in T. edulis stolon formation.
Tacchi, Luca; Larragoite, Erin; Salinas, Irene
2013-01-01
J chain is a small polypeptide responsible for immunoglobulin (Ig) polymerization and transport of Igs across mucosal surfaces in higher vertebrates. We identified a J chain in dipnoid fish, the African lungfish (Protopterus dolloi) by high throughput sequencing of the transcriptome. P. dolloi J chain is 161 aa long and contains six of the eight Cys residues present in mammalian J chain. Phylogenetic studies place the lungfish J chain closer to tetrapod J chain than to the coelacanth or nurse shark sequences. J chain expression occurs in all P. dolloi immune tissues examined and it increases in the gut and kidney in response to an experimental bacterial infection. Double fluorescent in-situ hybridization shows that 88.5% of IgM+ cells in the gut co-express J chain, a significantly higher percentage than in the pre-pyloric spleen. Importantly, J chain expression is not restricted to the B-cell compartment since gut epithelial cells also express J chain. These results improve our current view of J chain from a phylogenetic perspective. PMID:23967082
Fykse, Else Marie; Aarskaug, Tone; Madslien, Elisabeth H; Dybwad, Marius
2016-12-01
High-throughput amplicon sequencing of six biomass samples from a full-scale anaerobic reactor at a Norwegian wood and pulp factory using Biothane Biobed Expanded Granular Sludge Bed (EGSB) technology during start-up and first year of operation was performed. A total of 106,166 16S rRNA gene sequences (V3-V5 region) were obtained. The number of operational taxonomic units (OTUs) ranged from 595 to 2472, and a total of 38 different phyla and 143 families were observed. The predominant phyla were Bacteroidetes, Chloroflexi, Firmicutes, Proteobacteria, and Spirochaetes. A more diverse microbial community was observed in the inoculum biomass coming from an Upflow Anaerobic Sludge Blanket (USAB) reactor, reflecting an adaptation of the inoculum diversity to the specific conditions of the new reactor. In addition, no taxa classified as obligate pathogens were identified and potentially opportunistic pathogens were absent or observed in low abundances. No Legionella bacteria were identified by traditional culture-based and molecular methods. Copyright © 2016 Elsevier Ltd. All rights reserved.
Pyicos: a versatile toolkit for the analysis of high-throughput sequencing data.
Althammer, Sonja; González-Vallinas, Juan; Ballaré, Cecilia; Beato, Miguel; Eyras, Eduardo
2011-12-15
High-throughput sequencing (HTS) has revolutionized gene regulation studies and is now fundamental for the detection of protein-DNA and protein-RNA binding, as well as for measuring RNA expression. With increasing variety and sequencing depth of HTS datasets, the need for more flexible and memory-efficient tools to analyse them is growing. We describe Pyicos, a powerful toolkit for the analysis of mapped reads from diverse HTS experiments: ChIP-Seq, either punctuated or broad signals, CLIP-Seq and RNA-Seq. We prove the effectiveness of Pyicos to select for significant signals and show that its accuracy is comparable and sometimes superior to that of methods specifically designed for each particular type of experiment. Pyicos facilitates the analysis of a variety of HTS datatypes through its flexibility and memory efficiency, providing a useful framework for data integration into models of regulatory genomics. Open-source software, with tutorials and protocol files, is available at http://regulatorygenomics.upf.edu/pyicos or as a Galaxy server at http://regulatorygenomics.upf.edu/galaxy eduardo.eyras@upf.edu Supplementary data are available at Bioinformatics online.
Variability Studies of Two Prunus-Infecting Fabaviruses with the Aid of High-Throughput Sequencing
Sarkisova, Tatiana; Lenz, Ondřej; Přibylová, Jaroslava; Špak, Josef; Lotos, Leonidas; Beta, Christina; Katsiani, Asimina; Candresse, Thierry
2018-01-01
During their lifetime, perennial woody plants are expected to face multiple infection events. Furthermore, multiple genotypes of individual virus species may co-infect the same host. This may eventually lead to a situation where plants harbor complex communities of viral species/strains. Using high-throughput sequencing, we describe co-infection of sweet and sour cherry trees with diverse genomic variants of two closely related viruses, namely prunus virus F (PrVF) and cherry virus F (CVF). Both viruses are most homologous to members of the Fabavirus genus (Secoviridae family). The comparison of CVF and PrVF RNA2 genomic sequences suggests that the two viruses may significantly differ in their expression strategy. Indeed, similar to comoviruses, the smaller genomic segment of PrVF, RNA2, may be translated in two collinear proteins while CVF likely expresses only the shorter of these two proteins. Linked with the observation that identity levels between the coat proteins of these two viruses are significantly below the family species demarcation cut-off, these findings support the idea that CVF and PrVF represent two separate Fabavirus species. PMID:29670059
Farlora, Rodolfo; Araya-Garay, José; Gallardo-Escárate, Cristian
2014-06-01
Understanding the molecular underpinnings involved in the reproduction of the salmon louse is critical for designing novel strategies of pest management for this ectoparasite. However, genomic information on sex-related genes is still limited. In the present work, sex-specific gene transcription was revealed in the salmon louse Caligus rogercresseyi using high-throughput Illumina sequencing. A total of 30,191,914 and 32,292,250 high quality reads were generated for females and males, and these were de novo assembled into 32,173 and 38,177 contigs, respectively. Gene ontology analysis showed a pattern of higher expression in the female as compared to the male transcriptome. Based on our sequence analysis and known sex-related proteins, several genes putatively involved in sex differentiation, including Dmrt3, FOXL2, VASA, and FEM1, and other potentially significant candidate genes in C. rogercresseyi, were identified for the first time. In addition, the occurrence of SNPs in several differentially expressed contigs annotating for sex-related genes was found. This transcriptome dataset provides a useful resource for future functional analyses, opening new opportunities for sea lice pest control. Copyright © 2014 Elsevier B.V. All rights reserved.
Iteration and superposition encryption scheme for image sequences based on multi-dimensional keys
NASA Astrophysics Data System (ADS)
Han, Chao; Shen, Yuzhen; Ma, Wenlin
2017-12-01
An iteration and superposition encryption scheme for image sequences based on multi-dimensional keys is proposed for high security, big capacity and low noise information transmission. Multiple images to be encrypted are transformed into phase-only images with the iterative algorithm and then are encrypted by different random phase, respectively. The encrypted phase-only images are performed by inverse Fourier transform, respectively, thus new object functions are generated. The new functions are located in different blocks and padded zero for a sparse distribution, then they propagate to a specific region at different distances by angular spectrum diffraction, respectively and are superposed in order to form a single image. The single image is multiplied with a random phase in the frequency domain and then the phase part of the frequency spectrums is truncated and the amplitude information is reserved. The random phase, propagation distances, truncated phase information in frequency domain are employed as multiple dimensional keys. The iteration processing and sparse distribution greatly reduce the crosstalk among the multiple encryption images. The superposition of image sequences greatly improves the capacity of encrypted information. Several numerical experiments based on a designed optical system demonstrate that the proposed scheme can enhance encrypted information capacity and make image transmission at a highly desired security level.
A computational genomics pipeline for prokaryotic sequencing projects
Kislyuk, Andrey O.; Katz, Lee S.; Agrawal, Sonia; Hagen, Matthew S.; Conley, Andrew B.; Jayaraman, Pushkala; Nelakuditi, Viswateja; Humphrey, Jay C.; Sammons, Scott A.; Govil, Dhwani; Mair, Raydel D.; Tatti, Kathleen M.; Tondella, Maria L.; Harcourt, Brian H.; Mayer, Leonard W.; Jordan, I. King
2010-01-01
Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data. Results: We present a self-contained, automated high-throughput open source genome sequencing and computational genomics pipeline suitable for prokaryotic sequencing projects. The pipeline has been used at the Georgia Institute of Technology and the Centers for Disease Control and Prevention for the analysis of Neisseria meningitidis and Bordetella bronchiseptica genomes. The pipeline is capable of enhanced or manually assisted reference-based assembly using multiple assemblers and modes; gene predictor combining; and functional annotation of genes and gene products. Because every component of the pipeline is executed on a local machine with no need to access resources over the Internet, the pipeline is suitable for projects of a sensitive nature. Annotation of virulence-related features makes the pipeline particularly useful for projects working with pathogenic prokaryotes. Availability and implementation: The pipeline is licensed under the open-source GNU General Public License and available at the Georgia Tech Neisseria Base (http://nbase.biology.gatech.edu/). The pipeline is implemented with a combination of Perl, Bourne Shell and MySQL and is compatible with Linux and other Unix systems. Contact: king.jordan@biology.gatech.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20519285
Cotney, Justin L; Noonan, James P
2015-02-02
Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) is a powerful method used to identify genome-wide binding patterns of transcription factors and distribution of various histone modifications associated with different chromatin states. In most published studies, ChIP-Seq has been performed on cultured cells grown under controlled conditions, allowing generation of large amounts of material in a homogeneous biological state. Although such studies have provided great insight into the dynamic landscapes of animal genomes, they do not allow the examination of transcription factor binding and chromatin states in adult tissues, developing embryonic structures, or tumors. Such knowledge is critical to understanding the information required to create and maintain a complex biological tissue and to identify noncoding regions of the genome directly involved in tissues affected by complex diseases such as autism. Studying these tissue types with ChIP-Seq can be challenging due to the limited availability of tissues and the lack of complex biological states able to be achieved in culture. These inherent differences require alterations of standard cross-linking and chromatin extraction typically used in cell culture. Here we describe a general approach for using small amounts of animal tissue to perform ChIP-Seq directed at histone modifications and transcription factors. Tissue is homogenized before treatment with formaldehyde to ensure proper cross-linking, and a two-step nuclear isolation is performed to increase extraction of soluble chromatin. Small amounts of soluble chromatin are then used for immunoprecipitation (IP) and prepared for multiplexed high-throughput sequencing. © 2015 Cold Spring Harbor Laboratory Press.
Kumar, Dhananjay; Dutta, Summi; Singh, Dharmendra; Prabhu, Kumble Vinod; Kumar, Manish; Mukhopadhyay, Kunal
2017-01-01
Deep sequencing identified 497 conserved and 559 novel miRNAs in wheat, while degradome analysis revealed 701 targets genes. QRT-PCR demonstrated differential expression of miRNAs during stages of leaf rust progression. Bread wheat (Triticum aestivum L.) is an important cereal food crop feeding 30 % of the world population. Major threat to wheat production is the rust epidemics. This study was targeted towards identification and functional characterizations of micro(mi)RNAs and their target genes in wheat in response to leaf rust ingression. High-throughput sequencing was used for transcriptome-wide identification of miRNAs and their expression profiling in retort to leaf rust using mock and pathogen-inoculated resistant and susceptible near-isogenic wheat plants. A total of 1056 mature miRNAs were identified, of which 497 miRNAs were conserved and 559 miRNAs were novel. The pathogen-inoculated resistant plants manifested more miRNAs compared with the pathogen infected susceptible plants. The miRNA counts increased in susceptible isoline due to leaf rust, conversely, the counts decreased in the resistant isoline in response to pathogenesis illustrating precise spatial tuning of miRNAs during compatible and incompatible interaction. Stem-loop quantitative real-time PCR was used to profile 10 highly differentially expressed miRNAs obtained from high-throughput sequencing data. The spatio-temporal profiling validated the differential expression of miRNAs between the isolines as well as in retort to pathogen infection. Degradome analysis provided 701 predicted target genes associated with defense response, signal transduction, development, metabolism, and transcriptional regulation. The obtained results indicate that wheat isolines employ diverse arrays of miRNAs that modulate their target genes during compatible and incompatible interaction. Our findings contribute to increase knowledge on roles of microRNA in wheat-leaf rust interactions and could help in rust resistance breeding programs.
High-throughput sequencing of three Lemnoideae (duckweeds) chloroplast genomes from total DNA.
Wang, Wenqin; Messing, Joachim
2011-01-01
Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power.
High-Throughput Sequencing of Three Lemnoideae (Duckweeds) Chloroplast Genomes from Total DNA
Wang, Wenqin; Messing, Joachim
2011-01-01
Background Chloroplast genomes provide a wealth of information for evolutionary and population genetic studies. Chloroplasts play a particularly important role in the adaption for aquatic plants because they float on water and their major surface is exposed continuously to sunlight. The subfamily of Lemnoideae represents such a collection of aquatic species that because of photosynthesis represents one of the fastest growing plant species on earth. Methods We sequenced the chloroplast genomes from three different genera of Lemnoideae, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana by high-throughput DNA sequencing of genomic DNA using the SOLiD platform. Unfractionated total DNA contains high copies of plastid DNA so that sequences from the nucleus and mitochondria can easily be filtered computationally. Remaining sequence reads were assembled into contiguous sequences (contigs) using SOLiD software tools. Contigs were mapped to a reference genome of Lemna minor and gaps, selected by PCR, were sequenced on the ABI3730xl platform. Conclusions This combinatorial approach yielded whole genomic contiguous sequences in a cost-effective manner. Over 1,000-time coverage of chloroplast from total DNA were reached by the SOLiD platform in a single spot on a quadrant slide without purification. Comparative analysis indicated that the chloroplast genome was conserved in gene number and organization with respect to the reference genome of L. minor. However, higher nucleotide substitution, abundant deletions and insertions occurred in non-coding regions of these genomes, indicating a greater genomic dynamics than expected from the comparison of other related species in the Pooideae. Noticeably, there was no transition bias over transversion in Lemnoideae. The data should have immediate applications in evolutionary biology and plant taxonomy with increased resolution and statistical power. PMID:21931804
Hess, Jon E; Campbell, Nathan R; Docker, Margaret F; Baker, Cyndi; Jackson, Aaron; Lampman, Ralph; McIlraith, Brian; Moser, Mary L; Statler, David P; Young, William P; Wildbill, Andrew J; Narum, Shawn R
2015-01-01
Next-generation sequencing data can be mined for highly informative single nucleotide polymorphisms (SNPs) to develop high-throughput genomic assays for nonmodel organisms. However, choosing a set of SNPs to address a variety of objectives can be difficult because SNPs are often not equally informative. We developed an optimal combination of 96 high-throughput SNP assays from a total of 4439 SNPs identified in a previous study of Pacific lamprey (Entosphenus tridentatus) and used them to address four disparate objectives: parentage analysis, species identification and characterization of neutral and adaptive variation. Nine of these SNPs are FST outliers, and five of these outliers are localized within genes and significantly associated with geography, run-timing and dwarf life history. Two of the 96 SNPs were diagnostic for two other lamprey species that were morphologically indistinguishable at early larval stages and were sympatric in the Pacific Northwest. The majority (85) of SNPs in the panel were highly informative for parentage analysis, that is, putatively neutral with high minor allele frequency across the species' range. Results from three case studies are presented to demonstrate the broad utility of this panel of SNP markers in this species. As Pacific lamprey populations are undergoing rapid decline, these SNPs provide an important resource to address critical uncertainties associated with the conservation and recovery of this imperiled species. © 2014 John Wiley & Sons Ltd.
UPIC + GO: Zeroing in on informative markers
USDA-ARS?s Scientific Manuscript database
Microsatellites/SSRs (simple sequence repeats) have become a powerful tool in genomic biology because of their broad range of applications and availability. An efficient method recently developed to generate microsatellite-enriched libraries used in combination with high throughput DNA pyrosequencin...
Life in the fast lane for protein crystallization and X-ray crystallography
NASA Technical Reports Server (NTRS)
Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.
2005-01-01
The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful of structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high-rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today's high-throughput manipulations and discuss the challenges in fast pace protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).
Life in the Fast Lane for Protein Crystallization and X-Ray Crystallography
NASA Technical Reports Server (NTRS)
Pusey, Marc L.; Liu, Zhi-Jie; Tempel, Wolfram; Praissman, Jeremy; Lin, Dawei; Wang, Bi-Cheng; Gavira, Jose A.; Ng, Joseph D.
2004-01-01
The common goal for structural genomic centers and consortiums is to decipher as quickly as possible the three-dimensional structures for a multitude of recombinant proteins derived from known genomic sequences. Since X-ray crystallography is the foremost method to acquire atomic resolution for macromolecules, the limiting step is obtaining protein crystals that can be useful of structure determination. High-throughput methods have been developed in recent years to clone, express, purify, crystallize and determine the three-dimensional structure of a protein gene product rapidly using automated devices, commercialized kits and consolidated protocols. However, the average number of protein structures obtained for most structural genomic groups has been very low compared to the total number of proteins purified. As more entire genomic sequences are obtained for different organisms from the three kingdoms of life, only the proteins that can be crystallized and whose structures can be obtained easily are studied. Consequently, an astonishing number of genomic proteins remain unexamined. In the era of high-throughput processes, traditional methods in molecular biology, protein chemistry and crystallization are eclipsed by automation and pipeline practices. The necessity for high rate production of protein crystals and structures has prevented the usage of more intellectual strategies and creative approaches in experimental executions. Fundamental principles and personal experiences in protein chemistry and crystallization are minimally exploited only to obtain "low-hanging fruit" protein structures. We review the practical aspects of today s high-throughput manipulations and discuss the challenges in fast pace protein crystallization and tools for crystallography. Structural genomic pipelines can be improved with information gained from low-throughput tactics that may help us reach the higher-bearing fruits. Examples of recent developments in this area are reported from the efforts of the Southeast Collaboratory for Structural Genomics (SECSG).
Optimization and high-throughput screening of antimicrobial peptides.
Blondelle, Sylvie E; Lohner, Karl
2010-01-01
While a well-established process for lead compound discovery in for-profit companies, high-throughput screening is becoming more popular in basic and applied research settings in academia. The development of combinatorial libraries combined with easy and less expensive access to new technologies have greatly contributed to the implementation of high-throughput screening in academic laboratories. While such techniques were earlier applied to simple assays involving single targets or based on binding affinity, they have now been extended to more complex systems such as whole cell-based assays. In particular, the urgent need for new antimicrobial compounds that would overcome the rapid rise of drug-resistant microorganisms, where multiple target assays or cell-based assays are often required, has forced scientists to focus onto high-throughput technologies. Based on their existence in natural host defense systems and their different mode of action relative to commercial antibiotics, antimicrobial peptides represent a new hope in discovering novel antibiotics against multi-resistant bacteria. The ease of generating peptide libraries in different formats has allowed a rapid adaptation of high-throughput assays to the search for novel antimicrobial peptides. Similarly, the availability nowadays of high-quantity and high-quality antimicrobial peptide data has permitted the development of predictive algorithms to facilitate the optimization process. This review summarizes the various library formats that lead to de novo antimicrobial peptide sequences as well as the latest structural knowledge and optimization processes aimed at improving the peptides selectivity.
Yu, Duo; Li, Yunfeng; Ming, Zhihui; Wang, Hongyong; Dong, Zhuo; Qiu, Ling; Wang, Tiejun
2018-01-01
Cervical cancer is one of the most common cancers in women worldwide. Malignant tumors develop resistance mechanisms and are less sensitive to or do not respond to irradiation. With the development of high-throughput sequencing technologies, circular RNA (circRNA) has been identified in an increasing number of diseases, especially cancers. It has been reported that circRNA can compete with microRNAs (miRNAs) to change the stability or translation of target RNAs, thus regulating gene expression at the transcriptional level. However, the role of circRNAs in cervical cancer and the radioresistance mechanisms of HeLa cells are unknown. The objective of this study is to investigate the role of circRNAs in radioresistance in HeLa cells. High-throughput sequencing and bioinformatics analysis of irradiated and sham-irradiated HeLa cells. The reliability of high-throughput RNA sequencing was validated using quantitative real-time polymerase chain reaction. The most significant circRNA functions and pathways were selected by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. A circRNA-miRNA-target gene interaction network was used to find circRNAs associated with radioresistance. Moreover, a protein-protein interaction network was constructed to identify radioresistance-related hub proteins. High-throughput sequencing allowed the identification of 16,893 circRNAs involved in the response of HeLa cells to radiation. Compared with the control group, there were 153 differentially expressed circRNAs, of which 76 were up-regulated and 77 were down-regulated. GO covered three domains: biological process (BP), cellular component (CC) and molecular function (MF). The terms assigned to the BP domain were peptidyl-tyrosine dephosphorylation and regulation of cell migration. The identified CC terms were cell-cell adherens junction, nucleoplasm and cytosol, and the identified MF terms were protein binding and protein tyrosine phosphatase activity. The top five KEGG pathways were MAPK signaling pathway, endocytosis, axon guidance, neurotrophin signaling pathway, and SNARE interactions in vesicular transport. The protein-protein interaction analysis indicated that 19 proteins might be hub proteins. CircRNAs may play a major role in the response to radiation. These findings may improve our understanding of the role of circRNAs in radioresistance in HeLa cells and allow the development of novel therapeutic approaches.
Desai, Aarti; Marwah, Veer Singh; Yadav, Akshay; Jha, Vineet; Dhaygude, Kishor; Bangar, Ujwala; Kulkarni, Vivek; Jere, Abhay
2013-01-01
Next Generation Sequencing (NGS) is a disruptive technology that has found widespread acceptance in the life sciences research community. The high throughput and low cost of sequencing has encouraged researchers to undertake ambitious genomic projects, especially in de novo genome sequencing. Currently, NGS systems generate sequence data as short reads and de novo genome assembly using these short reads is computationally very intensive. Due to lower cost of sequencing and higher throughput, NGS systems now provide the ability to sequence genomes at high depth. However, currently no report is available highlighting the impact of high sequence depth on genome assembly using real data sets and multiple assembly algorithms. Recently, some studies have evaluated the impact of sequence coverage, error rate and average read length on genome assembly using multiple assembly algorithms, however, these evaluations were performed using simulated datasets. One limitation of using simulated datasets is that variables such as error rates, read length and coverage which are known to impact genome assembly are carefully controlled. Hence, this study was undertaken to identify the minimum depth of sequencing required for de novo assembly for different sized genomes using graph based assembly algorithms and real datasets. Illumina reads for E.coli (4.6 MB) S.kudriavzevii (11.18 MB) and C.elegans (100 MB) were assembled using SOAPdenovo, Velvet, ABySS, Meraculous and IDBA-UD. Our analysis shows that 50X is the optimum read depth for assembling these genomes using all assemblers except Meraculous which requires 100X read depth. Moreover, our analysis shows that de novo assembly from 50X read data requires only 6-40 GB RAM depending on the genome size and assembly algorithm used. We believe that this information can be extremely valuable for researchers in designing experiments and multiplexing which will enable optimum utilization of sequencing as well as analysis resources.
Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa
2013-01-01
High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interests with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since, one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between diseases extracted from SRA and the feature profiles of it. The developed project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589
Bolsheva, Nadezhda L; Melnikova, Nataliya V; Kirov, Ilya V; Speranskaya, Anna S; Krinitsina, Anastasia A; Dmitriev, Alexey A; Belenikin, Maxim S; Krasnov, George S; Lakunina, Valentina A; Snezhkina, Anastasiya V; Rozhmina, Tatiana A; Samatadze, Tatiana E; Yurkevich, Olga Yu; Zoshchuk, Svyatoslav A; Amosova, Аlexandra V; Kudryavtseva, Anna V; Muravenko, Olga V
2017-12-28
The species relationships within the genus Linum have already been studied several times by means of different molecular and phylogenetic approaches. Nevertheless, a number of ambiguities in phylogeny of Linum still remain unresolved. In particular, the species relationships within the sections Stellerolinum and Dasylinum need further clarification. Also, the question of independence of the species of the section Adenolinum still remains unanswered. Moreover, the relationships of L. narbonense and other species of the section Linum require further clarification. Additionally, the origin of tetraploid species of the section Linum (2n = 30) including the cultivated species L. usitatissimum has not been explored. The present study examines the phylogeny of blue-flowered species of Linum by comparisons of 5S rRNA gene sequences as well as ITS1 and ITS2 sequences of 35S rRNA genes. High-throughput sequencing has been used for analysis of multicopy rRNA gene families. In addition to the molecular phylogenetic analysis, the number and chromosomal localization of 5S and 35S rDNA sites has been determined by FISH. Our findings confirm that L. stelleroides forms a basal branch from the clade of blue-flowered flaxes which is independent of the branch formed by species of the sect. Dasylinum. The current molecular phylogenetic approaches, the cytogenetic analysis as well as different genomic DNA fingerprinting methods applied previously did not discriminate certain species within the sect. Adenolinum. The allotetraploid cultivated species L. usitatissimum and its wild ancestor L. angustifolium (2n = 30) could originate either as the result of hybridization of two diploid species (2n = 16) related to the modern L. gandiflorum and L. decumbens, or hybridization of a diploid species (2n = 16) and a diploid ancestor of modern L. narbonense (2n = 14). High-throughput sequencing of multicopy rRNA gene families allowed us to make several adjustments to the phylogeny of blue-flowered flax species and also reveal intra- and interspecific divergence of the rRNA gene sequences.